Scope of Business Process
The following is an example of an exposure to Business Processes as our understanding of common needs. For example, this Business Process is divided into 3 phases and 2 additional processes, with the following descriptions:
Take the data source for processing into staging. Sources of data are taken from various sources and have various formats, in general this system is designed to be able to flexibly adapt to needs, including:
- Text file with a specific template format, each field is separated by a special character (CSV or TXT).
- The specific SQL dump file from the database used (PostgreSQL, Oracle, MySQL, MSSQL).
- Data pulled from RDBMS (PostgreSQL, Oracle, MySQL, MSSQL) in the form of queued data lines.
When collecting data towards Staging, it is necessary to process:
- Standardization, dan
Processing Staging Data into Data Warehouse. After the Data Source enters the Staging Database, then before entering the Data Warehouse, the following processes are carried out:
- Matching Aggregation,
- Cubes Aggregation
The result of this process is data that is ripe and ready to be entered into the Data Mart.
Processing the Data Warehouse into a Data Mart. This data mart is a database that will be accessed directly by other applications for example. The ongoing process is:
- Export to Data Mart XML,
- Export to Data Mart DB.
The result of the above process in the form of an XML file for the Data Mart and into a specific data structure in the Data Mart Database.3
Equnix Data Integration Management (EDIM) Architecture
Equnix Data Integration Management (EDIM)
Equnix Data Integration Management (EDIM) is an engine in the form of a fairly streamlined application developed on a multi-process principle. For client business needs, further EDIM customization and development can be done, to suit the algorithm rules that will be defined later. EDIM is composed of several main modules, namely EDIM Agent, EDIM Management, EDIM Scheduler, EDIM Monitoring, and EDIM Reporting. In layers, EDIM is divided into 2 parts, namely Backend and Frontend. EDIM also runs on a Web Server, so it can be easily managed and monitored via a Web Browser. Because it is accessed via a Web browser, it can also be accessed via a Smartphone so that it is more flexible. For Backend process, the EDIM module will work as a multi-process background process.
As a whole system, EDIM was built using the C programming language. The C programming language has proven to be an effective and optimal programming language, especially for systems that handle very high and complex loads.
For the frontend, EDIM was built using the HTML5, CSS3, and JQuery programming languages by utilizing AJAX and layout design using Material Design to improve User Experience.
In processing very large amounts of data, in this case tens of millions of records, a multithreaded mechanism is implemented in each EDIM Agent process. This is so that the EDIM Agent can run in parallel, when taking data from any data source and writing data to the destination database (Database Staging). The number of EDIM Agent processes can be configured via the Management page and the EDIM Agent will run in the background on the server.
Broadly speaking, the EDIM Agent being developed has 3 main parts as described in the architecture above:
Record Queue is an internal queue in the EDIM Agent which functions to accommodate a collection of records produced by Thread Worker Input and distribute to Thread Worker Output to be transformed and entered into PostgreSQL.
It is a thread that executes Extraction and Validation to retrieve data from Data Sources and enter the set of records into a queue on the EDIM Agent. This EDIM Agent works in accordance with the system configuration, both in terms of the type of Data Source, what file format is being read, the location of the file, the Connection String, to the Driver library used to connect to the Data Source.
This is a thread that consumes all the records that have been entered by the Worker Input Thread via the Record Queue.
This worker also has a function to transform data or process data that has not or has entered into the Staging Database to be adjusted to the desired data form.
This worker has access to all the tables on the PostgreSQL server that are needed to transform data. The Transformation configuration is also set through the ETL Management page.
EDIM Management is the main module that functions to manage and manage the performance of all ETL Processes running on the backend side. This Management page is divided into several sub-menus:
System configuration including email sending configuration for notifications, log file location, HA configuration for EDIM, listening configuration for IP Address and Port, Backup and Archive configuration.
User Management Configuration
User Management configuration manages the Access List of each User in accessing EDIM. There are several roles such as Administrator, Monitor, Configurator, Executor. The administrator has the right to set the overall system configuration, the Monitor has the right to monitor and supervise EDIM performance, the Configurator has the right to make arrangements on the EDIM business process side, and the Executor has the right to execute business processes manually. For each User, can have multiple roles.
Configure EDIM Agent
EDIM Agent configuration includes the number of running ETL Processes, type of EDIM Agent, location of Driver library for Data Sources, Connection Strings for Data Sources, Queries executed for either Data Sources with Database type, along with Queries to enter into the Data Warehouse or to Data Mart. In addition, the EDIM Agent schedule can also be configured to run to process schedules, or be executed manually by the User.
Upload Data Source
This menu allows the User to upload the Data Source manually via a Web Browser, then instructs the EDIM Agent to execute processing on the file.
After the EDIM Agent is configured on a scheduled basis, the EDIM Scheduler module will read the configuration of all EDIM Agents, and manage the execution of each EDIM Agent. This scheduler acts as a Watchdog and ensures the EDIM Agent runs exactly as scheduled and reports any EDIM Agent that fails to operate.
ETL Engine supports configurable email delivery, so that if there is a failure or success to run an activity, it can be notified to the destination email. In addition, there is also a special page to monitor all EDIM Agent processes that are running. 1 EDIM Agent is represented as 1 widget in the monitoring page, and is shown the status of the widget background. Yellow color indicates EDIM Agent is in progress, Green color indicates EDIM Agent has finished processing, Red color indicates EDIM Agent failed to carry out its task and if clicked, it can be shown the cause of process failure. Each widget can also see the progress as a percentage. The EDIM Agent process can be started manually, stopped, or paused temporarily by pressing the control button on the widget. At the end of the process, this widget will show the total amount of data processed, data that failed to process, and data that was successfully processed.
All activities carried out by the EDIM will be logged for audit trail purposes, both Management activities and EDIM Agent process execution. To display all activity logs of these activities, it can be accessed through EDIM Reporting. All activities, including the Data Cleansing process, Data Matching, Data Loading, Data Transformation, and Data Export to the Data Mart, can be viewed in the form of a report on this page. The results of the report can also be exported to XLS if needed.
Batch Insert 100 Millions Records
|Test Type||Number of Rows||Validation||Process||Duration (Minutes)|
|COPY FROM query (Distributed Randomly)||100,000,000||No||Single||10|
|Using XDRIVE (Distributed Randomly)||100,000,000||No||Single||15|
|COPY FROM query (Distributed By)||100,000,000||No||16 Parallel||9.30|
|Using XDRIVE (Distributed By)||100,000,000||No||16 Parallel||7.30|
|Using EDIM (Distributed Randomly)||100,000,000||No||16 Parallel||7.30|
|Using EDIM (Distributed By)||100,000,000||YES||16 Parallel||7|