PostgreSQL & MySQL Data ETL

A genuine big data project: the design and development of an extract-transform-load (ETL) agent to process the daily data feed from thousands of vehicles, which generate around 40,000 records per second, or almost 25GB of raw data per day...

A publicly listed transport corporation engaged NEC Hong Kong Limited to provide a business intelligence (BI) solution by redesigning its data extract-transform-load (ETL) agent, so that it could draw insight from the vehicle tracking and fleet logging data of its thousands of vehicles for daily business analysis and decisions. mindVan was asked to take on the ETL redesign.

Our Roles

  • Design and develop a BI data ETL agent between 2 major open-source database systems, PostgreSQL and MySQL
  • Co-operate with the vehicle tracking and logging system developer Telargo d.o.o. for the BI ETL agent development
  • Manage data backup and replication in a PostgreSQL Write-Ahead Logging (WAL) environment with point-in-time recovery for disaster recovery
  • Assist users with data migration, and perform technical acceptance tests (TAT), user acceptance tests (UAT) and drill tests
  • Provide on-going support and maintenance

Project Scale

  • Very rapid data growth – the daily data feed from thousands of vehicles generates around 40,000 records per second, which is almost 25GB of raw data per day
  • An ETL agent was developed in an open-source environment to handle this volume of incoming data; the agent runs 24x7 using a multithreading mechanism
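
To put the figures above in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: it assumes the 40,000 records per second rate is sustained for the whole day and that 1GB means 10^9 bytes, neither of which is stated in the project description. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_day(records_per_second):
    """Records produced in one day at a constant per-second rate."""
    return records_per_second * SECONDS_PER_DAY


def average_bytes_per_record(bytes_per_day, records_per_second):
    """Average raw size of one record implied by a daily volume and a rate."""
    return bytes_per_day / records_per_day(records_per_second)


if __name__ == "__main__":
    rate = 40_000              # records per second, from the project description
    daily_bytes = 25 * 10**9   # 25GB per day, assuming 1GB = 10^9 bytes

    print(records_per_day(rate))                                   # 3456000000
    print(round(average_bytes_per_record(daily_bytes, rate), 1))   # 7.2
```

Under these assumptions the agent handles roughly 3.5 billion records a day. The implied average of only a few bytes per record suggests the two figures may be measured differently (for example a peak rate against a daily total); the description does not say.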

Challenges

  • The data volume and transaction rate in this project are extremely high, and a major requirement is that the ETL agent processes at least 40,000 records per second, not just at peak periods but round the clock. This called for a multithreaded program, i.e. multiple threads running in parallel that must coordinate with each other while running in order to deliver logically consistent results.
  • At the preliminary stage, the existing database systems were found to be unstable and unreliable, which delayed development. After a couple of weeks of investigation, mindVan identified the deficiencies in the existing systems. Once the systems had been enhanced in co-operation with Telargo, the ETL agent could be commissioned and delivered to the users.
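
The coordination between parallel threads described above can be illustrated with a minimal producer/worker sketch in Python. It is not the agent that was delivered: the project description does not name the implementation language or design, the record fields and the `transform` step are hypothetical, and plain in-memory lists stand in for the PostgreSQL source and the MySQL target so that the example runs without a database.

```python
import queue
import threading

SENTINEL = None  # placed on the queue once per worker to signal end of input


def transform(record):
    # Hypothetical transformation: normalise the id and convert m/s to km/h.
    return {
        "vehicle_id": record["vehicle_id"].upper(),
        "speed_kmh": round(record["speed_ms"] * 3.6, 1),
    }


def worker(in_q, out_rows, out_lock, batch_size):
    # Each worker transforms records and loads them in batches, so the
    # shared target is locked once per batch rather than once per record.
    batch = []
    while True:
        record = in_q.get()
        if record is SENTINEL:
            break
        batch.append(transform(record))
        if len(batch) >= batch_size:
            with out_lock:
                out_rows.extend(batch)
            batch = []
    if batch:  # flush the final partial batch
        with out_lock:
            out_rows.extend(batch)


def run_etl(source_rows, num_workers=4, batch_size=100):
    in_q = queue.Queue(maxsize=1000)  # bounded, so extraction cannot outrun loading
    out_rows = []                     # stand-in for the target database
    out_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(in_q, out_rows, out_lock, batch_size))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for row in source_rows:           # "extract": feed rows from the source
        in_q.put(row)
    for _ in threads:                 # one sentinel per worker
        in_q.put(SENTINEL)
    for t in threads:
        t.join()
    return out_rows


if __name__ == "__main__":
    rows = [{"vehicle_id": f"v{i}", "speed_ms": 10.0} for i in range(1000)]
    loaded = run_etl(rows)
    print(len(loaded))
```

The bounded queue and the per-batch lock are the two coordination points: the first applies back-pressure when loading falls behind, the second keeps concurrent writes to the shared target consistent. Rows may arrive at the target in a different order from the source, which a real agent would need to account for.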



Published/Reviewed: 2024/09/16