May 5, 2023
Data migration with ETL tools can be a deceptively complex process, and without a comprehensive plan, challenges can appear suddenly. This is not uncommon when project teams focus on system configuration and software customization and delay the data migration, a mistake that can cause major delays and threaten business continuity. This checklist summarizes best practices for migration planning with the support of automated Extract-Transform-Load (ETL) tools.
Managing a data migration project typically involves both manual processes (e.g., developing custom software, manual data entry, and file relocation) and automated programmable tools. Given the time and labor involved in manual data migration, it is essential to leverage automated tools to their fullest.
- Migrate data early in the implementation. When migrating data to the new system, static legacy data (and sometimes dynamic data as well) must be extracted, transformed, and loaded into a new location.
- Before the migration begins, examine the data structure of both the source and target. The data migration plan should define which data will be migrated, why, and in what manner, based on the needs of all the different stakeholders involved.
- Validate data both during and after the migration. This step is especially important for life sciences data given its significance in research, development, and manufacturing. A data migration plan should detail how the data transformations performed during the migration (before the data is loaded into the final system) will be validated to ensure correctness. The plan should also detail the methods used to verify that the data is correct after the final load into the new system.
- Choose a commercial ETL tool. In general, the best approach maximizes use of automated programmable tools (ETL) for data migration, supplemented by manual methods when needed. While a variety of commercial and open-source ETL tools are available, in our experience a commercial tool is usually the better option because it tends to have fewer bugs.
- Back up and archive all raw data acquired in the extraction phase. A good ETL tool has a staging area that enables storage of an intermediate version of the extracted data. This enables validation activities to occur before the transformation stage, to confirm that the extracted data has the expected values. Any data failing validation is rejected entirely or in part and held back for analysis to discover where the problem occurred. A staging area also avoids the necessity of re-extraction in later phases of the migration.
- Cleanse and validate data in the transformation stage. While some data (also known as pass-through data) may not require transformation and can be loaded directly into the target system, most data will need some form of pre-load transformation. A good ETL tool supports complex processes and lets you extend its tool library with custom functions.
- Ensure your ETL tool can process multiple terabytes per hour. A sustained rate of one gigabyte per second works out to 3.6 terabytes per hour. This is essential because the loading phase is usually the bottleneck in high-volume migrations. If you need to move high volumes of data through your ETL tool, it should run on servers with multiple CPUs and hard drives, gigabit network connections, and plenty of memory.
- An experienced subject matter expert, leveraging a good ETL tool, will reduce risk and expense. Regardless of the ETL tool used, a successful data migration requires a thorough understanding of data structures and definitions, as well as how the data will be used. An experienced migration manager can reduce, or eliminate, the need for developers to fix problems or build patches later in the process.
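To make the staging, validation, and transformation items in the checklist more concrete, here is a minimal Python sketch of how those phases might fit together. It is an illustration under stated assumptions, not how any particular ETL tool works: the record fields (`sample_id`, `assay`, `result`), the validation rules, and every function name are hypothetical, and the load step is a stand-in for writing to a real target system.

```python
import hashlib
import json

# Hypothetical legacy records; the field names are illustrative only.
LEGACY_ROWS = [
    {"sample_id": "S-001", "assay": "hplc", "result": "4.20"},
    {"sample_id": "S-002", "assay": "HPLC", "result": "n/a"},  # non-numeric result
    {"sample_id": "", "assay": "gc", "result": "1.05"},        # missing identifier
]


def extract(source_rows):
    """Copy raw rows into a staging list so later phases never re-extract."""
    return [dict(row) for row in source_rows]


def validate_staged(row):
    """Return the problems found in one staged row (empty list if clean)."""
    problems = []
    if not row.get("sample_id"):
        problems.append("missing sample_id")
    try:
        float(row.get("result", ""))
    except (TypeError, ValueError):
        problems.append("result is not numeric")
    return problems


def transform(row):
    """Normalize one validated row; sample_id is treated as pass-through data."""
    return {
        "sample_id": row["sample_id"],
        "assay": row["assay"].upper(),
        "result": float(row["result"]),
    }


def checksum(rows):
    """Order-independent digest for comparing expected and loaded data."""
    encoded = sorted(json.dumps(r, sort_keys=True) for r in rows)
    return hashlib.sha256("\n".join(encoded).encode("utf-8")).hexdigest()


def migrate(source_rows):
    staging = extract(source_rows)
    accepted, rejected = [], []
    for row in staging:
        problems = validate_staged(row)
        if problems:
            rejected.append((row, problems))  # held back for analysis
        else:
            accepted.append(row)
    transformed = [transform(r) for r in accepted]
    # Stand-in for the load step. In a real migration, `target` would be
    # read back from the destination system before the comparison below.
    target = list(transformed)
    verified = (
        len(target) == len(transformed)
        and checksum(target) == checksum(transformed)
    )
    return target, rejected, verified


target, rejected, verified = migrate(LEGACY_ROWS)
print(len(target), "loaded;", len(rejected), "rejected; verified:", verified)
```

Rejected rows are kept together with the reasons they failed, so they can be analyzed without going back to the source system. The post-load check here compares row counts and a content digest; a real plan would choose verification methods suited to the target system.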
Astrix is the unrivaled market-leader in creating & delivering innovative strategies, solutions, and people to the life science community. Through world class people, process, and technology, Astrix works with clients to fundamentally improve business & scientific outcomes and the quality of life everywhere. Founded by scientists to solve the unique challenges of the life science community, Astrix offers a growing array of strategic, technical, and staffing services designed to deliver value to clients across their organizations.