Using ETL Tools for LIMS Data Migration

Posted on Lab Informatics. 23 July, 2018

For today’s scientific organizations, a laboratory information management system (LIMS) is necessary tool for effective data management in the laboratory. For many reasons, companies often find themselves in a situation where it is advantageous to upgrade the laboratory environment by migrating data from legacy systems into a new or even existing LIMS.  A LIMS data migration project will often involve the use of ETL technology to assist.

Migrating data to a LIMS can be a challenging endeavor, however. Companies often have years of historical data and knowledge stored in their existing LIMS that must be migrated over to the chosen system. Depending on the amount and complexity of the data sources in the legacy system(s), as well as the complexity of the new LIMS, the data migration can require much time and labor. Unless the project team does proper planning and execution, the migration process can end up being the cause of major project delays.

Data migration for a LIMS implementation project involves extracting, translating and loading as much static and historical legacy data as possible into the new or chosen system. The migration is typically accomplished through a combination of manual processes (e.g., developing custom software, manual data entering, moving files, etc.) and automated programmable migration tools. Given the time and labor involved in manual data migration, it is wise to maximize the use of automated tools in the process. In this blog, we will discuss some of the benefits, challenges and best practices involved in using automated ETL (Extract-Transfer-Load) tools to migrate data from legacy systems to a new LIMS.

Data Migration Plan

Data migration is often a bigger job than anticipated during a LIMS implementation. It is common for the project team to focus on activities such as system configuration and software testing and put off data migration to later stages of the implementation. This is a mistake that can end up causing major project delays and effect business continuity in the lab. Data migration should be accomplished in the early stages of the implementation project.

When migrating data to the new LIMS, static legacy data (and sometimes dynamic data as well) must be extracted, translated and loaded into a new location. Before the migration begins, an analysis should be done that examines the data structure of both the source and target. Once this analysis is complete, a data migration plan is developed that takes into account the needs of all the different stakeholders involved. The plan should define which data will be migrated, why, and in what manner.

Regulated laboratories must also take steps to validate the data both during and after the migration. The data migration plan should detail the ways in which the data transformations that are performed during the migration (prior to loading them into the final system) are correct. The plan should also detail the methods used to verify that the data is correct after the final loading into the new LIMS.

The data migration plan will likely include both manual and automated data migration activities. In general, maximizing the use of automated programmable tools (ETL) for data migration, while utilizing manual methods when needed, is the best approach. There are a variety of commercial and open source ETL tools out there. Our experience is that open source tools tend to be buggy, so a commercial tool is usually the best option.

Using Programmable ETL Tools

ETL tools integrate data from one or more applications/systems into a single target system using three phases – Extract, Transform, Load. Since the whole process can be time consuming, it is common for these tools to execute all three phases in parallel – each phase executes immediately on the data that it receives from the previous phase without waiting for these phases to complete.

Extraction Phase

For an ETL extraction to be successful, a good understanding of the source data layout/format must be in place in order to configure the tool properly. The extraction phase consolidates and extracts the data from different homogeneous and/or heterogeneous data sources. The extraction must convert the data into a suitable single format for processing in the transformation phase. The extraction process also selects the proper data from the source to extract, as the source data typically contains redundant data and data that is not of interest. All raw data acquired in the extraction phase should be backed up and archived.

A good ETL tool has a staging area that enables storage of an intermediate version of the data being extracted. This enables validation activities to occur before the transformation stage to confirm that the extracted data has the expected values. Any data failing validation is rejected entirely or in part and held back for analysis to discover where the problem occurred. A staging area also avoids the necessity of re-extraction in case of issues in subsequent phases.

Transformation Phase

Data cleansing/transformation is usually performed during a data migration to both improve data quality and match the requirements of the new system. The transformation phase involves an application of a series of rules or functions to the data from the extraction phase to prepare it for loading into the end target. Validation activities should also be performed to validate the transformations in this phase.

While some data (AKA pass through data) may not require any transformation and can be loaded directly into the target LIMS, most data will need some form of transformation before loading. For example, the new LIMS may support multiple departments, so that the storage location names will need to be changed in the new LIMS compared to the old. Some common transformation processes include clearing the duplicates, standardizing, filtering, format conversion and sorting. A good ETL tool needs to enable building complex processes and extending a tool library so custom functions can be added.

Transforming dynamic data usually involves complex rules that require multiple transformations on several pieces of source data, and IF/Then rules may need to be applied. These complex transformations can contribute substantially to the validation work required to prove the integrity and completeness of the data has been preserved through the data migration process, and this must be accounted for in the overall data migration plan.

Loading Phase

This final stage of the ETL process loads the extracted and transformed data into the target LIMS. The loading phase is usually the bottleneck of the whole data migration process. If you need to move high volumes of data through your ETL tool, it should be able to process at least multiple TB (terabytes) per hour (or ~1 GB per second) using powerful servers with multiple CPUs, multiple hard drives, multiple gigabit-network connections, and lots of memory.

Conclusion

ETL processes oftentimes involve considerable complexity, and improperly designed ETL systems can result in significant operational problems for a LIMS migration project. Purchasing an ETL tool from a reputable vendor is therefore recommended. Some organizations may opt for a data management suite that enables extraction, transformation, loading, verifying, and reporting all within the same toolset.

Some of the benefits of using a good ETL tool include:

  • The data loaded into the new LIMS will be useful and available for analytics.
  • The time and resources needed for data migration LIMS go-live will be reduced.
  • Data quality in the new LIMS will be improved due to reduced manual data transcription errors.
  • Reduced need for developers to participate in data migration. A good subject matter expert is usually all that is necessary to operate a good ETL tool.

Regardless of the ETL tool utilized, a thorough understanding of data structures and definitions, as well as how the data will be used, is essential to facilitating a successful data migration. Astrix Technology Group has 20 years of experience facilitating data migration as part of LIMS implementation projects, and our professionals have extensive domain, laboratory and industry experience. If you would like to discuss your LIMS migration project, data migration challenges, or data migration plan, feel free to contact us for a free, no obligations consultation.

Jeff Policastro is Vice President of Business Development and Strategy at Astrix, and he is responsible for the commercial growth and market strategy of the Professional Services Division.  He has more than two decades of experience in Quality informatics throughout the global life sciences, chemical, and consumer goods industries. He has an extensive track record driving success and delivering value to his client’s projects.

A Selection of Current Customers