The Astrix Blog

Expert news and insights for scientific & technology professionals.

The Life Science Industry Blog for R&D Professionals

Using Programmable ETL Tools for a Data Migration of Your Pharmacovigilance System

ETL tools integrate data from one or more applications/systems into a single target system using three phases – Extract, Transform, Load. Since the whole process can be time consuming, it is common for these tools to execute all three phases in parallel – each phase executes immediately on the data that it receives from the previous phase without waiting for these phases to complete.

Extraction Phase

For an ETL extraction to be successful, a good understanding of the source data layout/format must be in place to configure the tool properly. The extraction phase consolidates and extracts the data from different homogeneous and/or heterogeneous data sources. The extraction must convert the data into a suitable single format for processing in the transformation phase. The extraction process also selects the proper data from the source to extract, as the source data typically contains redundant data and data that is not of interest. All raw data acquired in the extraction phase should be backed up and archived.

A good ETL tool has a staging area that enables storage of an intermediate version of the data being extracted. This enables validation activities to occur before the transformation stage to confirm that the extracted data has the expected values. Any data failing validation is rejected entirely or in part and held back for analysis to discover where the problem occurred. A staging area also avoids the necessity of re-extraction in case of issues in subsequent phases.

Transformation Phase

Data cleansing/transformation is usually performed during a data migration to both improve data quality and match the requirements of the new system. The transformation phase involves an application of a series of rules or functions to the data from the extraction phase to prepare it for loading into the end target. Validation activities should also be performed to validate the transformations in this phase.

While some data (AKA pass through data) may not require any transformation and can be loaded directly into the target Pharmacovigilance system, most data will need some form of transformation before loading. For example, the new Pharmacovigilance system may store product data differently, so that the storage of this information will need to be changed in the new Pharmacovigilance  system compared to the old. Some common transformation processes include clearing the duplicates, standardizing, filtering, format conversion and sorting. A good ETL tool needs to enable building complex processes and extending a tool library so custom functions can be added.

Transforming dynamic data usually involves complex rules that require multiple transformations on several pieces of source data, and If/Then rules may need to be applied. These complex transformations can contribute substantially to the validation work required to prove the integrity and completeness of the data has been preserved through the data migration process, and this must be accounted for in the overall data migration plan.

Loading Phase

This final stage of the ETL process loads the extracted and transformed data into the target system. The loading phase is usually the bottleneck of the whole data migration process. If you need to move high volumes of data through your ETL tool, it should be able to process at least multiple TB (terabytes) per hour (or ~1 GB per second) using powerful servers with multiple CPUs, multiple hard drives, multiple gigabit-network connections, and lots of memory.


ETL processes oftentimes involve considerable complexity, and improperly designed ETL systems can result in significant operational problems for a Pharmacovigilance system migration project. Purchasing an ETL tool from a reputable vendor is therefore recommended. Some organizations may opt for a data management suite that enables extraction, transformation, loading, verifying, and reporting all within the same toolset.

Some of the benefits of using a good ETL tool include:

  • The data loaded into the new PV system will be useful and available for analytics.
  • The time and resources needed for data migration go-live will be reduced.
  • Data quality in the new PV system will be improved due to reduced manual data transcription errors.
  • Reduced need for developers to participate in data migration. A good subject matter expert is usually all that is necessary to operate a good ETL tool.


Regardless of the ETL tool utilized, a thorough understanding of data structures and definitions, as well as how the data will be used, is essential to facilitating a successful data migration. Astrix Technology Group has many years of experience facilitating data migration as part of implementation projects, and our professionals have extensive domain and industry experience. If you would like to discuss your Pharmacovigilance  system migration project, data migration challenges, or data migration plan, feel free to contact us for a free, no obligations consultation.

About Astrix

Astrix is the unrivaled market-leader in creating & delivering innovative strategies, solutions, and people to the life science community.  Through world class people, process, and technology, Astrix works with clients to fundamentally improve business & scientific outcomes and the quality of life everywhere. Founded by scientists to solve the unique challenges of the life science community, Astrix offers a growing array of strategic, technical, and staffing services designed to deliver value to clients across their org