December 26, 2020
The Human Genome Project, completed on April 14, 2003, took approximately 13 years and 3 billion dollars to map the human genome. Today, next-generation sequencing (NGS) technology allows a human genome to be sequenced in just a few days for a few hundred dollars.
With ultra-high throughput, speed and scalability, NGS allows researchers to study biological systems in ways never before possible. Combined with clinical data, readily accessible genomic data is driving a revolution in drug discovery by accelerating the development of personalized, targeted treatments in a medical model known as Precision Medicine.
The new field of genomics research, combined with the potential of Precision Medicine to drive improved patient care, has spawned significant growth in the biotechnology industry, with a recent report by Grand View Research projecting that the global biotechnology market will reach 727.1 billion USD by 2025. The promise of Precision Medicine that was envisioned decades ago with the mapping of the human genome is fast becoming a reality.
While genomic research is paving the way for significant advances in healthcare, its practice has led to new challenges for biotechnology firms in terms of increasing data volume, variety and complexity. Given that the data representing a single human genome takes up to 100 gigabytes of storage space, biotech companies engaged in genomic research must develop the computing infrastructure and skills to manage, store, analyze and interpret massive quantities of highly complex data. In this blog, we will explore the data management challenges in genomics research and best practices for addressing them.
Challenges and Solutions for Genomic Workflows
While access to genomic data has improved dramatically over the last decade, processing and analytics for these massive data sets have become the new bottleneck. For pharmaceutical and biotech companies engaged in genomics research, a modern laboratory information management system (LIMS) that is flexible enough to deal with the volume, variety and complexity of genomic data sets is now a necessity. A modern LIMS that integrates with instrumentation and enterprise systems is a critical tool for automating the lab to handle the increased throughput.
Biotech firms can be either new companies looking to replace paper-based systems with a new informatics solution, or academic/pharma spin-offs with legacy systems that need to be transitioned to a more modern solution. Either way, a number of challenges will need to be addressed during the implementation of the LIMS solution in order to achieve an efficient, scalable and cost-effective laboratory environment.
So what are the key data management challenges in genomics research?
#1 – Complex Workflows and Sample Tracking. NGS biologics add complexity to sample tracking and workflows as compared with previous small-molecule-based tools and workflows. As opposed to distinct and easy-to-track small molecule samples, compounds and batches, biologics involve complex aspects such as sequences, ligated variants and linkages. In addition, processing of biological samples is highly complex and involves the use of cell lines and other biologics building blocks, resulting in the need to track sample lineage and progeny over time.
Additionally, NGS workflows can involve different sequencing technologies, assays, sample preparation kits/protocols, automation robots and analytical instruments. These workflows can span several different departments and user types, and many of these aspects are subject to frequent change (e.g., reagent kit upgraded, new liquid handler comes on-line, workflow adjusted with new data-dependent trigger, etc.).
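To make the lineage and progeny requirement concrete, here is a minimal sketch of how parent/child links between samples could be recorded and queried. The class name, method names and sample IDs are all hypothetical and are not taken from any particular LIMS; a real system would persist these links in a database rather than in memory.

```python
from collections import defaultdict


class SampleLineage:
    """In-memory parent/child links between samples (hypothetical model)."""

    def __init__(self):
        self._parents = defaultdict(set)   # child id -> direct parent ids
        self._children = defaultdict(set)  # parent id -> direct child ids

    def derive(self, child, *parents):
        """Record that `child` was produced from one or more parent samples."""
        for parent in parents:
            self._parents[child].add(parent)
            self._children[parent].add(child)

    @staticmethod
    def _walk(start, edges):
        """Collect every sample reachable from `start` by following `edges`."""
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in edges.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, sample):
        """All upstream samples (lineage) of `sample`."""
        return self._walk(sample, self._parents)

    def progeny(self, sample):
        """All downstream samples derived from `sample`."""
        return self._walk(sample, self._children)


# Illustrative IDs only: a cell line, a DNA extract, two libraries and a pool.
lineage = SampleLineage()
lineage.derive("DNA-001", "CL-001")
lineage.derive("LIB-001", "DNA-001")
lineage.derive("LIB-002", "DNA-001")
lineage.derive("POOL-001", "LIB-001", "LIB-002")

pool_ancestors = lineage.ancestors("POOL-001")
extract_progeny = lineage.progeny("DNA-001")
```

Because a pooled sample can have several parents, the links form a graph rather than a simple tree, which is why the sketch stores a set of parents per sample instead of a single parent field.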
#2 – R&D Externalization and Data Security. Genomics research requires diverse teams (e.g., bioinformaticians, clinicians, computational biologists, etc.) to work together to process and analyze genomic and clinical data to extract the insights that drive innovative therapies. Legacy systems often don’t communicate with each other and lack the integration to facilitate good collaboration, effectively locking researchers into silos and hampering drug discovery.
It is increasingly common for biotech firms to externalize aspects of R&D to CROs, academia and/or smaller biotech firms to provide specific skills, resources and technology. In this case, the parent biotech will need communication and data exchange capabilities to collaborate effectively with its new partners, where the partners can access only the data they need to see, while the parent company has secure access to all of the data in the appropriate format. An informatics solution needs to be in place that facilitates management of R&D processes and data security across different locations, time zones and user groups with different permissions, and that is flexible enough to handle a changing web of external partners. A cloud-based LIMS can be particularly effective in this regard, while also providing the scalability to handle the massive genomic datasets.
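The idea that partners see only the data they need, while the parent company sees everything, can be illustrated with a small deny-by-default filter. This is only a sketch under stated assumptions: the organization names, project codes, the `GRANTS` table and the `visible_records` function are invented for illustration, and a production system would enforce such rules server-side alongside authentication and encryption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    project: str


# Hypothetical grants: the projects each organization may read.
GRANTS = {
    "parent-biotech": {"*"},      # the parent company may read every project
    "cro-alpha": {"PRJ-A"},
    "university-lab": {"PRJ-B"},
}


def visible_records(org, records, grants=GRANTS):
    """Return only the records that `org` has been granted access to."""
    allowed = grants.get(org, set())  # unknown organizations get nothing
    if "*" in allowed:
        return list(records)
    return [r for r in records if r.project in allowed]


records = [
    Record("R-1", "PRJ-A"),
    Record("R-2", "PRJ-B"),
    Record("R-3", "PRJ-A"),
]

cro_view = visible_records("cro-alpha", records)
parent_view = visible_records("parent-biotech", records)
stranger_view = visible_records("unknown-org", records)
```

Keeping the grants in one table means that adding or removing an external partner is a data change rather than a code change, which suits a partner network that shifts over time.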
#3 – High Volume of Data. With scientists generating gigabytes of data with every experiment, it is critical to choose a solution that can handle the throughput and volume of data. Challenges remain for cloud solutions, as integration with on-premises instrumentation and other systems presents risks. You will want to do a full analysis to choose a vendor with modern, RESTful APIs and a robust security model.
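A rough capacity estimate helps when sizing a solution. The sketch below uses the upper-bound figure of 100 gigabytes per human genome quoted earlier in this article; the function name, the `copies` parameter and the example of 500 genomes are assumptions made for illustration, and actual file sizes depend on coverage, file format and compression.

```python
GB_PER_GENOME = 100  # upper-bound figure quoted earlier in this article


def estimate_storage_tb(genomes, gb_per_genome=GB_PER_GENOME, copies=1):
    """Return an upper-bound storage estimate in terabytes (1 TB = 1000 GB)."""
    if genomes < 0 or copies < 1:
        raise ValueError("genomes must be >= 0 and copies must be >= 1")
    return genomes * gb_per_genome * copies / 1000


single_copy = estimate_storage_tb(500)            # 500 genomes, one copy
with_backup = estimate_storage_tb(500, copies=2)  # primary plus one backup
```

Under these assumptions, 500 genomes come to 50 terabytes for a single copy and 100 terabytes once one backup copy is included.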
#4 – Regulatory Compliance. Finally, given that genomic research typically involves human samples, biotech companies must adhere to a variety of regulations – CLIA, GDPR, HIPAA – administered by US and international agencies in addition to GxP and 21 CFR Part 11. As such, the LIMS solution chosen will need to have a robust set of controls that support compliance with applicable regulations and standards on data integrity and privacy. Additionally, the project team needs to have the skills, knowledge and experience necessary to properly develop and apply these controls during the implementation so that they support regulatory compliance for your business.
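One example of a data integrity control is a tamper-evident audit trail. The sketch below chains each log entry to the previous one with a SHA-256 hash so that later edits to earlier entries can be detected. It is an illustration only, not a compliance implementation: the field names and the `append_entry` and `verify` functions are hypothetical, and a validated system would add secure storage, trusted timestamps and electronic signatures.

```python
import hashlib
import json


def _digest(entry, prev_hash):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, user, action, record_id, timestamp):
    """Append an audit entry whose hash depends on the previous entry."""
    prev_hash = log[-1]["hash"] if log else ""
    entry = {
        "user": user,
        "action": action,
        "record_id": record_id,
        "timestamp": timestamp,
    }
    log.append({**entry, "hash": _digest(entry, prev_hash)})


def verify(log):
    """Return True if no entry has been altered, removed or reordered."""
    prev_hash = ""
    for item in log:
        entry = {k: v for k, v in item.items() if k != "hash"}
        if item["hash"] != _digest(entry, prev_hash):
            return False
        prev_hash = item["hash"]
    return True


# Illustrative entries only.
audit_log = []
append_entry(audit_log, "asmith", "create", "S-100", "2020-12-26T09:00:00Z")
append_entry(audit_log, "bjones", "update", "S-100", "2020-12-26T09:05:00Z")
intact = verify(audit_log)

audit_log[0]["user"] = "someone-else"  # simulate tampering with history
tampered_ok = verify(audit_log)
```

In this sketch `intact` is True and `tampered_ok` is False, because changing the first entry invalidates its stored hash and every hash that follows it.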
Best Practices for Implementing a LIMS in a Biotech Startup
Biotech startups typically need to implement a modern, robust LIMS that has the flexibility and scalability to support the business both now and as it evolves, offers an affordable total cost of ownership (TCO), and integrates effectively with both enterprise systems and instrumentation. Selecting, implementing and integrating a LIMS solution for genomics workflows can be a challenging endeavor, however.
One of the biggest mistakes companies make when starting a complex informatics project is omitting the strategic planning necessary to create a solid project foundation. Success, in this case, is defined by the implementation fulfilling the following metrics:
- Delivered on time
- Delivered within budget
- Meets or exceeds requirements, including the necessary security and compliance requirements
- Provides a high level of business value
- Achieves a high level of user adoption
In order to meet these metrics, a comprehensive and proven methodology should be followed that leverages a workflow and business analysis to maximize business value for your organization. In this initial phase of the project, business analysts with domain, industry and system knowledge work to document the current state of laboratory operations, as well as an optimized set of future-state requirements that will encompass the flexibility necessary for genomic workflows.
When designing system requirements, it is beneficial to consider the key data management issues in genomics research. An easy way to take these into account is to remember to address the 6V’s in your technology selection process:
- Volume (large amounts of data)
- Variety (cellular, molecular, genomic and preclinical data)
- Velocity (data is collected constantly and at high speed)
- Veracity (accuracy of the data)
- Vulnerability (security and compliance)
- Vocabulary (data standardization is a challenge)
Once a set of optimized future-state requirements has been generated, it is important to design a laboratory informatics architecture that is aligned with business goals, along with a road map to deployment, before engaging in a proper technology selection process for your organization.
With the proper foundation laid for your project in this way, a skilled project team can engage with system design, implementation, integration, validation and user training with a high degree of success. But even when a new system is successfully deployed to researchers, the system will need to be maintained and evolve over time, and any legacy systems still in use will need to be maintained as well.
The explosion of data generated by the rapid pace of discovery in genomics research has great potential to aid in the development of innovative new therapies that will significantly improve patient care. To deliver on that potential, however, biotech companies generally need to invest in dynamic informatics systems that can help to process and analyze the high volume, variety and complexity of genomic datasets. Legacy LIMS are generally not flexible or extensible enough to handle NGS workflows, and attempts to adapt these systems to support genomic research are unlikely to succeed. The reality is that modern, cloud-based LIMS are often ideal for genomics research. As such, your project team should have the expertise and experience necessary to help you determine the ideal hosting architecture for your LIMS solution.
Astrix has over 20 years of experience facilitating success in LIMS implementation projects in pharmaceutical and biotech companies. Our professionals have the experience and knowledge to help you implement innovative informatics solutions that allow organizations to turn data into knowledge, increase organizational efficiency, improve quality and facilitate regulatory compliance.