The Astrix Blog

Big Data’s Impact on a Digital Transformation of Life Sciences

In life sciences, much of the data we work with qualifies as “Big Data”: data that is high in volume, velocity, or variety. This Big Data is fundamentally transforming the industry.

Key examples of the impact of big data in life sciences include:

  • Genomics data – By 2025, genomics data (NGS and HiFi) is projected to exceed the data totals of YouTube, Twitter, and astronomy combined.[1]
  • Growing variety of clinical data sources, including streaming and mobile apps – Our research indicates a significant increase in R&D-relevant data from social media, mobile apps, biosensors, multi-omics, imaging, single-cell analysis, and real-world evidence.
  • Increased use of high-content analysis techniques in research – e.g., imaging and single-cell analysis.

The explosion of health-relevant data creates new opportunities for R&D transformation; however, tools are needed to store, manage, and analyze this data. The following are key considerations:

  • Cloud and hybrid cloud architecture is needed – to facilitate performant access to and analysis of data, with fit-for-purpose storage modalities and formats that best serve the analytics engines.
  • Clear data use, storage, and archiving strategies are required – life sciences organizations need a strategy and roadmap that capture how data is used throughout the organization, where it is stored, and when and how it should be archived.
  • Strategies for data management are needed and include:
    • Support for a variety of data ingestion patterns.
    • Continuous prospective data curation and data harmonization (to standardize vocabularies/ontologies and structure).
    • Application of AI/ML capabilities to data processing to increase speed and minimize manual intervention.
    • Parallel and distributed data processing engines (e.g., Spark) to process data at scale.
    • Edge devices and systems that move more of the data processing closer to where the data is generated.
  • Unique storage requirements for big data
    • Big data will require exabytes of space, along with a data usage, partnering, and storage strategy that combines storage tiers at multiple price points with effective compression technologies.
    • A mix of on-premises and cloud-based storage, with advanced and highly automated data processing (edge and distributed processing) and management capabilities, is needed to ensure frictionless access to and use of data while maintaining compliance.
    • A data platform is required with fit-for-purpose storage modalities and formats, backed by a clear data use, storage, and partnering strategy.
  • Data strategies for raw vs. cataloged vs. curated data vs. data products (insights)
    • As a best practice, core data and highly curated data require distinct management strategies.
    • Human and machine-searchable indexes of data resources are necessary.
    • A proactive service is needed to automate and manage updates, access, context, and indexes of data, using AI/ML capabilities to scan for newly available data and consistently enrich key data as it becomes available.
  • Data cleaning and context
    • R&D data exists in a wide variety of formats and vocabularies, requiring extensive cleaning for utility and interoperability. There is a need to adopt well-established ontologies and industry standards (e.g., CDISC standards such as SEND) where available and to leverage ML technologies to characterize and classify data.
    • Automate data cleaning, establish data context, and deliver data through services and secure collaboration environments.
    • R&D must establish and govern data and data standards.
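
To make the harmonization and cleaning considerations above more concrete, here is a minimal Python sketch of mapping source-specific labels onto one standard vocabulary and routing anything unmapped to human curation. The synonym table, the `harmonize` and `harmonize_column` helpers, and the single-letter codes are all hypothetical and chosen only for illustration; a real pipeline would likely draw its terms from an established ontology or industry standard rather than a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Optional, Tuple

# Hypothetical synonym table: source-specific labels -> one preferred term.
# In practice these mappings would come from a governed vocabulary or ontology.
SYNONYMS = {
    "male": "M",
    "m": "M",
    "female": "F",
    "f": "F",
}


@dataclass
class HarmonizedValue:
    raw: str                 # value exactly as received from the source
    standard: Optional[str]  # preferred term, or None when no mapping exists
    needs_review: bool       # True when a curator should look at the value


def harmonize(raw: str, synonyms: Mapping[str, str] = SYNONYMS) -> HarmonizedValue:
    """Normalize whitespace and case, then look up the preferred term."""
    key = raw.strip().lower()
    standard = synonyms.get(key)
    return HarmonizedValue(raw=raw, standard=standard, needs_review=standard is None)


def harmonize_column(values: Iterable[str]) -> Tuple[List[HarmonizedValue], List[str]]:
    """Harmonize a column and collect unmapped raw values for curation."""
    results = [harmonize(v) for v in values]
    review_queue = [r.raw for r in results if r.needs_review]
    return results, review_queue


results, review_queue = harmonize_column([" Male", "F", "unknown"])
print([r.standard for r in results])
print(review_queue)
```

Keeping the raw value alongside the standardized one preserves provenance, and the review queue is one simple way to combine automation with continuous, prospective curation: mapped values flow through untouched, while new or unexpected terms are surfaced for a person (or an ML classifier) to resolve and add to the vocabulary.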


Big data will continue to be both a challenge and a major opportunity for the life sciences industry. Organizations that optimize their use of big data will be better positioned to ensure smooth operations and patient safety. With the right strategy, processes, and technologies, an organization can prepare for a future in which big data continues to grow exponentially.

Why It Matters to You

In the life sciences industry, big data will continue to play a significant role in optimizing operations. It is therefore imperative to ensure that the organization has the right methods and procedures in place to optimize its use.

In this blog post, we discussed:

  • What big data is and the sources of big data in the life sciences industry.
  • Key considerations relative to big data from both a strategic and tactical perspective.
  • How to optimize the organization’s big data by leveraging these strategic and tactical considerations.