How the UK Biobank is Revolutionizing Precision Medicine

Posted on Lab Informatics. 17 October, 2019

In the last decade, there has been a paradigm shift in the development of gene sequencing diagnostics. This revolution has occurred mainly due to the development of ultra-high throughput next-generation sequencing (NGS) technology that allows a human genome to be sequenced in just a few days. Combined with clinical data, readily accessible genomic data is also accelerating the development of personalized, targeted treatments in a medical model known as Precision Medicine. The UK Biobank Project is at the forefront of this initiative.

In order to develop the genetic diagnostics and personalized treatments characteristic of Precision Medicine, scientists must sift through and effectively analyze immense data sets. This necessity has led to the rapid growth of a new engineering field known as bioinformatics – an interdisciplinary field of scientific research that utilizes biology, computer science, data engineering, mathematics and statistics to increase our understanding of biological processes. Bioinformatics has played a major supportive role in the emergence and continued development of precision medicine.

Bioinformatics scientists need access to the health and genetics data of large cohorts so that the influence of genetics on disease states can be sussed out and incorporated into tailored treatments. In order to facilitate access to the large amounts of biological and health data that bioinformatics scientists need, many nations have launched biobank projects, including the United States, United Kingdom, China, Austria, Qatar, Estonia, Japan, Canada and Finland.

The United Kingdom’s Biobank (UKB) project recently released a vast collection of genetic data to health researchers around the world, offering an unparalleled resource to enhance our understanding of human biology and aid in the advancement of precision medicine. In this blog post, we will discuss the types of data made available by the UK Biobank project, along with details on how scientists can access this important data.

UK Biobank Project

Initially proposed by an expert panel in 2000, the UKB project began in earnest in 2006 with the intention to collect genetic, physical and clinical data from a large cohort of individuals in the United Kingdom. Over a four-year period from 2006 to 2010, the project enrolled approximately 500,000 people between the ages of 40 and 69 via the UK’s National Health Service (NHS), as following this age group enables scientists to focus on diseases of middle and/or old age. Those who enrolled had their blood and urine sampled and were examined for more than 2400 different traits or phenotypes, including aspects of their social lives, habits, cognitive state, lifestyle, and physical health. The project follows participants by accessing their health records and national registries, including those for deaths and cancer diagnoses.

What really sets the UKB apart from other biobank projects is the crowdsourcing spirit of the project. Other biobanks have comparably rich health data, and some are even bigger (e.g., the US DNA testing company 23andMe), but in these cases you must collaborate with the database creators to access the data. With UKB, the intention from the beginning has been to democratize the accumulated data to help enhance understanding of human biology and accelerate therapeutic discovery. Rory Collins, an epidemiologist at the University of Oxford who has been with the project from the beginning, summed up UKB’s open access spirit in an interview with Science Magazine earlier this year when he said, “By making data available to 100 people around the world, we can get a lot more research done than if I sit here and do one study a year with the data.”

To date, the UKB has an impressive record of following through on its data sharing intentions by releasing data to approved researchers.

  • 2015: Genetic data, which included genotyping and imputed data, on 150,000 anonymized participants was made publicly available.
  • 2017: Genetic data for the full cohort was released.
  • 2019: Exome sequence data for 50,000 participants was released, linked to detailed health records, imaging and other health-related data for those participants.
  • 2020: Complete exome sequencing of the remaining 450,000 UK Biobank participants is scheduled for release by the end of 2020. Additionally, about half of the participants’ primary care data, including clinical data and prescriptions, should become available in 2020.

In addition to exome sequencing, whole genome sequencing of 50,000 participants, funded by UK Research and Innovation as part of the Industrial Strategy Challenge Fund, is currently under way. The UKB has also performed MRI scans of the brains, hearts, and abdomens of 25,000 participants, and plans to scan 100,000.

How Researchers are Using the Data

Within days of the release of the UKB genetic data on the full cohort in 2017, researchers at the Broad Institute had linked more than 120,000 genetic markers to more than 2000 diseases and traits, effectively doubling the 60,000 genetic markers that had been tied to diseases up until that point. As of early 2019, about 7000 researchers were registered to use UKB data on 1400 projects, and nearly 600 papers had been published. While some studies simply link behaviors with disease (e.g., drinking more coffee can reduce mortality), others perform genome-wide association studies to zero in on genes that influence a particular attribute or disease. Dozens of papers have now been published using UKB data that add to our knowledge of which genes contribute to heart disease, diabetes, Alzheimer’s, and other conditions. Some of these studies have also highlighted the role of genes in shaping traits like personality, depression, birth weight, insomnia, and others. Some studies have even delved into controversial areas linking DNA markers to human behavior.

While these genetic links are suggestive correlations, establishing cause and effect will take more work and research. By combining UKB data with other databases (e.g., pharmaceutical company internal data or other biobank data), researchers can work to flesh out some of the more subtle effects of gene variants (single nucleotide polymorphisms, frequently called SNPs). The goal is to increase our understanding of human biology, reveal new disease pathways, discover new drug targets, and ultimately develop new, effective drugs.

Some of the many ways in which scientists can use the anonymized UKB data include:

  • Investigate whether particular changes in inherited DNA are associated with particular diseases.
  • Undertake more sophisticated analyses of genes in order to identify the causes of disease and ways to intervene that will improve health.
  • Investigate how particular SNPs may be involved in different diseases.
  • Research how genetic and lifestyle factors influence health.
  • Develop polygenic risk scores, which calculate a person’s disease risk by combining different genetic markers.
  • Investigate how genetic risk factors interact with diet, lifestyles, environment, and other aspects of health.
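To make the risk score idea from the list above concrete, here is a minimal sketch in Python. It assumes the common additive form, in which each marker contributes its effect weight multiplied by the number of risk alleles a person carries (0, 1 or 2). The article does not specify a formula, so this form is an assumption, and the marker names and weights below are invented purely for illustration. They are not UKB data or published effect sizes.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: marker ID -> number of risk alleles carried (0, 1 or 2)
    weights: marker ID -> per-allele effect weight (hypothetical here)
    Markers missing from either mapping are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[marker] * dosages[marker] for marker in shared)


# Hypothetical weights and genotype, made up for this example only.
example_weights = {"marker_a": 0.5, "marker_b": 0.25, "marker_c": -0.125}
example_person = {"marker_a": 2, "marker_b": 1, "marker_c": 0}

print(polygenic_risk_score(example_person, example_weights))
```

In practice, the weights would come from association studies on a large cohort, and a raw score would normally be compared against the distribution of scores in a reference population before being interpreted as risk.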

A list of genetics-based papers that have been generated through the use of the UKB database can be found here.

How to Access the Data

Since the UK Biobank Project first allowed open access to its database in March 2012, there have been over 8,000 approved registrations, and over 800 formally registered projects are now under way. A step-by-step guide to help researchers apply for access to UK Biobank data can be found here. FAQs and other helpful information about working with the UKB database can be found here. Note: users pay a relatively modest fee of $2,500 and agree to return their raw data, results, and code to the UKB after publishing. Users also sign a legal agreement confirming that they will not attempt to identify any participants.


The UK Biobank has generously made anonymized data sets of genetic, physical and clinical data from a large cohort of individuals in the UK, as well as all results from studies conducted by researchers using this data, available to researchers around the world. This massive data set is revolutionizing the field of precision medicine. Every few days now, a new paper is published using UK Biobank Project data to link particular gene variants to a disease or trait.

Yet working with large and complex data sets in R&D laboratories requires a well-thought-out and integrated laboratory informatics architecture and ecosystem. Many leading-edge companies are now applying artificial intelligence and virtual modeling in an effort to analyze big data sets more efficiently and accelerate the discovery process. The bottom line: Pharmaceutical and biotechnology companies engaged in genomic research must develop the computing infrastructure and skills to manage, store, analyze and interpret massive quantities of highly complex data.

Astrix Technology Group has over 20 years of experience facilitating success in implementing laboratory informatics projects in pharmaceutical and biotech companies. Our experienced professionals can help you implement innovative informatics solutions that will turn data into knowledge, increase organizational efficiency, improve quality and facilitate regulatory compliance. If you would like to discuss how to optimize your overall laboratory informatics strategy with an Astrix expert, please contact us for a free, no-obligation consultation.

About Dale Curtis

Dale Curtis Jr. is the President of Astrix Technology Group. Dale is a leader in providing innovative laboratory informatics solutions to the scientific community. With over 21 years of proven success, his business approach delivers deep scientific insight with an understanding of how technology and people will impact scientific industries. Dale’s strategy focuses on issues related to value-engineered solutions, on-demand resource and domain requirements, and flexible and scalable operating and business models that help Astrix’s clients find future value and growth in a scientific world.
