Posted on Lab Informatics. 6 February, 2020
A new consortium of pharmaceutical, technology and academic partners hopes to enable collaboration and data sharing among competing pharmaceutical companies in ways that address IP concerns. Using blockchain and federated learning technologies, the MELLODDY (Machine Learning Ledger Orchestration for Drug Discovery) Consortium aims to apply deep learning methods to the chemical libraries of 10 pharma companies. The goal is a modeling platform that can predict promising compounds for development more quickly and accurately, all without sacrificing the data privacy of the participating companies.
Over the last few years, pharma and biotech companies have been adopting artificial intelligence (AI) and big data approaches, particularly machine learning and deep learning, in the hope that these technologies will change the way the industry discovers, develops, and manufactures medicines. Advances in computational power have made it possible to build machine learning models that can dramatically accelerate innovation by yielding inexpensive predictions from data that is expensive to generate.
One of the challenges to implementing these technologies, however, is the size of the datasets required to train AI algorithms. To train effectively, these algorithms need to be fed enormous amounts of diverse data, such as data from patients of different genders, ages, demographics, and environmental exposures. Because pharmaceutical companies collectively generate an enormous amount of preclinical data and research, they could dramatically improve the success of AI models if they could find a way to pool their data for training. Unfortunately, this kind of data sharing among pharmaceutical companies has not been an option, due to concerns around loss of intellectual property (IP).
The MELLODDY consortium consists of 17 partners across Europe.
The 10 pharma companies in the public-private consortium have made available over a billion data points relevant to drug development from their chemical libraries, as well as hundreds of terabytes of image data annotating the biological effects of over 10 million different small molecules with known biochemical or cellular activity. The other 7 participants in the consortium include data science companies, AI companies, public research groups and university partners, all of which will assist with the processing and analysis of this enormous data trove.
The hardware powering the MELLODDY platform is being provided by NVIDIA, a manufacturer of graphics processing units (GPUs) that are popular in the gaming industry and widely used for deep learning. IT startup Loodse is deploying software components across the infrastructures of the different pharma companies. Owkin and Substra Foundation are providing the private blockchain technologies that will allow pharma companies to maintain control and visibility over their data. Machine learning algorithms with privacy and security features are being provided by the European universities Katholieke Universiteit Leuven and Budapesti Műszaki és Gazdaságtudományi Egyetem (Budapest University of Technology and Economics), as well as French startup Iktos.
The 3-year MELLODDY project, which launched in June 2019, has an estimated budget of over 20 million USD and receives partial funding from the Innovative Medicines Initiative (IMI). The intention of the project is to build an AI deep learning platform that can ingest large libraries of proprietary data (while protecting IP) to enable more accurate predictive models and increase efficiencies in drug discovery. Participating pharma companies are at liberty to use whatever predictions emerge during the 3-year period of the project.
Pharmaceutical companies have always guarded their research and development datasets carefully, as this data is the key to generating the proprietary insights that fuel a robust revenue-generating drug pipeline. Projects involving collaboration in this core competitive space have therefore been virtually non-existent in the pharmaceutical industry. To generate useful insights, however, AI algorithms need access to large and varied datasets. This fact has fueled interest in finding ways for pharmaceutical companies to share proprietary data for research purposes without compromising IP.
The MELLODDY project utilizes a federated learning model in which the training data from the individual pharma companies does not have to be pooled into a single aggregating server in order to train models. Instead, each pharmaceutical partner will house its data on its own cluster of NVIDIA V100 Tensor Core GPUs hosted on Amazon Web Services in a private blockchain infrastructure.
The MELLODDY developers intend to create a deep learning model on a centralized server that will travel among these different cloud clusters to train on each company’s annotated data, exposing the model to a significantly wider range of data than any one company has in house. Once the model has trained for a few iterations on an individual company’s data, it is sent back to the centralized server, which aggregates the contributions before the model moves on to the next company’s data cluster. In essence, the sensitive data remains protected within each company’s secure infrastructure; only the model, not the underlying data, is exchanged.
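The train-locally, aggregate-centrally loop can be illustrated with a toy version of federated averaging (FedAvg). This is a minimal sketch, not MELLODDY’s implementation. It makes several assumptions: a one-feature linear model stands in for the deep network, three hypothetical partners hold made-up (x, y) rows, and the server averages the returned weights in proportion to each partner’s dataset size, which is the standard FedAvg rule. The article does not say which aggregation rule MELLODDY uses, and it describes the model visiting partners in turn rather than in parallel. The point of the sketch is that `local_train` sees a partner’s rows, but only the resulting weights ever reach the server loop.

```python
from typing import List, Tuple

# Toy model: y ≈ w * x + b, stored as the tuple (w, b).
Weights = Tuple[float, float]
# One partner's private training rows; in this sketch they are only read inside local_train.
Dataset = List[Tuple[float, float]]


def local_train(weights: Weights, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few epochs of full-batch gradient descent on one partner's data.

    Only the updated weights are returned to the caller (the "server").
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_averaging(partners: List[Dataset], rounds: int = 200) -> Weights:
    """Server loop: send global weights out, collect updates, average them by dataset size."""
    global_weights: Weights = (0.0, 0.0)
    total = sum(len(data) for data in partners)
    for _ in range(rounds):
        # Each partner trains a copy of the current global model on its own data.
        updates = [(local_train(global_weights, data), len(data)) for data in partners]
        # The server combines only weights and row counts, never the rows themselves.
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
        global_weights = (w, b)
    return global_weights


if __name__ == "__main__":
    # Hypothetical partners whose private data all follow y = 2x + 1.
    partner_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)]
    partner_b = [(x, 2 * x + 1) for x in (1.5, 2.0)]
    partner_c = [(x, 2 * x + 1) for x in (0.25, 0.75, 1.25, 1.75)]
    w, b = federated_averaging([partner_a, partner_b, partner_c])
    print(f"w={w:.3f}, b={b:.3f}")
```

Run as a script, the printed weights should land close to w = 2 and b = 1, even though no single function ever holds all nine rows at once. Weighting by row count means a partner with more data pulls the global model proportionally harder. That is one common design choice among several.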
With this federated learning model, each participating pharma company will be able to create anonymized queries about specific drug compounds, which are sent to each organization’s data repositories to identify potential matches. In this way, individual pharma companies will be able to fine-tune the deep learning model and tailor it to their specific field of inquiry, while their individual research projects remain confidential.
Additionally, the private blockchain infrastructure of the MELLODDY project will provide full transparency as the deep learning algorithm trawls each individual company’s biological and chemical data over the full duration of the project. Like a bank statement, the blockchain ledger contains a log of all activities and can be requested after each federated run, effectively allowing each company to verify that their data has not been improperly accessed or shared with a competitor.
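The audit trail described above relies on the fact that each ledger entry commits to everything recorded before it. The following is a minimal sketch of that idea, assuming a single in-memory hash chain built with Python’s standard `hashlib` and `json` modules. The class, method, and event names are hypothetical. A production blockchain, such as the infrastructure Owkin and Substra Foundation are providing, also replicates the ledger across parties and uses a consensus protocol. This toy version omits both.

```python
import hashlib
import json
from typing import Any, Dict, List


def entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    """Hash an event together with the previous entry's hash (canonical JSON)."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLedger:
    """Hypothetical append-only log; each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def head(self) -> str:
        """Hash of the latest entry; a partner can keep a copy to detect later rewrites."""
        return self.entries[-1]["hash"] if self.entries else self.GENESIS

    def append(self, event: Dict[str, Any]) -> str:
        prev = self.head()
        digest = entry_hash(prev, event)
        # Store a copy so later changes to the caller's dict do not alter the log.
        self.entries.append({"event": dict(event), "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; an edited entry, or one inserted or removed mid-chain, fails."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True

    def statement_for(self, partner: str) -> List[Dict[str, Any]]:
        """The "bank statement" view: every logged event that names one partner."""
        return [e["event"] for e in self.entries if e["event"].get("partner") == partner]


if __name__ == "__main__":
    ledger = AuditLedger()
    ledger.append({"run": 1, "partner": "pharma_a", "action": "local_train"})
    ledger.append({"run": 1, "partner": "server", "action": "aggregate_weights"})
    print(ledger.verify())  # True
    ledger.entries[0]["event"]["action"] = "export_raw_data"  # simulated tampering
    print(ledger.verify())  # False
```

`verify()` on its own cannot catch someone who rewrites the whole chain and recomputes every hash, or who quietly drops the newest entries. That is why each partner would keep its own copy of `head()` after every federated run. A real blockchain addresses the same problem by distributing the ledger among all participants.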
The MELLODDY project’s goal is to harness the collective knowledge of the consortium through a federated learning AI platform to identify the most promising compounds for drug development. The project uses a private blockchain infrastructure to protect participating companies’ IP while providing complete transparency over all activities. If successful, the MELLODDY project could inspire participants to open up even more data (e.g., preclinical data) for analysis with the platform, and it could eventually lead to the platform becoming commercially available.
Federated learning approaches create a win-win scenario. Participants collaborate securely on local, competitively sensitive data to create global knowledge that benefits everyone. Federated learning has the potential to revolutionize how AI models are trained, with benefits extending into the wider healthcare ecosystem. King’s College London, for example, is bringing AI to clinical environments using federated learning through its London Medical Imaging and Artificial Intelligence Centre for Value-Based Healthcare project. Ultimately, these technologies will benefit patients by providing faster access to effective, innovative medications that help combat disease.
Astrix Technology Group has over 20 years of experience facilitating success in laboratory informatics implementation projects in pharmaceutical and biotech companies. Our professionals have the experience and knowledge to help you implement innovative informatics solutions that allow organizations to turn data into knowledge, increase organizational efficiency, improve quality and facilitate regulatory compliance. If you would like to discuss your project or explore how to optimize your overall laboratory informatics strategy with an Astrix expert, please contact us for a free, no-obligation consultation.
Copyright (C) 2020