January 1, 2023
The odds that a drug candidate entering preclinical trials will eventually pass FDA approval and reach the market are exceptionally grim – only about 1 in 5,000 completes the journey. Bringing a new drug to market also takes roughly 12 years on average, and the US Department of Health and Human Services has estimated the cost at somewhere between $161 million and $2 billion.
The high costs, long timeframes and poor success rates inherent in traditional drug development and discovery processes have scientists seeking out new paths to innovation. With technological advances such as next-generation sequencing (NGS), the internet-of-things (IoT), increased computational power, and machine learning, leading pharmaceutical companies are now turning to advanced analytics and machine learning to spur the discovery of novel compounds and improve the efficiency of drug discovery and development.
According to Market Research Future, the global commercial pharma analytics market is expected to reach over $9 billion by 2027. In this blog, we will discuss some of the benefits of utilizing BIOVIA Pipeline Pilot’s advanced analytics and modeling capabilities to enhance drug discovery.
Advanced Analytics and Modeling in Pharma
Drug discovery is a challenging endeavor. The first step – finding compounds with the desired medicinal effect on the target pathogens – has traditionally involved the automated high-throughput screening of large compound libraries to identify “hits” with biological activity. Once identified, promising compounds are put through extensive testing to evaluate criteria such as their dose-response curve, cellular efficacy, affinity towards the target, reactivity with other compounds, cytotoxicity, etc.
Real-world data and experimentation are, and always will be, essential for answering scientific questions in the drug discovery process. That said, relying solely on physical experiments to discover and validate drug candidates is very expensive and time-consuming – a single assay can cost hundreds of dollars, not counting synthesis costs. Physical testing also yields very small data sets relative to the money and effort the experiments require.
Advances in computational power have opened up the possibility of creating digital models that utilize advanced analytics and machine learning to dramatically accelerate innovation and yield inexpensive predictions from expensive data. These in silico methods use mathematical, probabilistic, and statistical modeling techniques to enable predictive processing and automated decision-making. Modeling and simulation allow companies to save cost and time by innovating and learning in the virtual world, moving into physical experiments only when a model is ready to be verified or tested.
Virtual modeling with advanced analytics allows scientists to create a wide variety of virtual compounds, test them with virtual assays, and prune the set so that only the best-performing compounds move into physical experiments. Scientists thus spend significant resources only on the candidates with the best chances of success. The new data that these physical tests generate can then be fed back into the virtual models to refine their accuracy and enhance their predictive ability.
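The screen-prune-feedback loop described above can be sketched in a few lines of generic Python. This is an illustrative toy (not Pipeline Pilot code): a hypothetical linear model scores a virtual library, only the top-ranked compounds "advance" to simulated assays, and the assay results are fed back to refine the model.

```python
# Illustrative sketch of a virtual screening loop (not Pipeline Pilot code).
# A toy linear model scores virtual compounds; only top-scoring candidates
# advance to (simulated) physical assays, whose results refine the model.
import random

random.seed(0)

def predict(weights, features):
    """Predicted activity: dot product of model weights and descriptors."""
    return sum(w * x for w, x in zip(weights, features))

def virtual_screen(weights, library, top_k):
    """Rank the virtual library by predicted activity; keep the top_k hits."""
    ranked = sorted(library, key=lambda f: predict(weights, f), reverse=True)
    return ranked[:top_k]

def refine(weights, assayed, lr=0.01):
    """Feed assay results back: one gradient step per measured compound."""
    for features, measured in assayed:
        error = predict(weights, features) - measured
        weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Virtual library: 100 compounds, each with 3 molecular descriptors.
library = [[random.random() for _ in range(3)] for _ in range(100)]
weights = [0.1, 0.1, 0.1]                        # crude initial model

hits = virtual_screen(weights, library, top_k=5)  # cheap in silico triage
# Pretend the physical assay measures true activity = 2*x0 + x1 - x2.
assayed = [(f, 2 * f[0] + f[1] - f[2]) for f in hits]
weights = refine(weights, assayed)                # feedback improves the model
```

The point of the sketch is the economics: only 5 of the 100 virtual compounds consume assay budget, and each assay result still improves the model for the next screening round.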
A few of the many ways in which scientists are applying these virtual modeling techniques to improve overall R&D efficiency include:
- Identifying and removing poor-quality candidates earlier in the development pipeline
- Creating high-quality candidates more quickly
- Ensuring that only high-quality candidates enter late-stage R&D
- Increasing the likelihood of a given candidate’s success
- Reducing time to market and development cost
- Bringing later-stage concerns like manufacturability upstream into early R&D
The Data Scientist Deficit
Designing and implementing effective models that produce actionable insights for drug discovery requires data scientists with both extensive life sciences knowledge and deep technical expertise. They need to know what questions to ask and how to ask them, and they must design, build, train, and validate each model to ensure high-quality results. Unfortunately, the pharmaceutical industry currently faces a significant gap between the supply of and demand for data scientists with the requisite skills, and demand is projected to increase significantly in the coming years.
Organizations need skilled and experienced data scientists to develop global models that are designed to work for an organization’s entire compound portfolio – typically a collection of hundreds of thousands of molecules. Bench scientists, on the other hand, want more specialized local models that have been optimized for their compounds.
Because of this gap, the experienced data scientists on staff are often swamped with requests from bench scientists for project-specific models, which takes time away from developing and maintaining the global models that have wider applicability and strategic value for the enterprise. This leaves bench scientists in a bind: they can either wait in line for the data science team to develop a more specialized local model, or struggle through the process themselves for weeks, which typically results in poorer-quality models.
Democratizing Data Science with BIOVIA Pipeline Pilot
BIOVIA Pipeline Pilot is a graphical scientific authoring application that provides a comprehensive environment for the design, training, validation, and deployment of virtual models utilizing advanced analytics. It provides a drag-and-drop graphical user interface with “components” that allow common code to be shared throughout the organization. The platform supports a wide range of analytics capabilities, component collections, and domain-specific functionalities to support a full battery of scientific tasks for your organization.
BIOVIA Pipeline Pilot helps to address the data scientist gap by democratizing the role of data scientists. The graphical user interface allows bench scientists to develop specific data workflows utilizing prebuilt components and predictive algorithms to carry out tasks without needing any prior coding experience. The Analytics and Machine Learning Collection for Pipeline Pilot gives scientists the tools for everything from data ingestion, cleaning and exploration, to model building, validation, deployment, optimization, and design of future experiments – all within a single environment.
Expert users can create their own custom components using languages like Python or R to capture best practices and share them throughout the organization. These protocols can be deployed as a simple UI or as web services to allow scientists to easily tailor models to their specific needs or run individual instances of these models. The protocols can also be deployed as robots operating in the background to automate many data analytics processes.
Pipeline Pilot facilitates the conversion of broadly applicable global models into local models that are optimized for a specific set of compounds. It empowers researchers to create their own custom, domain-specific analyses with a library of curated scientific component collections, or use applications built by their colleagues to more rapidly get insights. Previously, this kind of model development and analysis would have required a team of data scientists to implement on a case-by-case basis, but the technologies that constitute Pipeline Pilot allow bench scientists to carry out this work at their desks.
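The global-to-local conversion can be pictured with a generic fine-tuning sketch. This is a hedged illustration of the general technique, not a Pipeline Pilot API: start from the weights of a broadly trained global model, then take a few extra training steps on a bench scientist's own compound series so predictions track that local chemistry, leaving the global model untouched.

```python
# Hedged sketch of "localizing" a global model (generic fine-tuning, not a
# Pipeline Pilot API). A copy of the global weights is refined on a small
# local data set; the original global model is left unchanged.

def predict(weights, features):
    """Predicted activity: dot product of weights and descriptors."""
    return sum(w * x for w, x in zip(weights, features))

def localize(global_weights, local_data, lr=0.05, epochs=50):
    """Fine-tune a copy of the global weights on a small local data set."""
    weights = list(global_weights)      # copy: the global model stays intact
    for _ in range(epochs):
        for features, target in local_data:
            error = predict(weights, features) - target
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

global_weights = [1.0, 0.5]             # trained on the whole portfolio
# Hypothetical local series behaves differently: activity ~ 0.2*x0 + 1.5*x1
local_data = [([1.0, 0.0], 0.2), ([0.0, 1.0], 1.5), ([1.0, 1.0], 1.7)]
local_weights = localize(global_weights, local_data)
```

After fine-tuning, the local model fits the scientist's own compound series better than the global model does, while the global model remains available, unchanged, to every other team.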
It is important to note that this approach neither diminishes expert data scientists’ utility nor changes their role – it simply gives non-expert users the ability to tailor their models to better support individual decisions. All of this frees up the expert data scientists in your organization, allowing them to focus on creating and publishing new global models for widespread use.
Finally, Pipeline Pilot offers a wide variety of APIs to facilitate integration with a number of instruments and third-party software, allowing automatic extraction of data for analysis. Pipeline Pilot has the ability to ingest a large variety of data types (e.g., numerical, chemical, genomic, proteomic, textual, and image) from different locations, allowing your organization to integrate a wide range of data sources and types into a collaborative framework.
Key Capabilities of Pipeline Pilot include:
- Apply any of 15+ machine learning (ML) methods to your scientific and engineering data
- Merge, join, characterize, and clean your data sets
- Perform exploratory analysis, including PCA, clustering, and multi-dimensional data visualization
- Build fast, scalable Bayesian classification models
- Use the genetic algorithm in the Genetic Function Approximation (GFA) method for variable selection and for building regression ensemble models
- Build accurate, easy-to-use RP Forest regression and classification models
- Use R-based ML methods such as support vector machines, neural networks, and XGBoost without writing R scripts
- Employ the ML framework for cross-validation, hyperparameter tuning, and variable importance assessment for any type of model
- Use regression and classification model evaluation viewers to assess and compare model test set performance
- Use built-in applicability domain measures and error models to assess sample-specific prediction confidence
- Apply Pareto optimization to multi-objective optimization problems
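To make the last capability concrete: in compound optimization, scientists typically balance competing objectives such as maximizing potency while minimizing toxicity. The sketch below shows the core idea of Pareto optimization in generic Python (an illustration of the concept, not Pipeline Pilot's implementation): a candidate is Pareto-optimal if no other candidate is at least as good on every objective and strictly better on one.

```python
# Illustrative Pareto-front computation (generic sketch, not Pipeline
# Pilot's implementation). Each candidate is a tuple of objectives to
# maximize: here (potency, negated toxicity), so higher is better on both.

def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of a list of objective tuples."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Hypothetical candidates as (potency, -toxicity) pairs.
scores = [(0.9, -0.8), (0.7, -0.2), (0.5, -0.1), (0.4, -0.5), (0.8, -0.3)]
front = pareto_front(scores)
# (0.4, -0.5) drops out: (0.7, -0.2) is both more potent and less toxic.
# (0.5, -0.1) survives: nothing matches its low toxicity.
```

Rather than collapsing multiple objectives into one arbitrary score, the Pareto front hands scientists the full set of defensible trade-offs to choose among.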
Modeling with advanced analytics has enabled scientists in almost every industry to leverage data-driven decision-making to optimize products and processes. Until recently, however, a lack of domain-specific data scientists has limited the application of modeling in the pharmaceutical industry. With the deployment of a global data science platform like BIOVIA Pipeline Pilot, modeling and advanced analytics become available to the bench scientists in your organization, driving the innovation engine of drug discovery. Pipeline Pilot allows data science to become a fundamental part of your organization’s R&D and manufacturing culture, sharpening predictions and giving scientists a deeper understanding of trends within their individual compound libraries.
Pipeline Pilot will help your organization to:
- Tackle more complex problems and look at areas of research that you previously thought were inaccessible
- Expand your data science team’s expertise across the organization
- Ensure that data science best practices become the standard for your organization
- Empower bench scientists to help develop actionable insights with the latest analytics and machine learning capabilities
- Make the most of all of your data
- Make decisions and drive R&D from a larger pool of data
BIOVIA Pipeline Pilot is the foundation for an effective data science platform that supports standardization of practices across the enterprise – data ingestion, data preparation, advanced modeling (both testing and deployment), and data exploration and visualization. Pipeline Pilot can also be utilized as a core configuration/customization language to extend the power of many other BIOVIA applications (LIMS, chemical and biological registration, and other routinely used biopharma software systems).
To get the most out of your Pipeline Pilot investment, it is important to work with a quality informatics consultant who understands the platform, your applications, and your scientific domain, and who will work with you to complete the necessary business analysis and requirements gathering. Astrix has an experienced team of Pipeline Pilot experts that can provide services for all your Pipeline Pilot needs, including implementation, instrument and system integration, workflow and application authoring, custom reports, user training and more. As the premier BIOVIA Services partner, Astrix provides increased value to customers through cost-effective design, implementation and/or integration services leveraging BIOVIA applications and software.