Tips for Using Pipeline Pilot as an Integration Platform

Posted on Lab Informatics. 11 December, 2017

Whether part of a large enterprise software implementation or a project to pass data between two applications, integration projects always present challenges. In the laboratory environment, achieving integration goes beyond merely exchanging data between applications. Laboratory equipment and automation, specialized data sets, and validation requirements complicate an already difficult task and create unique challenges. In addition to meeting the goals of streamlining business operations, eliminating data silos, and minimizing redundant data entry, integration must be done in a way that supports regulatory compliance and promotes data integrity.

One of the most common integration platforms used in laboratory informatics is Pipeline Pilot from BIOVIA. Pipeline Pilot is designed to enable scientists and IT professionals to rapidly create, test, and publish scientific workflows that automate the process of accessing, analyzing and reporting on data. So, what are some of the options for integrating scientific applications via Pipeline Pilot? And what are the best practices for effective use of Pipeline Pilot in the laboratory environment?

Advantages of using Pipeline Pilot

One of the challenges of working in the laboratory enterprise environment is the sheer diversity of capabilities and file types required to automate data and process workflows. Specialized data formats such as SDfiles and RDfiles, sequence and structure files (FASTQ, PDB, etc.), open source algorithms, and specialized programming languages (e.g., R) make it difficult to bring the data together and transform it all in one place. This can be especially challenging when attempting to automate workflows across the organization.
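
To make the point about specialized formats concrete, here is a minimal sketch in plain Python (not a Pipeline Pilot component) that pulls the data fields out of an SDfile. It assumes the common layout in which each record is a molfile block ending in "M  END", followed by "> <FIELD>" headers with their values, and closed by a "$$$$" line. The sample records and the function name read_sd_fields are illustrative assumptions, and the parser ignores many details of the full format.

```python
import re

# Illustrative two-record SDfile; the structures are simplified placeholders.
SAMPLE_SD = "\n".join([
    "methane",
    "  hypothetical-sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>",
    "methane",
    "",
    "> <MW>",
    "16.04",
    "",
    "$$$$",
    "water",
    "  hypothetical-sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>",
    "water",
    "",
    "$$$$",
])

# A data header looks like "> <FIELD>"; the field name sits in angle brackets.
FIELD_HEADER = re.compile(r"<([^>]+)>")


def read_sd_fields(text):
    """Return one dict of data fields (as strings) per SDfile record."""
    records = []
    fields = {}
    current = None      # name of the field whose value lines we are reading
    in_data = False     # True once the molfile block ("M  END") has ended
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # Record terminator: store the fields and reset the state.
            records.append({k: "\n".join(v) for k, v in fields.items()})
            fields, current, in_data = {}, None, False
        elif not in_data:
            # Skip the structure block until its closing line.
            in_data = line.startswith("M  END")
        elif line.startswith(">"):
            match = FIELD_HEADER.search(line)
            current = match.group(1) if match else None
            if current is not None:
                fields[current] = []
        elif current is not None:
            if line.strip():
                fields[current].append(line)
            else:
                current = None  # a blank line ends the field value
    return records


if __name__ == "__main__":
    for record in read_sd_fields(SAMPLE_SD):
        print(record)
```

Even this small reader has to track state across lines, which is part of why prebuilt readers for such formats save time.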

This is where Pipeline Pilot brings a great deal of value for IT applications. A diverse set of collections is available, offering fine-grained, “ready-to-run” components for data transformation, querying, analysis, and reporting. From text mining to cheminformatics to ADMET to sequence analysis and reporting, Pipeline Pilot offers a rich toolkit for common, reusable workflow tasks.

For example, both open source and industry-standard algorithms commonly used to manipulate and transform scientific data are available out of the box. You can perform substructure or similarity searches, read in genomic or plate-based data sets, analyze images using complex algorithms, align genomic sequence data, trim short or long reads, and assess run quality. In addition, components for reading data from a variety of laboratory instruments are available, so you can capture instrument output and incorporate it quickly into the workflow.
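
As an illustration of what a similarity search computes, the sketch below ranks a small library against a query using the Tanimoto coefficient. It is plain Python rather than a Pipeline Pilot component, and it assumes each molecule has already been reduced to a fingerprint, represented here as a set of "on" bit positions. The fingerprints, names, and threshold are made up for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(fp_a & fp_b) / len(union)


def similarity_search(query_fp, library, threshold):
    """Return (name, score) pairs at or above threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints: sets of bit positions that are switched on.
    library = {
        "compound_a": {1, 2, 3, 4},
        "compound_b": {2, 3, 9},
        "compound_c": {7, 8},
    }
    query = {1, 2, 3}
    for name, score in similarity_search(query, library, threshold=0.4):
        print(f"{name}: {score:.2f}")
```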

Dealing with unstructured text? The text mining collection allows you to mine PDF files and other key unstructured information as well. And no need to code custom reports to deploy across the organization – the Reporting collection enables you to quickly configure reports and dynamic web applications and deploy them without coding. All of this is built on an open framework that also supports your own internal and/or third-party databases and technologies. And for large data sets, you can take advantage of grid computing, parallelization and threading, which are embedded in many sub-components.

What You Should Know about the Integration Collection

There is, in fact, a collection of Pipeline Pilot components designed specifically for integration. For developers, the Integration Collection ships with detailed documentation and example applications to help you get started.

Here are some examples of the kinds of things you can do with the Integration Collection:

  • Retrieve data and store results directly in your own corporate databases, and integrate existing in-house or third-party programs as computational services
  • Access external databases
    • Create custom components using standard SQL
    • Connect through ODBC/JDBC
    • Integrate with databases such as Oracle and SQL Server
    • Templates provided for integration of molecular and biological databases, and cartridge technologies from a variety of commercial vendors.
  • Integrate external applications
    • Command line integration, Telnet/FTP, SSH
    • VBScript component for automating COM based applications. These applications can have a graphical user interface (such as Microsoft Office applications) or they can be non-graphical.
    • Web services integration via SOAP (Simple Object Access Protocol)
    • Java and Perl based components for in-process integration and custom component building
  • Write your own components. You can integrate external applications or write new components such as data readers, writers, viewers, and manipulators in standard languages such as Java, Perl, or VBScript. These provide a natural way to integrate with applications built on libraries such as BioPerl and BioJava. The components run in the same process as Pipeline Pilot and give you rapid and flexible access to data as it flows through a pipeline. The API for each language is based on a well-documented object model that represents the data and the component interfaces.
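
The first pattern in the list, calling an existing program as a computational service and storing its results in a database, can be sketched outside Pipeline Pilot as follows. This is a generic Python illustration, not the Integration Collection's API. The "external tool" is a hypothetical stand-in (a one-line Python script that doubles each input value), and an in-memory SQLite database stands in for a corporate database reached over ODBC/JDBC.

```python
import sqlite3
import subprocess
import sys

# Hypothetical stand-in for an in-house command-line tool: it reads one
# number per line on stdin and prints "input,output" per line on stdout.
TOOL_CMD = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    v = float(line)\n"
    "    print(f'{v},{v * 2}')\n",
]


def run_tool(values):
    """Send values to the external program and parse its output rows."""
    stdin_text = "".join(f"{v}\n" for v in values)
    completed = subprocess.run(
        TOOL_CMD, input=stdin_text, capture_output=True, text=True, check=True
    )
    rows = []
    for line in completed.stdout.splitlines():
        raw, result = line.split(",")
        rows.append((float(raw), float(result)))
    return rows


def store_results(conn, rows):
    """Write results with a parameterized INSERT rather than string-built SQL."""
    conn.execute("CREATE TABLE IF NOT EXISTS results (input REAL, output REAL)")
    conn.executemany("INSERT INTO results (input, output) VALUES (?, ?)", rows)
    conn.commit()


def fetch_results(conn):
    """Read the stored rows back, ordered by input value."""
    query = "SELECT input, output FROM results ORDER BY input"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for a real database
    store_results(connection, run_tool([1.0, 2.5]))
    print(fetch_results(connection))
```

The `check=True` argument makes a failing tool raise an error instead of silently producing an empty result set, which matters when the results feed a regulated workflow.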

Best Practices for the Pipeline Pilot Platform

Pipeline Pilot has been in production across the industry for more than twenty years and is a mature software platform. With this maturity comes complexity. Pipeline Pilot plays well with standard programming languages and interfaces, but its advanced capabilities take time to learn. It is simple enough for a bench scientist to use effectively, but you often need the help of a programmer or expert to scale a solution and/or deploy it across the organization in a secure and consistent way. In addition, workflows should be optimized for performance to avoid unnecessary costs. This is particularly true when large data sets are involved or when cloud computing is used and the meter is running.

Unlike open source platforms, Pipeline Pilot carries a licensing cost for both developers and end users, so it is important to bear any cost or process limitations in mind when you build protocols for the enterprise. From both a cost and a project methodology standpoint, there is a difference between using Pipeline Pilot as a bench scientist to automate routine calculations and building a protocol intended for a large group or even the enterprise scientific community. The latter requires a disciplined approach in which protocols are optimized for performance and meet the specifications of the end-user community. This approach, like any software development project, requires upfront work in business analysis, requirements gathering, and, in some cases, process engineering.

When designing a protocol for the enterprise, you may find that some groups need a different workflow and therefore require slightly different Pipeline Pilot protocols, and you may have to deploy those protocols differently. For example, some groups may want to access the protocol to perform a calculation for a single required field as part of a registration step, whereas others may need to create a workflow on a set of molecules (in bulk) in the context of a workbench to make decisions about candidate promotion. The protocols are similar but there are differences in how you must integrate these with the end-user applications.
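
One way to think about this situation is a single shared calculation with two thin entry points. The sketch below is plain Python, not a Pipeline Pilot protocol, and every name in it is hypothetical. The "calculation" is a simple molecular weight from a formula, using a deliberately tiny table of atomic weights. The registration-style entry point fails loudly on bad input, while the bulk entry point records the error and keeps going.

```python
import re

# Deliberately small table for illustration; a real one covers all elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

FORMULA_PATTERN = re.compile(r"(?:[A-Z][a-z]?\d*)+")
TOKEN_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Shared core calculation used by both entry points."""
    if not FORMULA_PATTERN.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    total = 0.0
    for element, count in TOKEN_PATTERN.findall(formula):
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element: {element}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return round(total, 3)


def calculate_for_registration(formula):
    """Single-record entry point: return one value, raise on bad input."""
    return molecular_weight(formula)


def calculate_for_workbench(formulas):
    """Bulk entry point: one result row per input, errors kept per row."""
    results = []
    for formula in formulas:
        try:
            row = {"formula": formula, "mw": molecular_weight(formula), "error": None}
        except ValueError as exc:
            row = {"formula": formula, "mw": None, "error": str(exc)}
        results.append(row)
    return results


if __name__ == "__main__":
    print(calculate_for_registration("H2O"))
    for row in calculate_for_workbench(["CH4", "C6H12O6", "Xx"]):
        print(row)
```

Keeping the calculation in one place means the two deployments differ only in how they handle input and errors, not in the science.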

In a case such as this, it is helpful to consult the experts on the applications with which you need to build those integrations. You may know how to write the Pipeline Pilot protocol but may not be aware of the vendor APIs and/or the underlying database structure of the internal application that needs to be integrated. It is quick and easy to build a Pipeline Pilot protocol, but remember that Pipeline Pilot is an integration tool for both the individual and the enterprise. Before you roll out a protocol to a group of users, you should vet the solution with the larger community and work with IT to ensure the solution is both secure and cost-effective.

Conclusion

BIOVIA’s Pipeline Pilot is a versatile laboratory automation and integration tool. Pipeline Pilot allows companies to establish a digital thread throughout the product lifecycle to drive innovation and efficiency in the lab. By automating manual, repetitive data preparation and collation tasks, the platform also frees scientists to focus on the science, which improves job satisfaction.

To get the most out of your Pipeline Pilot investment, it is important to work with a quality informatics consultant who understands the platform, your applications, and your scientific domain, and will work with you to complete the necessary business analysis and requirements gathering. Astrix has an experienced team of Pipeline Pilot experts that can provide services for all your Pipeline Pilot needs including implementation, instrument and system integration, workflow and application authoring, custom reports, user training and more. As the premier BIOVIA Services partner, Astrix provides increased value to customers through cost effective design, implementation and/or integration services leveraging BIOVIA applications and software.

About the Author

Jonathan Lawson is a Senior Informatics Engineer for Astrix Technology Group in the Informatics Professional Services Practice. He focuses on developing custom informatics solutions that use Pipeline Pilot from BIOVIA. Mr. Lawson offers 10 years of experience in scientific software development and informatics engineering. During his career, he has worked with multiple laboratories in various fields of science to provide tailored informatics solutions that optimize laboratory workflows.
