The Methods Hub project seeks to build a bridge for analytical methods to transition from text-based information to fully digitized, machine-readable instruction sets. In this new paradigm, human interpretation and transcription are no longer needed. Data integrity, method reproducibility, and interoperability increase, providing value to many in the Pharma industry, including manufacturers, CRO/CMOs, and regulators.
Why is this important?
High-Performance Liquid Chromatography (HPLC), with its corresponding Chromatography Data System (CDS), is one of the most widely used separation, quantification, and identification techniques in drug discovery.
Methods document all the parameters for a specific analysis. Method development is very time-consuming but is a necessary part of performing HPLC analysis. Until now, methods have been proprietary to the brand of HPLC instrument and the CDS used by the developer of the method. And while the industry uses many methods that are similar or identical, transferring these analytical methods across business units or across companies is challenging and time-consuming, and is done primarily by manually moving files between drives, by email, and by transcription. Additionally, the existing process requires users to rebuild the methods in the new environment from free-text documents.
Given the widespread adoption of software systems (LIMS, ELNs, LES, etc.) that must represent methods accurately, in ways that let machines store, export, import, and compare what was specified in the method against executed results, the current paradigm presents problems for data entry, reproducibility, and results analysis.
Methods that are authored in free text are often entered into electronic systems in two ways:
- Humans (e.g. analytical chemists) transcribe the method into electronic systems
- Software programs using Natural Language Processing (NLP), of varying degrees of sophistication and quality, “screen scrape” the content to avoid the need for manual transcription
Both of these approaches introduce the possibility of error and of divergent interpretations of the method, depending on the subject matter expert or the NLP algorithm involved. This can reduce reproducibility as well as introduce unintended errors.
What will the project achieve?
The Pistoia Alliance is equipped to manage this project. We will bring together key pharmaceutical companies dealing with these issues, along with key solution providers, to digitize analytical methods into a cloud-based Methods Db that enables seamless exchange between analytical chemists in the drug discovery process.
Key deliverables from this project include:
- Standardized method descriptions based on a CDS instruction parameter set derived from the Allotrope Foundation Ontology and a semantic data model
- Methods Db based on Zontal Space with a public API specification and human-readable method representation
- Methods Db to CDS adapters for at least two CDSs
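To make the idea of a standardized, machine-readable method description more concrete, here is a minimal sketch of a structured method record that round-trips through JSON without any human transcription. The class and field names (`HplcMethod`, `flow_rate_ml_min`, `gradient`, and so on) and the example values are hypothetical illustrations only; they are not terms from the Allotrope Foundation Ontology, nor part of the Methods Db API.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class GradientStep:
    # Hypothetical field names, chosen for illustration only.
    time_min: float
    percent_b: float


@dataclass
class HplcMethod:
    # A deliberately small, hypothetical subset of method parameters.
    name: str
    column: str
    flow_rate_ml_min: float
    column_temp_c: float
    detection_wavelength_nm: int
    gradient: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "HplcMethod":
        data = json.loads(text)
        # Rebuild the nested gradient steps from their dict form.
        data["gradient"] = [GradientStep(**step) for step in data["gradient"]]
        return cls(**data)


method = HplcMethod(
    name="example-assay",
    column="C18, 150 x 4.6 mm, 5 um",
    flow_rate_ml_min=1.0,
    column_temp_c=30.0,
    detection_wavelength_nm=254,
    gradient=[GradientStep(0.0, 5.0), GradientStep(20.0, 95.0)],
)

exported = method.to_json()                # what one system would send
imported = HplcMethod.from_json(exported)  # what another system would receive
```

Because every parameter is a typed field, not a phrase in a document, the receiving system reconstructs exactly the record that was sent. A real exchange format would need far more parameters and agreed semantics for each one.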
Methods Hub could lead to commercially available repositories of methods in which digital as well as text-based method information, with appropriate metadata and in a machine-readable format, would be shared and exchanged. The platform would allow for both free and paid downloads of monographs or manuscripts, interchange between CRO/CMOs and Pharma, and free access to methods that are currently open source.
This phase of the project is due to be completed by the end of summer 2022.
How will the project do this?
To date, the Pistoia Alliance has, in collaboration with the Allotrope Foundation, developed a Methods database based on Zontal Space with a public API specification and a human-readable representation of a limited number of chromatography instruction sets. This database will provide a good entry point for the Methods Hub project.
All available sources of analytical method information will be evaluated in terms of their information structure, ranging from free text, through semi-structured, to fully digital information, e.g. based on the Allotrope Foundation Ontology (AFO) and ontologies from other available providers.
For free-text and semi-structured methods, the extraction of key metadata would be considered.
Natural Language Processing tools could be used to pre-populate a standard format for a fully digital representation of the method. Alternatively, well-implemented methods can be exported from a Chromatography Data System (CDS) and then simply validated by comparison with the text-based description.
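The validation step described above, checking a CDS export against the text-based description, can be sketched as a simple parameter-by-parameter comparison. This is an illustrative sketch under stated assumptions: the function `compare_methods`, the flat dictionary layout, and the parameter names are all hypothetical, and a real comparison would need unit handling and agreed tolerances per parameter.

```python
import math


def compare_methods(exported: dict, transcribed: dict, rel_tol: float = 1e-6) -> list:
    """Return (parameter, exported_value, transcribed_value) for each mismatch.

    Hypothetical helper: parameters missing on one side are reported with None.
    """
    mismatches = []
    for key in sorted(set(exported) | set(transcribed)):
        a = exported.get(key)
        b = transcribed.get(key)
        both_numeric = (
            isinstance(a, (int, float)) and isinstance(b, (int, float))
            and not isinstance(a, bool) and not isinstance(b, bool)
        )
        if both_numeric:
            # Compare numbers with a tolerance to avoid spurious float mismatches.
            same = math.isclose(a, b, rel_tol=rel_tol)
        else:
            same = a == b
        if not same:
            mismatches.append((key, a, b))
    return mismatches


# Values exported from a CDS (hypothetical parameter names and values).
exported = {"flow_rate_ml_min": 1.0, "column_temp_c": 30.0, "wavelength_nm": 254}
# Values extracted from the text-based description; one differs, one is missing.
transcribed = {"flow_rate_ml_min": 1.0, "column_temp_c": 35.0}

issues = compare_methods(exported, transcribed)
for name, cds_value, text_value in issues:
    print(f"{name}: CDS export has {cds_value!r}, text description has {text_value!r}")
```

An empty result would indicate that the exported method and the text description agree on every parameter checked. Any entries in the list point a reviewer directly at the values that need attention.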