Current Projects
Benchmarks for Natural Language Data Mining with LLMs
Develop comprehensive benchmarks for Natural Language data mining and Scientific Chat applications to facilitate the rigorous evaluation of Large Language Model (LLM) performance across key stages of the data-to-insight pipeline in pharmaceutical R&D, ultimately supporting more effective deployment of AI in drug discovery.
Current Projects
Chemical Exchange Format Committee
The Chemical Exchange Format Committee provides a neutral forum where pharma companies, software vendors, and standards owners can collaboratively tackle chemical data formats and drive standardization.
Current Projects
Minimal Metadata Set (MNMS) for Repurposing Non-Clinical In Vivo Data
The Minimal Metadata Set (MNMS) project brings together biopharma, academia, vendors, and regulators to demonstrate how sharing harmonised metadata alongside raw locomotion data enables fully reproducible analyses. Focusing on untreated C57BL/6J mice in standard conditions, the project highlights the value of rich, FAIR-compliant metadata in advancing data reuse and reproducibility.
Current Projects
Pharmacovigilance Systems and Processes Standards
The Pharmacovigilance Systems and Processes Standards (PS2) program aims to establish a set of standard solution requirements for pharmacovigilance systems, starting with Case Intake in 2025 via the PS2 – Case Intake project. The standards are intended to avoid duplicated effort during solution selection and implementation and to foster innovation.
Current Projects
In Vitro NAM Data Standards
We propose to develop harmonized standards for describing animal alternative methods and their characterization, and to develop best practices for data management and analysis.
Current Projects, Data Driven Value
Global Substance Registration System
The goal of the system is to make it easier for regulators and other stakeholders to exchange information about substances in medicines, supporting scientific research on the use and safety of the ingredients in medicinal products.
Current Projects, Data Driven Value
In Vitro Pharmacology
The goal of the In Vitro Pharmacology project is to develop a shared data template that standardizes the description of in vitro methodologies.
AI, Current Projects, FAIR, Projects
LLM and NLP Use Case Database
This collaborative project was launched to create a bottom-up qualitative database of Natural Language Processing (NLP) use cases, enabling practitioners in pharmaceutical companies to share successes and failures with peers. In 2024, the database was expanded to include use cases relying on Large Language Models (LLMs), with the explicit aim of identifying factors that drive success or failure of NLP and LLM projects.
Current Projects, Knowledge Graphs, Ontologies
Large Language Models in Life Sciences
The use of Large Language Models such as GPT-4 presents a transformative opportunity for pharmaceutical R&D, particularly in target discovery and validation.
Current Projects, FAIR
Datafairy Bioassay Annotation
This project aims to convert unstructured assay protocol descriptions into a high-quality FAIR data set and to create standards for this information.
Current Projects
CMC Process Ontology
The Pistoia Alliance is developing an advanced semantic architecture for a Pharmaceutical CMC Process Ontology. The objective is to create a domain lexicon and taxonomy that extend the ISA‑88 framework, enabling standardized laboratory and plant production process recipes.
Current Projects, IDMP, Ontologies
IDMP-O
The project goal is to build an IDMP Ontology that enables deep, semantic interoperability based on FAIR principles to enhance and augment the existing IDMP standards.