Delivering Data Driven Value

Evolving Challenges in Chemical Interoperability – Rob Owen

Rob Owen’s presentation explores Pfizer’s journey in managing chemical data formats, beginning with the transition from V2000 to V3000 and the interoperability challenges this created, particularly around stereochemistry, reactions, and degeneracy in molecular representations. It details the company’s strategic decision to maintain compatibility with both formats, the reliance on ChemDraw interpretations, and the move toward CXSMILES to address the “lossy” nature of SMILES and InChI. The presentation emphasizes the shift toward web-based, text-friendly formats like CDXML, the complexities of copy-paste and reaction handling, and the need for consistent rendering across multiple toolkits. Owen advocates for open standards, broader vendor support, and a focus on functionality rather than file-format lock-in, while acknowledging the evolving role of SaaS solutions and the importance of enabling chemists with flexible, interoperable tools.

Interoperability in Cheminformatics – Gerd Blanke

The challenges of data exchange in a FAIR world

In his presentation, Gerd Blanke of StructurePendium Technologies GmbH addresses the persistent challenge of interoperability in cheminformatics, focusing on the critical role of reliable chemical exchange formats for structures and reactions. He explains that existing formats are often lossy, leading to data quality issues, higher costs, and unpredictable workloads—especially during database mergers where discrepancies and legacy structures emerge. Poorly implemented formats undermine FAIR principles and hinder AI readiness. Blanke calls for regular cross-industry dialogue among data engineers, vendors, and format owners to share experiences, identify limitations, and agree on standardized improvements, positioning the Pistoia Alliance as an ideal forum to coordinate these efforts.

Interoperability in Cheminformatics – Susan Leung

The challenges of data exchange in a FAIR world

In this presentation, Susan Leung from AstraZeneca examines the interoperability challenges in cheminformatics, especially in multi-vendor DMTA (Design–Make–Test–Analyse) ecosystems where diverse tools, formats, and modalities must exchange data. She highlights that current data exchange formats (e.g., SMILES, molfile, CDXML, HELM) can be lossy, inconsistent, and subject to conflicting standards, creating problems in representation, search, and identity. Case studies illustrate issues with stereochemistry encoding, biopolymer representation, and toolkit incompatibilities, particularly when multiple standards or format extensions are in play. Leung emphasizes the need for better education, transparent communication, and systematic feedback processes, proposing that whether improving existing formats or creating new ones, the guiding principles must be clarity, documentation, and collaboration.

Challenges in Cheminformatics: The View of An Independent Consultant

In this presentation, independent consultant Thomas Doerner outlines five major challenges in cheminformatics from his experience working with large pharmaceutical and chemical companies: unFAIR chemical data (born in ELNs without early standardization), inconsistent representation of complex compounds (e.g., organometallics, polymers, nanomaterials), limitations of traditional chemical graphs for real-life substances that require additional contextual data, hesitancy and technical gaps in adopting open-source cheminformatics tools, and the need to integrate cheminformatics into “non-classic” environments like cloud-native platforms and corporate data lakes. He stresses that these issues hinder data findability, interoperability, and reuse, and calls for the Pistoia Alliance community to agree on priority challenges, understand the business value of solving them, and form representative working groups to develop solutions collaboratively.

How to Keep Linear Compute Scaling with Ever-Growing Data?

This presentation by Ramil Nugmanov addresses the challenge of maintaining linear compute scaling when working with ever-growing datasets in AI-assisted drug discovery, particularly for DNA-encoded libraries (DELs) containing billions of molecules. By breaking combinatorial libraries into fragments and using SMILES concatenation with placeholder atoms, the method avoids storing every entity, reducing memory use from gigabytes to ~10 MB and compute time to minutes. Search efficiency is improved by fragmenting queries, reconstructing fingerprints on the fly, and using bitwise operations to calculate Tanimoto similarity without full molecule reconstruction. The approach enables rapid similarity search, efficient CPU cache usage, and parallelization, while avoiding deep neural architectures that are inefficient for such data. The key message: don’t apply standard solutions blindly—design fragment-based, resource-efficient methods for massive chemical spaces.
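The fingerprint idea described above can be illustrated with a minimal sketch. It is not the presenter's implementation. It assumes that each building block has a precomputed bit-vector fingerprint, stored here as a Python integer, and that a product's fingerprint can be approximated by OR-ing the fingerprints of its fragments. That approximation ignores features that span the attachment bond. The names `tanimoto`, `combine`, `search`, and the toy data are hypothetical.

```python
from itertools import product


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-vector fingerprints stored as ints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union


def combine(fragment_fps) -> int:
    """Approximate a product fingerprint by OR-ing its fragment fingerprints.

    Assumption: features crossing the bond between fragments are ignored.
    """
    fp = 0
    for f in fragment_fps:
        fp |= f
    return fp


def search(query_fp: int, positions, threshold: float):
    """Lazily yield (fragment indices, similarity) for products above threshold.

    `positions` is a list of lists: the fragment fingerprints available at
    each position of the combinatorial library. Products are never stored;
    their fingerprints are rebuilt on the fly from the fragments.
    """
    index_ranges = (range(len(p)) for p in positions)
    for combo in product(*index_ranges):
        fp = combine(positions[i][j] for i, j in enumerate(combo))
        sim = tanimoto(query_fp, fp)
        if sim >= threshold:
            yield combo, sim


# Toy library: two positions with two fragments each (4-bit fingerprints).
positions = [[0b0011, 0b0101], [0b1000, 0b0100]]
query = 0b1011
hits = list(search(query, positions, threshold=0.5))
# hits == [((0, 0), 1.0), ((0, 1), 0.5), ((1, 0), 0.5)]
```

Only the per-fragment fingerprints are kept in memory, so storage grows with the number of building blocks rather than with the number of enumerated products. This loop still visits every combination. The query fragmentation described in the talk, which reduces the number of combinations examined, is not reproduced here.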

IDMP Ontology July 2025 Community of Interest Meeting

Join us for another IDMP Ontology webinar where we discuss important achievements of our IDMP-O project. These sessions are designed to improve data alignment and interoperability across the pharmaceutical industry.

Agenda
  • Introduction
  • Status and Progress of Phase 4 use cases
    • Batch Tracking
    • Regulatory Data Alignment: EMA PMS and other jurisdictions
  • Recent Progress of IDMP Ontology
  • Introduction to Sustainability Model of IDMP-O
  • Upcoming Events
Speakers
  • Aditya Tyagi, Pistoia Alliance
  • Fabian Muttach, Boehringer Ingelheim
  • Raphael Sergent, Accurids
  • Elisa Kendall, EDMC
  • Toby Broom, CrownPoint Technologies
  • Cameron Gibbs, CrownPoint Technologies

Pharma General Ontology (PGO) Phase 1

Data interoperability is a key enabler for accelerating life science workflows, yet many concrete hurdles exist. For example, organizations use different definitions for similar core concepts, which requires a significant amount of mapping across data sets even within a single company. The first objective of the Pharma General Ontology (PGO) project is to define a set of agreed-upon core entities, the PGO “core concepts”, along with recommendations for associating controlled terminologies with them to support data exchange among pharmaceutical industry stakeholders. By reusing community-agreed terms and definitions, PGO aims to help organizations enhance data interoperability, integration, discovery, and reuse.
 
The webinar will address:

  • PGO vision
  • Deliverables Phase 1 & Next Steps
  • Q&A Session with the participants
Speakers
  • Philippe Rocca-Serra / AstraZeneca / Senior Director – FAIR Collaboration – Data Office
  • Martin Romacker / Roche / Product Manager Roche Data Marketplace (RDM)
  • Markus Hartmann / Merck Group / Global Product Lead Data Semantics
  • Birgit Meldal / Pfizer / Senior Manager, Enterprise Data Standards and Ontologies
  • Peter McQuilton / GSK / Senior Product Owner – Reference Data, Information & Data Architecture and Ontologies
  • Elena Businaro / Chiesi Group / R&D and PPM Digital Strategy and Execution Support Head
  • Joshua Daniel Valdez / Novo Nordisk / Head of Ontology and Semantic Engineering

Moderation: Giovanni Nisato / Pistoia Alliance / Project Manager PGO

FAIR 2024 Business Survey Report

Insights and Recommendations

The FAIR 2024 Business Survey Report presents key insights and recommendations on how pharmaceutical and life science companies are applying FAIR Data Principles across their organizations. Based on input from survey respondents and industry experts, the report highlights emerging business value, challenges, and the growing need for clear ROI frameworks to support FAIR’s strategic impact.

CMC Process Ontology Community of Interest May 2025

The Pistoia Alliance is building a pharmaceutical CMC process ontology based on the ISA88/95 framework.

Our aim is to standardize laboratory and plant production process recipes. Standardized definitions facilitate digital technology transfers and integration with execution systems, so that structured process data can be captured for material lot genealogy tracking and advanced process analytics, thereby enhancing efficiency and transparency throughout the pharmaceutical production lifecycle.

Having shown the value of a semantic approach to CMC process management during the initial proof-of-concept (PoC) phase, we have now started the next phase of the work: moving beyond the PoC to a usable ontology.

For more information please visit our dedicated project page: CMC Ontology.

AI-Ready Data and Why FAIR Data Matters in Life Science Companies

As life-science organizations race to adopt (generative) AI, one point stands out: your AI is only as good as your data. While large language models (LLMs) offer powerful capabilities, they are not tailored to specialized scientific data and need a solid data foundation. Making data Findable, Accessible, Interoperable, and Reusable (FAIR) enables AI systems to deliver more accurate, reliable, and cost-effective outcomes.
 
Key points include:

  • Why many AI projects are still fundamentally reliant on robust data management
  • How FAIR Data complements LLMs through explicit semantics and structure
  • The critical role of data quality and governance in AI success

Whether you’re a data steward, scientist, or innovation leader, this session will help you gain perspective on aligning your data and AI strategies for maximum impact. Join us on May 21 to explore why a durable AI strategy needs a robust data strategy that includes the FAIR Principles. Don’t let unstructured data hold your AI back: make it FAIR.

Speakers
  • Angelika Fuchs, Roche, Chapter Lead, Data Products & Platforms
  • Martin Robbins, Ontoforce, Product Manager
  • Tom Plasterer, XponentL Data, Managing Director, Knowledge Graph & FAIR Data Capability
  • Ted Slater, EPAM, Managing Principal, Scientific Informatics Consulting

Hosted by Giovanni Nisato, Project Manager, Pistoia Alliance

FAIR Business Survey Report

Beyond Research: Realizing the Value of FAIR Initiatives

The FAIR Community has undertaken an extensive survey to assess the impact of FAIR data in life science companies. Our new report examines the most significant business drivers for implementing the FAIR data principles and the key outcomes being delivered.

Pharmaceutical companies are applying FAIR in research and extending it to development, clinical, and operational functions. FAIR initiatives that started five or more years ago now deliver tangible improvements in data management processes and enhanced operational efficiency. Thanks to all our members who contributed to this vital research.