Delivering Data Driven Value

The FAIR Toolkit

Supports the implementation of the Findable, Accessible, Interoperable and Re-usable (FAIR) guiding principles

Key Challenges In Developing A Data Governance Framework

Data is the key asset in biopharmaceutical research; it is highly valued, but how well is it managed?

Developing a rigorous and robust data governance strategy is the foundation for driving innovation and improving the efficiency and effectiveness of R&D. New technologies such as AI/ML, now used across modern R&D and its laboratories, depend on well-categorised data being available in machine-ready (FAIR) form. Furthermore, security and data integrity are crucial for ensuring regulatory compliance.

The first session in the Pistoia Alliance Data Governance Webinar Series will address some of the key challenges in developing a data governance framework.

FAIR Implementation with Mondo

This webinar explores how the Mondo ontology bridges clinical and translational research, featuring insights from Chris Mungall and hosted by the Pistoia Alliance’s FAIR Implementation Community of Interest.

The Cellosaurus: A FAIR Repository to Help Researchers Navigate the Confusing Universe of Cell Lines

By Amos Bairoch, University of Geneva and Swiss Institute of Bioinformatics

This webinar presents the Cellosaurus, a manually curated knowledge resource that aims to describe all cell lines used in biomedical research. It provides information on immortalized, naturally immortal and finite-life cell lines. Its taxonomic scope encompasses both vertebrates and invertebrates. Currently it describes over 122,000 cell lines from 684 species. For each cell line it provides a wealth of information, cross-references and literature citations.

The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in different formats under the CC BY 4.0 license. The Cellosaurus is a key resource to help researchers identify potentially contaminated or misidentified cell lines, thus contributing to improving the quality and reproducibility of research in the life sciences. It is part of the Resource Identification Initiative (RII), which aims to enable resource transparency within the biomedical literature through the use of Research Resource Identifiers (RRIDs). Some of the information in the Cellosaurus is uploaded into Wikidata, allowing cell lines to be connected semantically to other biological objects. We would like to expand its use in the context of the FAIRification of biological data by providing an RDF version of the resource and a SPARQL endpoint query service.

FAIR by Design

This webinar will explore how the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles can serve as a key enabler to automate and accelerate R&D process workflows. Through the lens of a real-world use case, the session will illustrate the practical implementation of FAIR, highlighting its role in driving faster and more impactful science. By making data more reusable and enhancing its value, FAIR also facilitates greater collaboration and partnership through improved data sharing. Ultimately, the webinar aims to show how FAIR interoperability makes data truly actionable, unlocking its full potential across research ecosystems.

Knowledge Graphs and Semantic Models for Drug Discovery and Healthcare

Data for drug discovery and healthcare is often trapped in silos, which hampers effective interpretation and reuse. To remedy this, such data needs to be linked both internally and to external sources to create a FAIR data landscape that can power semantic models and knowledge graphs.

Semantics of Data Matrices & the STATO Ontology

This webinar presents the Statistics Ontology (STATO), a semantic framework that supports the creation of standardized analysis reports to help with the review of results in the form of data matrices. STATO includes a hierarchy of classes and a vocabulary for annotating the statistical methods used in investigations in the life, natural and biomedical sciences, in text mining and in statistical analyses.

Data Market Evolution: A Future Shaped by FAIR

This presentation reviewed the challenges in identifying, acquiring and utilizing research data in relation to an evolving data market. Strategic solutions were examined in which the FAIR principles play a key role in the future of data management.

CEDAR Work Bench for Metadata Management

With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR”: findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job of creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate those datasets with other data.

The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR work bench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means of simplifying access to scientific datasets and enhancing the reuse of the data to drive new discoveries.

Results of the Ontology Alignment Evaluation Initiative 2019

The Ontology Alignment Evaluation Initiative (OAEI) aims to compare ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity (from simple thesauri to expressive OWL ontologies) and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2019 campaign offered 11 tracks with 29 test cases and was attended by 20 participants. This paper gives an overall presentation of that campaign.
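
As a rough illustration of what comparing matching systems on a test case can involve, the sketch below scores one system's alignment against a reference alignment using precision, recall and F-measure. This is a minimal sketch, not the OAEI tooling: the summary above does not state which metrics each track uses, so the choice of metrics is an assumption, and the function name `evaluate_alignment` and the entity identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

# A correspondence pairs an entity from one ontology with an entity from another.
Correspondence = Tuple[str, str]


def evaluate_alignment(
    produced: Set[Correspondence], reference: Set[Correspondence]
) -> Dict[str, float]:
    """Score a produced alignment against a reference alignment.

    Hypothetical helper: assumes exact-match equivalence correspondences only.
    """
    true_positives = len(produced & reference)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Made-up test case: four reference correspondences, three produced by a system.
reference = {
    ("a:Heart", "b:Heart"),
    ("a:Lung", "b:Lung"),
    ("a:Liver", "b:Liver"),
    ("a:Kidney", "b:Kidney"),
}
produced = {
    ("a:Heart", "b:Heart"),
    ("a:Lung", "b:Lung"),
    ("a:Liver", "b:Spleen"),  # incorrect correspondence
}

scores = evaluate_alignment(produced, reference)
print(scores)
```

In this made-up case two of the three produced correspondences are correct and two of the four reference correspondences are found, so precision is 2/3 and recall is 1/2.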

CDx, NGS and Regulation: Five Perspectives from the Pistoia Alliance

Companion diagnostics (CDx) are essential to the practice of precision medicine. Next-generation sequencing is an increasingly important tool in the development of CDx. However, for CDx to be deployed, many different biopharma industry sectors need to collaborate. This paper outlines some of the challenges and opportunities perceived by the biopharmaceutical industry, the Europe Molecular Quality Network, a regulatory agency, a notified body and a CDx service provider.