Member-Led Project

DataFAIRy Bioassay Annotation

This project aims to convert unstructured assay protocol descriptions into a high-quality FAIR data set, and create standards for this information.

Why is this important?

This project has the potential to:

Revolutionize biotech R&D by standardizing research methods and improving the reproducibility of experiments.
Increase efficiency for bench scientists: reduce assay search, planning, and set-up time, and allow them to skip experiments already publicly known to have failed
Increase efficiency for data scientists: help harmonize and merge datasets and cleanse data for analytics, informatics, ML, and AI applications
Support precompetitive collaborations, including a growing number of data science-focused initiatives that benefit from interoperable scientific data
Decrease the costs of internal curation efforts of bioassays
Potentially simplify the preparation of regulatory submissions
For assay kit vendors and CROs, provide a way to market products through references to them in major public databanks such as ChEMBL and PubChem
For scientific publishing organizations, potentially increase the quality of publications, provided that authors use the common assay annotation standard and deposit their methods in a public databank before or at the time of publication. This may also reduce the workload of peer reviewers and editors
For funders of research, increase the value of the science that was funded

What will the project achieve?

The project will enable:

Costs to be shared for converting published (unstructured) biological assay descriptions into high-accuracy machine-readable FAIR data objects described by a community-defined data model tailored to address current and future essential business questions
The data model to be FAIR and based on public ontologies such as the BioAssay Ontology
The data model to be developed collaboratively across the community and eventually to be promoted as the industry standard for the publication of assay metadata
The generated FAIR data to be made available to the public after a period of exclusivity for partnering organizations
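
To make "machine-readable FAIR data object" concrete, here is a minimal sketch of what one annotated assay record might look like. It is an illustration only, not the project's data model: the class names, field names, term labels, and the EXAMPLE: identifiers are all hypothetical placeholders, and a real record would use terms and identifiers from public ontologies such as the BioAssay Ontology.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    """One assay property bound to an ontology term (hypothetical shape)."""
    property_name: str  # e.g. "assay format"; illustrative field name
    term_label: str     # human-readable label of the chosen term
    term_id: str        # placeholder identifier, NOT a real ontology ID


@dataclass
class AssayRecord:
    """An unstructured protocol sentence plus its structured annotations."""
    source_text: str
    annotations: list


def to_json(record: AssayRecord) -> str:
    # asdict() recurses into the nested Annotation dataclasses.
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = AssayRecord(
    source_text="Inhibition of enzyme X was measured in a cell-free fluorescence assay.",
    annotations=[
        Annotation("assay format", "biochemical format", "EXAMPLE:0001"),
        Annotation("detection method", "fluorescence", "EXAMPLE:0002"),
    ],
)

print(to_json(record))
```

Once protocols are stored in a form like this, a question such as "which assays use fluorescence detection?" becomes a simple filter on term identifiers rather than a text search.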

How will the project do this?

A number of common issues will be addressed by this project:

There are currently >1.4 million biological assay protocols contained in research publications.
The biological assay is a popular data type for post-hoc data mining, yet most of the published data and metadata are not in a form suitable for automated mining. They are partially annotated in databases such as ChEMBL and PubChem, but the volume, depth, and quality of these annotations are inadequate for addressing many current and future business questions.
Research organizations spend significant effort (estimated at 4 to 12 weeks per assay) selecting, setting up, and validating biological assays. Some of these efforts fail completely, producing waste that could have been avoided had the assay selection and set-up processes been more efficient.
Manual curation of bioassay data and metadata is done for smaller datasets and systems. Fully automated curation via NLP and auto-classification is not accurate enough.
About half of the organizations surveyed by the Pistoia Alliance in 2019 already engage in the conversion of unstructured assay protocols into machine-readable form. This is a high-cost process that also leads to duplication of effort across organizations.
Every year new assay protocols are developed and published. At the same time, some assay protocols become obsolete, either because of technology development or because the organizations that create and maintain them go out of business. These factors make historically performed assays harder to interpret and reproduce.

Steering Committee

Community Members

Project Resources

Maturity Frameworks supporting the implementation of FAIR data principles

Playing FAIR with AI: Supporting Scientific Discovery

Press Release: Pistoia Alliance Launches Lab of the Future Report,...

Swiss Personalized Health Network – From clinical (routine) data to...

Last Updated on August 19, 2022 by David Prior
Categories: Artificial Intelligence, Current Projects, FAIR Implementation
