This project aims to convert the biological assay protocols contained in research publications into a machine-readable FAIR (Findable, Accessible, Interoperable, Reusable) format. The project is currently in its pilot phase: we are developing business processes, selecting technologies and data models, and planning guidelines for collaborative data sharing that preserve the privacy of the project members’ R&D pipelines.
Why is this important?
More than one million biological assay protocols have been published to date. Biological assay data are widely used in post-hoc data mining for research program planning, but most assay descriptions are not in a form suitable for it. Making assay information available in a FAIR format is expected to increase the efficiency of bench scientists engaged in experiment planning and to enable research that currently requires tedious expert literature review. A common data model can also simplify data sharing in collaborative research and the publishing of research results. Most pharmaceutical businesses already run internal programs that convert assay protocols into a FAIR format, so the effort of annotating public assay protocols is duplicated across companies. Shifting the annotation of these public assays to a collaborative project would result in immediate cost savings for the member companies.
What will the project achieve?
We will create a data product that provides bioassay protocol information in a machine-readable FAIR form, ready for data mining. Because the largest global pharmaceutical companies support this effort, we expect the resulting data model, data ingestion business processes, and software to become de facto industry standards, further facilitating data sharing and collaboration.
How will the project do this?
We will combine Natural Language Processing with manual review of the automatically produced assay annotations. Specific technologies will be selected during the pilot phase in the coming months.
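To make the idea of automatic annotation followed by manual review more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the project's chosen technology: a simple keyword lookup stands in for the NLP step, and the names KEYWORDS, Annotation, propose_annotations, and review, as well as the field names and values, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical keyword table standing in for a real NLP model.
# Keys are lowercase text cues; values are (field, value) pairs.
KEYWORDS = {
    "luciferase": ("detection_method", "luminescence"),
    "fluorescence": ("detection_method", "fluorescence"),
    "hek293": ("cell_line", "HEK293"),
    "ic50": ("endpoint", "IC50"),
}


@dataclass
class Annotation:
    field: str
    value: str
    status: str  # "proposed" until a curator accepts or rejects it


def propose_annotations(protocol_text: str) -> List[Annotation]:
    """Automatic step: propose an annotation for each cue found in the text."""
    text = protocol_text.lower()
    return [
        Annotation(field, value, "proposed")
        for cue, (field, value) in KEYWORDS.items()
        if cue in text
    ]


def review(
    proposals: List[Annotation], decide: Callable[[Annotation], bool]
) -> List[Annotation]:
    """Manual step: record the curator's decision on every proposal."""
    return [
        Annotation(a.field, a.value, "accepted" if decide(a) else "rejected")
        for a in proposals
    ]


if __name__ == "__main__":
    text = ("HEK293 cells were treated and luciferase activity "
            "was measured to determine IC50.")
    proposals = propose_annotations(text)
    # Assumed curator decision, for illustration: reject the endpoint proposal.
    for annotation in review(proposals, lambda a: a.field != "endpoint"):
        print(annotation)
```

In this sketch the curator's decision is passed in as a function, so the same review step could be driven by an interactive tool or by a batch of recorded decisions. Nothing is treated as final until a person has accepted it.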