Semantic Enrichment of ELN Data (SEED) will enable FAIR-aligned, comprehensive semantic capture and translation of data across ELN providers at the point of entry. The output will be standardized, computer-readable data, increasing the capacity to connect provenance and attributes for insight and analysis.

Project Deliverables

October 2020: Phase 1 successfully completed!

  • A new standard assay ontology (ADME PD) is now available to the community
  • A working exemplar of an ADME and PD workflow that semantically tags unstructured text
  • An active community that supported Phase 1 and is sharing ideas for Phase 2
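To make "semantically tag unstructured text" concrete, here is a minimal sketch of dictionary-based tagging in Python. It is not the SEED workflow itself: the term list, the `EX:` identifiers, and the names `Tag` and `tag_text` are all hypothetical placeholders chosen for illustration, and a real pipeline would draw labels and identifiers from the published ontology.

```python
import re
from dataclasses import dataclass

# Hypothetical label-to-identifier map. The "EX:" IDs are placeholders,
# not identifiers from the ADME PD ontology.
EXAMPLE_TERMS = {
    "clearance": "EX:0001",
    "half-life": "EX:0002",
    "plasma protein binding": "EX:0003",
}


@dataclass(frozen=True)
class Tag:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # the matched text as written in the note
    term_id: str  # identifier of the matched term


def tag_text(text, terms=EXAMPLE_TERMS):
    """Find whole-word, case-insensitive label matches in free text.

    Longer labels are tried first, and any match that overlaps an
    already accepted span is skipped.
    """
    tags = []
    taken = []  # (start, end) spans already claimed by a match
    for label in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if any(m.start() < e and s < m.end() for s, e in taken):
                continue
            taken.append((m.start(), m.end()))
            tags.append(Tag(m.start(), m.end(), m.group(0), terms[label]))
    return sorted(tags, key=lambda t: t.start)


if __name__ == "__main__":
    note = "Measured Clearance and half-life in rat; plasma protein binding was high."
    for tag in tag_text(note):
        print(tag.start, tag.end, tag.text, tag.term_id)
```

Keeping character offsets alongside each identifier lets the original note stay untouched while the tags are stored separately, which is one simple way to keep the link between the enriched data and its source text.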


Why is this important?

As the volume of experimental data being captured continues to increase, so do the challenge of using it and the need to align with FAIR principles. An obvious place to start is where the data is captured, in ELNs, releasing the value previously trapped in unstructured text. This focus on ELNs is just a starting point for the industry to improve its use of captured textual data.

This project aims to address a number of common issues:

  • Volumes of unstructured ELN content are growing exponentially
  • Most ELN content is free-text, unstructured information, which makes it challenging to identify common concepts, search effectively, and extract deeper insights
  • Retrospective analysis and searching of generally unstructured content is challenging
  • Data workflows to and from ELNs can be relatively restricted in approach. Connectivity to study registration systems, so that scientists can reuse such metadata in the ELN through an API, is a must
  • The limited availability of persistent identifiers makes it harder for data to be digitally discoverable and hinders aggregation of data across applications
  • Unstructured content hampers valuable insights, preventing deeper data analysis and reducing competitive advantage
  • When pharma companies license multiple ELN applications, a holistic approach and guidance for consistent semantic enrichment of data across these applications are desirable

Phase 2 – Expected Delivery

Phase 2 delivers a new standard for Drug Safety terms and adds to the Bioassay ontology. This will enable Assay/Study type ontology coverage for all of eCTD Module 4 of an NDA (New Drug Application) submission.


  1. Extend Ontology work to include Drug Safety (3/4 months)
  2. Continue with Phase 1, including parameters leading to analysis, e.g., knowledge graphs (2/3 months)
  3. Develop a workflow for structured data (e.g., Registration) (4/5 months)
  4. Scale up across major ELN providers delivering an agnostic solution (8/9 months)


