The aim of the Unified Data Model (UDM) project is to create and publish an open and freely available data format for storage and exchange of experimental information about compound synthesis and testing.

Project deliverables

Version 6.0, released in February 2020, is a stable release with a number of enhancements:

  • Inclusion of controlled vocabularies compatible with existing standards
  • Support for embedding and referencing of analytical data
  • More detailed information on data provenance (authors’ affiliations, organisations)
  • Support for samples
  • Support for citations and molecules inside reaction variations to support simplified conversion from RD files
  • Fixes to the semantics and data types of several entities
  • and many others

UDM development is hosted on GitHub.

Why is this important?

Without UDM, data formats from different systems are inconsistent, which makes it difficult to share experimental information. At best, time and resources are taken up interpreting different data formats; at worst, valuable data is ignored because it can’t be shared and understood.

What will the project achieve?

We will create and publish an open and freely available data format for storage and exchange of experimental information about compound synthesis and biological testing. We aim to make UDM an industry-wide data standard used to facilitate data sharing and collaboration.

How will the project do this?

Collaboration across vendors and pharma customers has been the key to building the set of requirements that has driven the development of UDM. The project started with the file format used by Elsevier to upload chemical reaction data into Reaxys. Requirements were then gathered by listening to community representatives on the project team, and the format was built around them.