Last week at the AI webinar, Assay Central came up, but we did not have a chance to discuss it during the event. So we asked Sean Ekins to give us some details on what it is, how it came about, and how it uses AI:

Introducing Assay Central: an AI example in Life Science

by Sean Ekins – CEO and Founder, Collaborations Pharmaceuticals, Inc.

Being invited to present on AI for the recent Pistoia Alliance webinar got me thinking about some of the observations I have made over the past 20+ years of using computational approaches for drug discovery, and about the influence those approaches have had.

I started out in 1996 pretty much computer-naïve, as a wet-lab postdoc doing in vitro work on understanding enzymes involved in drug discovery. After attending a conference and seeing some homology models, I was struck by the cool factor and wondered how commercial modeling software could help me make predictions in my area of research. This was one of those pivotal moments, and it flipped my career in a different direction (mixing wet and dry science). I was a pharmacologist who wanted to use computer algorithms, and that’s pretty much what I am still doing, except now we have more data, more powerful computing and many more software options available.

The added wrinkle to this has been that after a decade I realized the models and data we were working on were just not accessible to many people. The next aha moment came while chatting with a pharma company about how much they could save if open source software for cheminformatics performed comparably with commercial tools. This then led to collaborations, publications, grants from the NIH and ultimately making more software open source. Now, with Alex Clark (Molecular Materials Informatics), we have taken these efforts to the next level, using open source tools and machine learning algorithms to create a platform for data curation and models. We developed a prototype of the Assay Central software and used it with a wide variety of structure-activity data from sources both public and private, formatted and unformatted, to enable work on neglected, rare or common disease targets. In the space of a few months we created error checking and correction software, built and validated Bayesian models (Fig 1, 2) with the datasets that were collected and cleaned, and developed new data visualization tools (Fig 3). We have also made a small sample of the models available (www.assaycentral.org).

In short, Assay Central readily enables the user to compile structure-activity data for building computational models, and it can be used to create selections of these models for sharing with collaborators as needed. The software can in turn be used for scoring new molecules and visualizing the multiple outputs in various formats. We have used Assay Central at my company, Collaborations Pharmaceuticals, Inc., in our ongoing internal projects on small-molecule drug discovery for Ebola, HIV and tuberculosis, as well as in over 10 external projects and a contract with an external customer.

There are many more areas in which we would like to develop the software. Our more recent preliminary analysis of different machine learning algorithms and descriptors suggests we could use Assay Central to deliver deep learning models, as these appear to perform markedly better.

As Collaborations Pharmaceuticals, Inc. is a small company, we are focused on developing Assay Central as a tool that could help us identify new drug candidates for our collaborators or for our own projects, and do so efficiently. We have also provided a unique approach to readily sharing the models with collaborators privately or with the community. In many ways I have come full circle, developing tools that could enable the next generation of researchers to learn from the massive amounts of accumulating public data. Who knows what will happen next, but it is clear there is far more awareness of machine learning because of the proliferation of consumer products and software!

Figure 1. Molecule and prediction visualization in Assay Central

Figure 2. Atom feature visualization of Bayesian score contribution for different models.

Figure 3. Hexplot example showing Tanimoto similarity with ECFP6 fingerprint
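
For readers unfamiliar with the similarity measure named in Figure 3, here is a minimal sketch of how a Tanimoto similarity is calculated between two fingerprints. It is not Assay Central code. It assumes each fingerprint is simply a set of integer feature identifiers (as a circular fingerprint such as ECFP6 would produce), and the function name `tanimoto` and the example feature IDs are hypothetical, chosen only for illustration.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity between two fingerprints given as sets of feature IDs.

    Defined as |A intersect B| / |A union B|. Returning 0.0 when both sets
    are empty is a convention chosen for this sketch.
    """
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical feature IDs standing in for real ECFP6 features.
fp_query = {12, 57, 301, 988, 1502}
fp_reference = {12, 57, 301, 764}

similarity = tanimoto(fp_query, fp_reference)
print(f"Tanimoto similarity: {similarity:.2f}")
```

A value of 1 means the two molecules share all of their fingerprint features and 0 means they share none. In practice the feature sets would come from a cheminformatics toolkit rather than being typed in by hand.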