We are delighted to announce the next of our partner challenges for the hackathon.
These challenges are designed to show how deep learning can be applied to life science and health-related data to advance research into tackling disease and supporting patients.
The ExCAPE compound activity prediction challenge, presented by Janssen and Imec
Drug discovery comprises the identification, improvement and documentation of candidate drugs before their evaluation in patients; it involves the time-consuming and costly quantification of the activity of chemical compounds in various in vitro tests, known as assays. Machine learning approaches that can accurately predict the in vitro activity of compounds from their chemical structure promise to significantly reduce the time and cost of bringing novel medicines to patients. On modest data scales and on an assay-per-assay basis, such approaches are already well established. But pharmaceutical companies are also keen to harness the recent advances in machine learning and big data analytics that are transforming other industries. They have the data to do that: the enterprise warehouse of a pharmaceutical company typically annotates millions of compounds with their activity in one or more of the many thousands of documented assays.
ExCAPE (Exascale Compound Activity Prediction Engine) is an EU-funded collaboration that unites pharmaceutical, technological and academic partners to harness the power of supercomputers to speed up drug discovery using machine learning.
To enable cross-fertilization with machine learning advances in contexts other than drug discovery, the ExCAPE partners have compiled and reformatted compound activity data from the public space for easy exploration. This dataset, which is a fair approximation of the compound activity data in a pharmaceutical data warehouse, is freely available to the field. As part of the Pistoia Alliance Deep Learning Hackathon, two of the ExCAPE partners, Janssen and Imec, are formulating a challenge around this dataset.
A compound activity warehouse can be thought of as a table or matrix with compounds as rows, assays as columns, and activities observed in vitro as values. The fill rate of this matrix, i.e. the fraction of compound:assay combinations with an annotated activity, is typically lower than 1%. For every compound, a sparse but extensive vector of binary features is provided that represents its chemical structure.
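The matrix view described above can be sketched in a few lines of Python. This is a toy illustration only: the compound names, assay names, feature indices and the `fill_rate` helper are made up for the example and do not reflect the actual ExCAPE file format. Note that the toy matrix is far denser than the sub-1% fill rate of a real warehouse.

```python
# Toy data only: all names and values below are hypothetical.
# Sparse activity "matrix": only annotated (compound, assay) pairs are stored.
activities = {
    ("cmpd_1", "assay_A"): 1,  # 1 = active
    ("cmpd_1", "assay_C"): 0,  # 0 = inactive
    ("cmpd_2", "assay_B"): 1,
    ("cmpd_4", "assay_A"): 0,
}

# Sparse binary structure features: store only the indices of the bits set to 1.
features = {
    "cmpd_1": {3, 17, 256},
    "cmpd_2": {3, 42},
    "cmpd_3": {99},
    "cmpd_4": {17, 42, 1023},
}

assays = ["assay_A", "assay_B", "assay_C", "assay_D"]


def fill_rate(activities, n_compounds, n_assays):
    """Fraction of compound:assay cells that carry an annotated activity."""
    return len(activities) / (n_compounds * n_assays)


rate = fill_rate(activities, len(features), len(assays))
print(f"fill rate: {rate:.1%}")
```

Storing only the annotated cells and the set bits keeps memory proportional to the amount of actual data, which matters when the full matrix would have millions of rows and thousands of columns.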
The Pistoia hacketeers are challenged to propose innovative and performant predictive models for a handful of assays (columns) that will be disclosed at the start of the event. These assays have been selected because models for them are expected to benefit from including activity data from other assays in the dataset during training.
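One way to picture why other assays might help is to count how many labels become available for training once the remaining columns are included. The sketch below is an assumption about how a participant could arrange the data for multi-task training, not the organizers' method; the data, `label_vectors` and `count_labels` are hypothetical.

```python
# Toy data only: all names and values below are hypothetical.
activities = {
    ("cmpd_1", "assay_A"): 1,
    ("cmpd_1", "assay_C"): 0,
    ("cmpd_2", "assay_B"): 1,
    ("cmpd_3", "assay_C"): 1,
    ("cmpd_4", "assay_A"): 0,
}
compounds = ["cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"]
assays = ["assay_A", "assay_B", "assay_C"]


def label_vectors(activities, compounds, assays):
    """One label vector per compound; None marks an unmeasured assay."""
    return {c: [activities.get((c, a)) for a in assays] for c in compounds}


def count_labels(vectors, assay_indices):
    """Number of annotated labels in the selected assay columns."""
    indices = list(assay_indices)
    return sum(
        1 for vec in vectors.values() for i in indices if vec[i] is not None
    )


vectors = label_vectors(activities, compounds, assays)
target = assays.index("assay_A")

single_task = count_labels(vectors, [target])           # target column only
multi_task = count_labels(vectors, range(len(assays)))  # all columns

print(f"labels for target assay alone: {single_task}")
print(f"labels across all assays:      {multi_task}")
```

In a multi-task model, the `None` entries would typically be masked out of the loss so that each compound contributes only through the assays in which it was actually measured.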
More challenges are being developed and will be announced shortly.
We look forward to seeing you at the event.
For any questions on how to get involved as a competitor or judge, or on how to provide a challenge, please contact David Proudlock.