Project participants: Simon Thornber (GSK, project lead), Giles Ratcliffe (Pathforwards Ltd., project analyst), GSK, AstraZeneca, Lundbeck, Novartis, Roche
Capitalizing on sequencing data has required life science organizations to build and maintain infrastructure and associated workflows to search gene repositories. But such infrastructure confers little competitive advantage, and maintaining a core set of sequence databases and associated tools requires, as the Red Queen told Alice, "all the running you can do to keep in the same place."
The Sequence Services project aimed to define and demonstrate shared, hosted services for securely storing and mining both proprietary and public domain gene databases. Over two project phases, the effort developed securely managed services intended to give subscribers the following benefits:
- Reduced infrastructure management costs
- Scalable services for managing ever-increasing data sets
- The ability to rapidly capitalize on new analysis and data management workflows
- Secure and well-tested environments developed according to the specifications of industry leaders in NGS
The team selected four vendors to provide proofs of concept of the Phase 1 requirements. The resulting systems were publicly demonstrated and published in April 2011 after testing and ethical hacking by AT&T. The findings were written up in GIT Laboratory Journal.
Issues addressed included:
- Business models for delivery
Phase 2 of the Sequence Services project built on the Phase 1 achievements to develop platforms for analyzing and storing next-generation sequencing (NGS) data. The three resulting systems were demonstrated at the Pistoia Alliance Annual Conference 2012.
Main aims of the effort were to:
- Foster secure, easy collaboration among organizations and individuals without any risk to company firewalls
- Store large amounts of data in an extensible way and manage compute resources effectively
- Accommodate internal capacity planning cycles in pharma, which are typically much longer than the time over which demand varies
- Provide shared access to the latest public data and applications, making them more cost-effective and useful for all
- Reduce costs and convert sequence data storage and analysis from a capital expense to an operational expense