Carlos Castro Iragorri (Universidad del Rosario) and Richard Shute (Pistoia Alliance)
The first European Biohackathon took place in Paris from November 12-16, and the Pistoia Alliance was pleased to co-sponsor, with Elixir-Europe, a project looking at how researchers might usefully track biosamples in the Hyperledger blockchain.
As a reminder, blockchain or distributed ledger technology (DLT) is a collection of technologies (cryptographic security, decentralization, a digital registry, smart contracts, rules and incentives) that allows collaboration among institutions with different levels of trust (The slides from the Pistoia Alliance’s webinar: “So that’s what a blockchain is.” are here). So far, most public and permissionless blockchains have focused on providing a decentralized registry for trading cryptocurrencies (Bitcoin) or a decentralized computational platform to process smart contracts (Ethereum). More recently, other public or private permissioned blockchains (e.g. Hyperledger, Everledger, Monax) and similar technologies (e.g. Corda, Ripple) have focused on using some of these features to provide solutions within an enterprise or a community of enterprises. Although it is not yet clear whether most use cases require a blockchain solution, the technology is already available, not only at the research level but also at the enterprise level; there are 170 solutions already deployed in the space of permissioned ledgers.
Scientific advances in many disciplines are increasingly achieved through collaboration among an ever larger group of communities that work in a decentralized manner toward different objectives. At the intersection of biomedical sciences and bioinformatics there are large communities of researchers and professionals who create and re-use samples. These samples are obtained and transformed through a very detailed and rigorous “manufacturing process” that is not very different from the stages outlined in a supply chain management diagram (see for example this from Medium). The supply chain of biomedical samples, for example pharmaceuticals, passes through various stages before a medicine, and its associated information, reaches the hands of doctors and patients: drug discovery, drug development, publishing and manufacturing, marketing, sales and distribution. At every stage in the supply chain, it is very important to capture the provenance and every transformation of the biosample.
Hackathon – Blockchain for sample management
During the Paris Biohackathon, we developed a proof of concept on the Hyperledger platform for blockchain sample management, using the BioSamples system as a surrogate for any type of biomedical, research or screening sample. BioSamples is a project maintained by the European Bioinformatics Institute (EBI) whose purpose is to store and supply descriptions and metadata on biological samples for researchers in industry and academia. The BioSamples database and its data management services receive new submissions as well as curations of the existing data, and provide a queryable interface. During the hackathon we designed a business network application that has the same data structure as the one used in BioSamples; some of the fields capture unique identifiers, relationships to other samples, characteristics and external references (for example ontologies). In addition, we hard-coded in smart contracts the main interactions of users with the data: the submission process and the curations. Because we used blockchain technology, we were able to track every interaction with the data, both at the level of the user and at the level of the changes in the data. Currently, the BioSamples system administrators have a difficult time efficiently disentangling the different curations on an existing sample. For example, any researcher may currently transform the data, and one of the most common transformations is the introduction of links to the relevant ontologies.
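To make the idea of a sample record with tracked submissions and curations more concrete, here is a minimal sketch in plain Python. It is not the hackathon's Hyperledger code: the class names (Sample, LedgerEntry, SampleRegistry), the field names and the in-memory list standing in for the ledger are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """Hypothetical sample record mirroring the kinds of fields described above."""
    accession: str                                                 # unique identifier
    characteristics: Dict[str, str] = field(default_factory=dict)
    relationships: List[str] = field(default_factory=list)        # accessions of related samples
    external_references: List[str] = field(default_factory=list)  # e.g. ontology links


@dataclass
class LedgerEntry:
    """One recorded interaction: who did what to which sample."""
    kind: str                 # "submission" or "curation"
    user: str
    accession: str
    changes: Dict[str, str]


class SampleRegistry:
    """In-memory stand-in for the business network (an assumption, not a real ledger)."""

    def __init__(self) -> None:
        self.samples: Dict[str, Sample] = {}
        self.ledger: List[LedgerEntry] = []   # append-only by convention

    def submit(self, user: str, sample: Sample) -> None:
        """Register a new sample and record who submitted it."""
        if sample.accession in self.samples:
            raise ValueError(f"sample {sample.accession} already exists")
        self.samples[sample.accession] = sample
        self.ledger.append(
            LedgerEntry("submission", user, sample.accession, dict(sample.characteristics))
        )

    def curate(self, user: str, accession: str, changes: Dict[str, str]) -> None:
        """Apply a curation and record the user and the changed fields."""
        sample = self.samples[accession]      # raises KeyError for unknown samples
        sample.characteristics.update(changes)
        self.ledger.append(LedgerEntry("curation", user, accession, dict(changes)))

    def history(self, accession: str) -> List[LedgerEntry]:
        """Return every recorded interaction with one sample, in order."""
        return [entry for entry in self.ledger if entry.accession == accession]
```

With this sketch, submitting a sample as one user and curating it as another leaves two ledger entries, so each change can be attributed to the user who made it. In a real deployment the ledger would be the blockchain itself rather than a Python list.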
Importantly, the Hyperledger Fabric framework supports event emission, which can be attached to a curation transaction. When a particular curation is performed, the emitted event reports which user performed the curation and, more importantly, keeps track of the pre-curation and post-curation data.
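The behaviour of such a curation event can be sketched with a simple publish/subscribe pattern in plain Python. This does not use Hyperledger Fabric's API; CurationEvent, CurationEmitter and their fields are hypothetical names, and the sample record is assumed to be a flat dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CurationEvent:
    """Hypothetical event payload: who curated, plus the data before and after."""
    user: str
    accession: str
    before: Dict[str, str]
    after: Dict[str, str]


class CurationEmitter:
    """Notifies subscribers whenever a curation is applied (illustrative only)."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[CurationEvent], None]] = []

    def subscribe(self, listener: Callable[[CurationEvent], None]) -> None:
        self._listeners.append(listener)

    def curate(
        self, user: str, accession: str, record: Dict[str, str], changes: Dict[str, str]
    ) -> Dict[str, str]:
        """Return the curated record and emit an event with both versions."""
        before = dict(record)             # snapshot of the pre-curation data
        after = {**record, **changes}     # post-curation data; the input is not mutated
        event = CurationEvent(user, accession, before, after)
        for listener in self._listeners:
            listener(event)
        return after


# Usage: collect events in a list to see who changed what.
events: List[CurationEvent] = []
emitter = CurationEmitter()
emitter.subscribe(events.append)
emitter.curate("bob", "SAMPLE-1", {"organism": "human"}, {"organism": "Homo sapiens"})
```

Because each event carries both snapshots, a listener can compute exactly which fields a given user changed without querying the sample store again.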
The resulting demo from the hacking project is a first step toward delivering a tractable provenance model for sample management. At a later stage, the project hopes to spark interest in blockchain technologies in the bioinformatics community. One possible extension would be to add functionality to rebuild a particular state of a sample based on the application of sequential or non-sequential curation layers. This would allow researchers to recover whichever state of the sample is most relevant to their particular research.
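One way to picture the proposed extension is as a replay of recorded curations over the originally submitted data. The sketch below is only an illustration of that idea under simple assumptions: a sample is a flat dictionary, each curation layer is a dictionary of field updates, and rebuild_state is a hypothetical helper, not part of the demo.

```python
from typing import Dict, List, Optional, Set


def rebuild_state(
    base: Dict[str, str],
    layers: List[Dict[str, str]],
    selected: Optional[Set[int]] = None,
) -> Dict[str, str]:
    """Rebuild a sample state by replaying curation layers over the base record.

    If `selected` is None, every layer is applied in order (sequential replay).
    Otherwise only the layers whose positions are in `selected` are applied,
    still in their original order (non-sequential replay).
    """
    state = dict(base)                    # never mutate the submitted record
    for index, layer in enumerate(layers):
        if selected is None or index in selected:
            state.update(layer)
    return state


# Usage with made-up data: three curations recorded after submission.
base = {"organism": "human"}
layers = [
    {"organism": "Homo sapiens"},
    {"tissue": "liver"},
    {"organism_ontology": "NCBITaxon:9606"},
]

after_two = rebuild_state(base, layers[:2])           # state after the first two curations
skipping_one = rebuild_state(base, layers, {0, 2})    # same history without the tissue curation
```

Replaying a prefix of the layers recovers the sample as it stood at a given point in its history, while selecting a subset recovers a state that ignores curations a researcher does not want to rely on.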
Many thanks to Alex Garcia (BASF) for his contribution to the preparation of the project background and materials; and to Rafael Jimenez (Elixir) for financial support.