“Imagine trying to support genomic research when everybody has different names for the same genes.” That is the analogy Dana Vanderwall of BMS uses to describe the significance of the HELM (Hierarchical Editing Language for Macromolecules) project. This Pistoia Alliance project is designed to provide a standardised encoding for biologics and a meaningful graphical representation of their component data.
HELM, which was developed at Pfizer, addresses the problem of consistently representing macromolecules such as oligonucleotides, peptides, proteins, vaccines and antibodies. These complex structures have long challenged informaticians because they are too large and unwieldy to represent practically at the atomic level. At the same time, the presence of non-natural chemical modifications makes it impossible to represent them by sequence alone.
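The representation problem can be illustrated with a minimal Python sketch. This is not HELM syntax and not the HELM toolkit API: the `Monomer` class, the helper functions and the monomer ID `meA` (standing in for a hypothetical N-methylated alanine) are assumptions invented for this illustration. It only shows why a one-letter sequence loses a non-natural modification, while a monomer-level description keeps it without listing every atom.

```python
from dataclasses import dataclass


# Hypothetical monomer record, invented for this illustration (not part of HELM).
@dataclass(frozen=True)
class Monomer:
    monomer_id: str         # identifier used within this sketch, e.g. "A" or "meA"
    natural_analog: str     # closest one-letter natural residue
    modified: bool = False  # True for a non-natural chemical modification


def to_plain_sequence(chain):
    """Collapse a chain to a one-letter sequence; modifications are lost."""
    return "".join(m.natural_analog for m in chain)


def modified_positions(chain):
    """List (1-based position, monomer ID) for every modified monomer."""
    return [(i, m.monomer_id) for i, m in enumerate(chain, start=1) if m.modified]


# Two peptides that differ only by one assumed modification at position 2.
peptide = [
    Monomer("A", "A"),
    Monomer("meA", "A", modified=True),
    Monomer("G", "G"),
    Monomer("C", "C"),
]
unmodified = [Monomer("A", "A"), Monomer("A", "A"), Monomer("G", "G"), Monomer("C", "C")]

# Both collapse to the same plain sequence, so sequence alone cannot tell them apart.
same_sequence = to_plain_sequence(peptide) == to_plain_sequence(unmodified)

# The monomer-level description still distinguishes them.
differences = modified_positions(peptide)
```

In this sketch both chains collapse to the same letters, yet the monomer-level lists differ. That gap is the one the article describes between sequence-only and atom-level representations.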
“As the science and applications of biologics have evolved to create novel molecules and combinations, the established standards could no longer support these molecules as we use them now. So there is currently no fully open, non-proprietary standard, meaning scientists spend a lot of time re-inventing the ‘macromolecule description’ wheel. What we need is a system to allow better sharing of data between collaborators,” says Dr Vanderwall. “HELM gives us a single consistent way to describe macromolecules which can be used across industry and academia.”
However, the benefits of HELM are not purely scientific. A study has shown that adopting HELM could provide significant cost savings across the industry, largely by sparing researchers the time spent creating their own notations. “HELM could also be really important in opening up a whole new area in computational biology,” adds Dr Vanderwall. “The development of HELM could be a real catalyst in the development of new commercial tools capable of handling large molecules.”
Pfizer is working with the Pistoia Alliance to convert HELM from an internal technology into an industry standard that can be used universally. A new HELM toolkit and editor will be made available as open-source software, enabling researchers to use the notation without having to code it themselves.
Further details are available in the scientific paper published on the HELM language, which can be found at http://pubs.acs.org/doi/abs/10.1021/ci3001925?journalCode=jcisd8