Vocabulary Standards: Helping Scientists Talk the Same Talk

In everyday discussions we take for granted that we have a reference terminology (in my case, the English dictionary) that provides definitions and usage of the concepts necessary to describe a certain “fact.” But even humans get confused by overlapping terms. Take the buffalo sentence:

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

This sentence uses three different valid definitions of the word “buffalo,” so that it is parsed as “THE buffalo FROM Buffalo THAT ARE buffaloED BY buffalo FROM Buffalo, buffalo (verb) OTHER buffalo FROM Buffalo.”

*whew*

We in life science frequently confuse words and meanings. Does a document tagged as “muscle wasting” refer to the indication, a side effect, or the name of an internal research team? Does “SNP” refer to sodium nitroprusside or single nucleotide polymorphism? And if you think this is confusing for humans, it’s worse for computers. All these overlapping terms can mean data is misinterpreted or, worse, completely missing in areas where it is relevant.
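To make the “SNP” problem concrete, here is a minimal sketch of how a shared vocabulary lets software tell the two meanings apart. Everything in it is an assumption made for illustration: the `Concept` class, the `resolve` function, the `EX:` identifiers, and the domain labels are hypothetical and do not come from the paper or from any real vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str       # hypothetical identifier, not from a real vocabulary
    preferred_label: str  # the unambiguous name for the concept
    domain: str           # hypothetical context label used to disambiguate


# Toy vocabulary: one ambiguous surface term maps to several concepts.
VOCABULARY = {
    "snp": [
        Concept("EX:0001", "single nucleotide polymorphism", "genetics"),
        Concept("EX:0002", "sodium nitroprusside", "pharmacology"),
    ],
}


def resolve(term, domain=None):
    """Return candidate concepts for a term, optionally narrowed by domain."""
    candidates = VOCABULARY.get(term.strip().lower(), [])
    if domain is None:
        return candidates
    return [c for c in candidates if c.domain == domain]
```

Calling `resolve("SNP")` returns both candidates, which is exactly the ambiguity a computer faces when a document carries only the bare tag. Calling `resolve("SNP", domain="genetics")` returns one. The point is that data tagged with a concept identifier, rather than a free-text abbreviation, carries its meaning with it wherever it travels.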

Members of the Pistoia Alliance, including major pharma, academia, and content providers, have together written a paper that summarises the current state and issues around basic biomedical vocabularies. How we identify the “things” that constitute scientific endeavours (from projects to hypotheses to experimental results) is critical to enabling data to be accurately stored and reused. This isn’t a new topic, but it has become much more relevant recently. As everyone is aware, our sector is moving to many more fluid business models, where contract research organisations, academic groups, and non-profit agencies intermingle with industry participants in a complex and variable matrix of interactions.

We argue in our paper that while different companies have invested in internal vocabulary work, internally focused approaches do not tackle the wider need to ensure that in all future interactions, data will be captured and communicated in a way that all team members understand. Indeed, as the paper points out, we’re already seeing the negative consequences when global research collaborators fail to speak a “common language.” Precompetitive collaboration to develop such standards fits perfectly with the need for universal languages to describe science.

Will the publication of this paper change things overnight? No. Our only aim was to reassess the issue in light of practical business considerations and argue for the need to take a fresh look at the problem. However, as with any new advance, there will be difficulties and questions. How do we do this? How do we resolve conflicts? Who is responsible? Such questions will only be addressed through pilot projects and continued exploration of issues around shared vocabularies. We think that the Pistoia Alliance is uniquely positioned to handle this work, but we need concrete funding proposals and members willing to invest time and resources to make this happen.

I’ve discussed this issue a little more over on my company’s blog, but the question is what Pistoia members think. What ideas do you have to give this initiative life?

Posted in Pistoia Alliance Blog.
