Contributed by Dr James Malone, Chief Executive Officer, FactBio.
Being an ontologist can be challenging at times. For starters, you have to try to explain to most people what an ontologist is. Typically, it’s not one of the ‘profession’ options when completing online applications. Then there are the ontologies themselves. The truth is that working out which ontologies are reliable and which are not can be difficult, even after working with them for years, and it’s often more art than science. For those new to the area this challenge can be overwhelming.
Perhaps the question I was asked most often in my previous life as a lead ontologist at the EBI was ‘which ontology should we use?’ In 2012 I published a blog post calling for more objective ways of evaluating an ontology. The post focused more on those developing bio-ontologies than on those consuming them, and it took a fairly computer-science perspective, but it was my first attempt to elucidate how I think we might start objectively measuring the good and bad aspects of an ontology. I followed this up with Professor Robert Stevens, and in 2013 we published work in which we began to quantify one aspect of community-developed bio-ontologies – their activity levels as a function of their evolution over time.
Most recently, in 2016 we published a paper titled ‘Ten Simple Rules for Selecting a Bio-ontology’ in PLOS Computational Biology. The intention was to write a more accessible article to help those less familiar with bio-ontologies make an informed decision about which ontology to use. This was about weighing up the evidence and making an assessment against a set of criteria. It creeps towards something more objective, something more like science.
Ontology Science (?)
As an ontologist, when I had to tick a box for ‘profession’, typically I would select the option for ‘data scientist’ or ‘research scientist’. In truth ontologies aren’t really like other research or data. They aren’t data in the classic sense – they do not capture the output of an experiment and report those results ipso facto. They aren’t really research either: they don’t aim to test a hypothesis through some form of experimental design and produce observations.
Software is probably the closest you’d get, but again they don’t conform to all the software norms. Finding bugs isn’t something you can easily automate with tests, though you can do some of this. Testing the content – the semantics – is really the thing most people want to do with an ontology: does the definition look correct for, say, epithelial cell or obesity? Do I agree with it? Does the wider community agree with it? Are the subclasses of those classes also correct? These are the bits that drag ontologies away from software and more into the territory of the social machine. As such they’re hard things to evaluate quantitatively without considering the communal machinery they embody.
When we wrote the ten rules paper we were aware that we were dealing with something a bit outside the biology norm. As such it is not a direct answer to that most asked question; we don’t name a set of bio-ontologies you should use. The paper is really about allowing a person to weigh up what we consider to be the most important aspects of a bio-ontology in order to make an informed decision based on that person’s needs. If an ontology doesn’t wholly follow a rule, that is not to say the ontology is useless but rather that it is less useful, depending on your requirements. And that’s the key factor here: it really does depend on your requirements.
Think about what those requirements are and then reach a conclusion. Consider some of the following uses:
- Ontology as a curation tool. An ontology used as a vocabulary is useful, but a straightforward bag of words may be enough. Here you’re looking for coverage – are all the words (and synonyms) we need present?
- Ontology as a search tool. Being able to go up and down the hierarchy to find more general or more specific concepts is very useful in search. If you want data annotated to brain you probably also want everything annotated with forebrain, hippocampus and so on. An ontology can help. Here, the structure is important, but it may not necessarily require a very complex representation to work. A SKOS representation which simply uses ‘broader than’ and ‘narrower than’ pointers to move up and down the hierarchy would work equally well.
- Ontology as a data integration tool. It’s likely that, if you’re integrating with external data sources, you’ll require ontologies that are ‘well used’, i.e. the other data you wish to integrate with also uses the same ontology. This is an important consideration because if you use an ontology you brew yourself, for instance, you will be faced with an ontology mapping problem, something the Pistoia Alliance has indeed recognised as an important but difficult and expensive challenge. Adding another ontology adds another mapping data point and adds to the challenge. The bottom line is, if you need to map to an ontology, somebody has got to do it, and that costs time and hence money. I know this because I had to do some of this work while at EBI and I’m still doing some of it now at FactBio. It is seemingly never-ending…
- Ontology as a database schema. Ontologies are commonly used in representations for data published on the web in RDF. Here you might want to ensure you have a standardised ontology using the W3C’s Web Ontology Language (OWL). You may also be interested in the way the ontology is structured – are appropriate relationships (predicates) used, and do the parts you want to relate connect together in the ontology, for instance an assay, the probes, the input sample, the output data, the analysis performed, and so on? Ontologies can describe incredibly rich semantics, but each additional step into the richness comes at a cost in terms of making a query and understanding the schema. Complex schemas are, somewhat intuitively, more complex to query. So make it as complex as you need, but no more.
- Ontology as a knowledge representation model. I could argue all ontologies are models of some domain of knowledge, even if that domain is something trivial and limited. Sometimes that model is internal to an organisation and built to formalise internal knowledge for communication and application use. Most commonly, however, the need is for an ontology that represents something approaching a consensus of opinion between experts in some biological domain. You’re probably looking towards the considerable efforts of the OBO Foundry ontologies in this case, often the poster child of community-led bio-ontology development. But what about outside the OBO Foundry? Consider the above: some of the most used resources for annotating medical records are not OBO Foundry ontologies. We see (non-ontology) resources such as SNOMED CT, ICD-10 and so on used in medical records and they don’t always obey the OBO Foundry principles. But are they useful? Clearly a lot of people think so.
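To make the curation and search uses above a little more concrete, here is a minimal Python sketch. It is an illustration only, not code from the ten rules paper or from any real ontology: the tiny hierarchy, the synonym table and the function names (expand_term, missing_terms) are all assumptions made up for this example, and ‘hindbrain’ and ‘encephalon’ are added purely to give the toy data some shape. It shows a coverage check (are the words we need present as labels or synonyms?) and a search expansion that walks ‘narrower than’ pointers, in the spirit of the SKOS-style representation mentioned above.

```python
from collections import deque

# Hypothetical toy hierarchy: each term maps to its direct narrower terms.
# The terms echo the brain example in the text; this is not a real ontology.
NARROWER = {
    "brain": ["forebrain", "hindbrain"],
    "forebrain": ["hippocampus"],
    "hindbrain": [],
    "hippocampus": [],
}

# Hypothetical synonyms, used only by the coverage check.
SYNONYMS = {"brain": ["encephalon"]}


def expand_term(term, narrower=NARROWER):
    """Return the term plus every narrower term reachable from it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def missing_terms(needed, narrower=NARROWER, synonyms=SYNONYMS):
    """Return the needed words that match no label or synonym (case-insensitive)."""
    known = {label.lower() for label in narrower}
    known.update(c.lower() for children in narrower.values() for c in children)
    known.update(s.lower() for syns in synonyms.values() for s in syns)
    return sorted(word for word in needed if word.lower() not in known)
```

With this toy data, expand_term("brain") returns the set of all four terms, so a search for brain would also retrieve data annotated with hippocampus, while missing_terms(["Encephalon", "cerebellum"]) returns ["cerebellum"], flagging a gap in coverage. Nothing here needs OWL or rich semantics, which is rather the point: for these two uses a simple structure may be enough.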
This is where I come back to the ten rules paper. What are your requirements? Is it important that others use it, that it is open, that there are textual definitions, that you can contribute to it, that it represents a community consensus? It’s about weighing up the pros and cons, evaluating against your requirements in an honest and objective way, and cutting through the social dogma which can sometimes plague this field. The truth about ontologies is that there is really only one rule: there are no rules that apply to everyone.