Introduction

This editorial explores the expectations linked to a growing infrastructure around biomedical ontologies. Since ontologies have become an integral part of biological and biomedical research for the annotation of data, and for its integration, analysis, and visualization [1], demand has arisen for a place in which the scientific community can be made aware of new ontologies, major updates to existing ontologies, the development of and updates to ontology-based tools, and the discussion of ontology-based methods. The JBMS thematic series on ‘Biomedical Ontologies’, and the annual JBMS Ontology Issue, will fill these gaps and establish a hub of information about biomedical ontologies and their scientific applications.

The role of ontologies in biological and biomedical research has steadily increased in conjunction with the increase in quality and quantity of data being collected in all areas of biology. Not only are ontologies increasing in number, size and relevance, and penetrating more areas of biology and biomedicine; they have also begun to play a key part in the interpretation of biomedical data and to inspire the development of new tools for end users and new analysis methods for biomedical scientists. As a result, data integration and interoperability have become a relevant cost factor in the execution of big data projects, and this has been acknowledged by national and international projects, for example by the Elixir initiative, which aims to establish a biomedical IT infrastructure across Europe [2]. The development and application of ontologies will be an integral part of such an infrastructure, mainly because data interoperability requires tools to explicitly describe the semantics of terms used to characterize the features of data, and ontologies are widely used to fill this role.

Which ‘ontology’ did you mean?

There has been considerable debate in the ontology research community as to what constitutes an ontology in biology [3–5] and what properties an ontology should have. Traditional axes of classification for ontologies include the expressivity of the language used to develop and distribute the ontologies, the applications for which the ontologies are intended (i.e., who uses the ontology and how) and the domain covered by the ontology. Arguments pertain

  (a) To the degree of formality of the language used to express the information in an ontology, i.e., whether a formal language such as the Web Ontology Language (OWL) [6] is used or a graph-based representation without explicit formal semantics,

  (b) To the complexity of the ontology description, i.e., whether rich axioms and relations are used or whether a taxonomy, accompanied by textual definitions of classes in an ontology, is sufficient,

  (c) To the interpretation of what constitutes a “class” or “relation” in an ontology, i.e., whether a class in an ontology refers to something in the world or to a mental construct, and

  (d) To the orthogonality of the content, i.e., what content has been incorporated from other ontologies and for which purposes.

Depending on the intended applications, artifacts called “ontologies” are developed with any combination of these properties.

In the JBMS thematic series on Biomedical Ontologies, we employ a broad interpretation of “ontology” and include artifacts that primarily provide vocabularies for the purpose of data annotation as well as formal theories that provide a rich representation of certain aspects of biomedicine. To annotate data within a database, a taxonomy of classes with labels and textual definitions is often sufficient, while more expressive formal constructs would be required if the ontology is developed to verify data integrity.

Representing ontologies

The annotation of research data using an ontology enables integration of data both within a database and across multiple databases [1]. Ontologies provide a controlled set of classes together with an explicit (formal or informal) representation of their meaning, a hierarchy between these classes, and complex axiom patterns (“relations”) [7] between the classes. When shared across multiple databases, ontologies facilitate data integration. The taxonomic relations allow integration through more general or more specific classes even if exact matches between data items cannot be identified, and axioms between classes serve as complex relations that facilitate further data integration.
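As an illustration of integration through taxonomic relations, the following minimal Python sketch retrieves records from two databases that were annotated at different levels of granularity. The class labels, record identifiers and function names are hypothetical and chosen only for this example; real ontologies would be loaded from OBO or OWL files rather than written inline.

```python
# Toy is-a taxonomy: class -> direct superclasses (hypothetical labels).
PARENTS = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
    "neuron": ["cell"],
}

# Records from two hypothetical databases, each annotated with one class.
ANNOTATIONS = {
    "db1:sample_7": "T cell",
    "db2:assay_3": "B cell",
    "db2:assay_9": "neuron",
}


def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def records_annotated_under(query_cls):
    """Find records whose annotation is query_cls or one of its subclasses."""
    return sorted(
        record
        for record, cls in ANNOTATIONS.items()
        if query_cls in ancestors(cls)
    )


if __name__ == "__main__":
    # No record is annotated with "lymphocyte" directly, yet two are found.
    print(records_annotated_under("lymphocyte"))
```

Although the two databases share no identical annotation, a query for the more general class returns records from both, which is the kind of integration that an exact-match comparison would miss.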

Today, most biomedical ontologies are developed in shared formal languages, either the OBO Flatfile Format [8] or the Web Ontology Language (OWL) [6]. Both languages are tightly coupled and thus allow translations between them [9, 10] so that the OBO Flatfile Format can now be considered to be a fragment of OWL [8].

The expressivity of a biomedical ontology is determined by the particular subset of OWL that is used to formulate it, and serves as a major distinguishing factor. It characterizes the knowledge that can be expressed (such as whether the ontology may contain contradictions) and determines the complexity of general tasks such as querying the ontology and categorizing data with the ontology. Apart from OWL DL, OWL 2 comprises three profiles (OWL EL, OWL QL and OWL RL) [11]. The OWL EL profile forms a subset that allows one to specify (a) a taxonomy between classes (i.e., to state that one class is a subclass of another), (b) existential restrictions (i.e., to state that instances of one class must stand in a relation to some instance of another class), and (c) disjointness of classes (i.e., to state that two classes cannot share any instances). It has been found useful for a significant number of biomedical ontologies [12–15].
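To make the three kinds of OWL EL statements concrete, the sketch below models them as small Python data classes and renders each one in the style of the OWL 2 functional syntax. The Python class names and the example classes and relation (nucleus, organelle, cell, part_of) are assumptions made for illustration; this is not an OWL library, and it performs no reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubClassOf:
    """Taxonomy: every instance of `sub` is an instance of `sup`."""
    sub: str
    sup: str

    def render(self):
        return f"SubClassOf(:{self.sub} :{self.sup})"


@dataclass(frozen=True)
class SomeValuesFrom:
    """Existential restriction: every instance of `sub` stands in
    `relation` to some instance of `filler`."""
    sub: str
    relation: str
    filler: str

    def render(self):
        return (
            f"SubClassOf(:{self.sub} "
            f"ObjectSomeValuesFrom(:{self.relation} :{self.filler}))"
        )


@dataclass(frozen=True)
class DisjointClasses:
    """Disjointness: `first` and `second` share no instances."""
    first: str
    second: str

    def render(self):
        return f"DisjointClasses(:{self.first} :{self.second})"


# Hypothetical axioms, one of each kind.
AXIOMS = [
    SubClassOf("nucleus", "organelle"),
    SomeValuesFrom("nucleus", "part_of", "cell"),
    DisjointClasses("organelle", "cell"),
]

if __name__ == "__main__":
    for axiom in AXIOMS:
        print(axiom.render())
```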

Domains of ontologies and their applications

Ontologies are of particular importance in domains in which large volumes of data are being generated, and the emergence of high-throughput technologies has raised their importance further in some domains. In the 1990s, research on discovering gene functions in diverse organisms required a means to standardize gene functions for comparison within and across multiple organisms. This need prompted the development of the Gene Ontology (GO), which has become one of the most important resources in genomics research [16]. In a similar way, the Sequence Ontology (SO) [17] emerged as a response to the availability of more and more sequencing data, and to provide compatibility between different data formats for biological sequences and their features.

Different anatomy ontologies specify the organismal components for multiple species, and, on a smaller scale of granularity, the developmental relations and features of cell types are characterized by the Celltype Ontology [18]. Phenotype ontologies are also available for multiple species and are widely used for the annotation of the abnormalities observed in mutagenesis experiments [19–21] as well as for the characterization of diseases and drug effects [22].

Further domains covered include chemical entities, which are used to annotate drugs and their biological activities [23], structures, and pharmaceutical applications [23, 24] for data interoperability [25]. Ontologies for experimental settings, e.g., the BioAssay Ontology [26], the Experimental Factor Ontology [27], the eagle-i ontology [28] and the Ontology of Biomedical Investigations [29], capture the biomedical metadata that characterize experiments. Similarly, ontologies for environmental conditions describe data samples and the surroundings in which they were encountered [27, 30]. Ontologies are also being used to annotate and classify journal articles [31, 32], pathways [33], and specific biological entities [34].

Ontologies, together with their annotations, are extensively used in the analysis of biomedical data, for example in the form of Gene Set Enrichment Analysis (GSEA) [35] for the interpretation of gene expression datasets. GSEA makes use of the structure of the Gene Ontology to identify statistically over- or under-represented classes based on gene expression observed in two biological states. Similar methods are also applied to other ontologies such as the Human Disease Ontology [36], the Neuro Behavior Ontology [37], or even the full set of ontologies contained in BioPortal [38].
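The idea of finding over-represented classes can be sketched with a plain hypergeometric over-representation test. Note that this is a deliberately simpler technique than the GSEA algorithm itself, which works on ranked gene lists; it is shown here only to illustrate how the ontology structure enters the analysis, namely by propagating each annotation to all superclasses before testing. The gene identifiers, class labels and function names are hypothetical.

```python
from math import comb

# Toy is-a hierarchy and direct gene annotations (all hypothetical).
PARENTS = {"apoptosis": ["cell death"], "necrosis": ["cell death"]}
DIRECT = {
    "g1": {"apoptosis"},
    "g2": {"apoptosis"},
    "g3": {"necrosis"},
    **{f"g{i}": {"metabolic process"} for i in range(4, 11)},
}


def with_ancestors(classes):
    """Close a set of classes under the superclass relation."""
    closed = set()
    stack = list(classes)
    while stack:
        cls = stack.pop()
        if cls not in closed:
            closed.add(cls)
            stack.extend(PARENTS.get(cls, []))
    return closed


# A gene annotated with a class is implicitly annotated with its superclasses.
PROPAGATED = {gene: with_ancestors(cs) for gene, cs in DIRECT.items()}


def upper_tail(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw without replacement."""
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, min(annotated, sample) + 1)
    )
    return favourable / total


def over_representation(study_genes):
    """Map each class seen in the study set to its enrichment p-value."""
    study = set(study_genes)
    population = len(PROPAGATED)
    classes = set().union(*(PROPAGATED[g] for g in study))
    p_values = {}
    for cls in classes:
        annotated = sum(cls in cs for cs in PROPAGATED.values())
        hits = sum(cls in PROPAGATED[g] for g in study)
        p_values[cls] = upper_tail(population, annotated, len(study), hits)
    return p_values


if __name__ == "__main__":
    for cls, p in sorted(over_representation({"g1", "g2", "g3"}).items()):
        print(f"{cls}: p = {p:.4f}")
```

In this toy data, the superclass "cell death" receives a smaller p-value than either of its subclasses, because propagation pools the evidence from both. In practice, p-values would also need correction for multiple testing, which is omitted here.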

Another analysis method relying on ontologies is to compare data items and identify meaningful biological relations between them based on semantic similarity [39]. This approach has been applied to identify protein-protein interactions [40], classify chemicals [41], suggest candidate genes involved in diseases [42, 43] and repurpose drugs [44, 45]. When applying semantic similarity to compare two data items, the choice of ontology determines the kind of similarity that is revealed: using GO will provide functional similarity, chemical entities from ChEBI will provide chemical structure similarity, and using phenotype ontologies will result in phenotypic similarity.
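One widely used family of semantic similarity measures is based on the information content of shared superclasses. The sketch below implements a Resnik-style measure over a toy taxonomy as an example; it is not necessarily the measure used in the studies cited above, and the taxonomy, annotation corpus and function names are hypothetical.

```python
from math import log2

# Toy is-a taxonomy: class -> direct superclasses (hypothetical labels).
PARENTS = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
    "neuron": ["cell"],
}

# Hypothetical annotation corpus: one class per annotated data item.
CORPUS = ["T cell", "B cell", "neuron", "neuron"]


def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def information_content(cls):
    """-log2 of the fraction of corpus items annotated with cls or a subclass.

    Assumes at least one corpus item falls under cls.
    """
    covered = sum(cls in ancestors(annotation) for annotation in CORPUS)
    return -log2(covered / len(CORPUS))


def resnik(a, b):
    """Information content of the most informative common superclass."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(cls) for cls in common)


if __name__ == "__main__":
    print(resnik("T cell", "B cell"))   # share the specific class "lymphocyte"
    print(resnik("T cell", "neuron"))   # share only the root class "cell"
```

Rare, specific shared superclasses yield high similarity, while two items that share only the root class score zero. Swapping in a different ontology and corpus changes what kind of similarity the same computation expresses.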

The integration of multiple ontologies – in particular from different domains – can reveal relations between annotated data items. For example, anatomy ontologies for cross-species comparisons – linking homologous or analogous anatomical structures – can be used to transfer and compare annotations for multiple species [15, 46]. For this purpose, the UBERON anatomy ontology [15] was developed. It enables cross-species phenotype representations that have been applied to deciphering human GWAS data based on comparisons with mouse model phenotypes [47] as well as the prioritization of candidate genes and drug targets based on data from model organisms [42, 44, 48, 49].

Additionally, the rich axiom systems of some ontologies help to verify and classify data according to constraints on biological entities expressed in the ontologies. One example of such an application has been the classification of proteins using ontologies [50], in which an ontology provides rules according to which decisions about the protein family are made. The same, or similar, constraints expressed in ontologies can be used to verify data, i.e., determine whether a data item complies with the constraints expressed in the ontology or not [51].
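Data verification against ontology constraints can be illustrated with disjointness, the simplest such constraint. In the following sketch, a data item is flagged when the classes asserted for it, together with their superclasses, include two classes declared disjoint. The class hierarchy and the disjointness axiom are hypothetical and serve only as an example; a real application would use an automated reasoner over the full ontology.

```python
# Toy is-a hierarchy (hypothetical labels).
PARENTS = {
    "kinase": ["enzyme"],
    "enzyme": ["protein"],
    "collagen": ["structural protein"],
    "structural protein": ["protein"],
}

# Hypothetical disjointness axiom: nothing is both an enzyme and a
# structural protein. Declared here purely for illustration.
DISJOINT = [frozenset({"enzyme", "structural protein"})]


def closure(classes):
    """Close a set of classes under the superclass relation."""
    closed = set()
    stack = list(classes)
    while stack:
        cls = stack.pop()
        if cls not in closed:
            closed.add(cls)
            stack.extend(PARENTS.get(cls, []))
    return closed


def violated_constraints(asserted_classes):
    """Return the disjointness axioms violated by a data item's annotations."""
    closed = closure(asserted_classes)
    return [tuple(sorted(pair)) for pair in DISJOINT if pair <= closed]


if __name__ == "__main__":
    print(violated_constraints({"kinase"}))               # consistent item
    print(violated_constraints({"kinase", "collagen"}))   # inconsistent item
```

The violation is detected even though neither disjoint class was asserted directly: it follows only once the taxonomy is taken into account.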

The main challenges for research in biomedical ontologies

Evaluation of ontologies and the development of a robust research methodology

Establishing effective methods to evaluate ontologies – both qualitatively and quantitatively, if possible – towards fitness for a purpose is a major challenge in ontology research [52]. Determining the “best” ontology for a given purpose becomes important, and criteria such as the ontology structure, formality, its complexity, its coverage, as well as the amount of data annotated with it contribute to this decision. Effective methods for evaluation are particularly required for domains in which multiple ontologies overlap in their content and intended applications, such as for human diseases where ICD, MeSH, SNOMED CT, the Human Disease Ontology [53], the Human Phenotype Ontology [22], the Unified Medical Language System (UMLS) [54], and more specific ontologies such as the Infectious Disease Ontology [55], Malaria ontology [56], etc. are being used.

The research methodology underlying the development of biomedical ontologies will also improve when effective evaluation criteria are applied. The Ontology Summit [57] addressed this need in 2013 with the topic “Ontology Evaluation Across the Ontology Lifecycle”. Ontology evaluation also featured prominently in panel discussions at the International Conference on Biomedical Ontologies 2013 (ICBO) and will play a prominent role at ICBO 2014. The JBMS thematic series on Biomedical Ontologies will follow the community discussions to address ontology evaluation principles and methods, and their instantiation in community-agreed guidelines and standards.

Standards and Interoperability: Linked Data and beyond

Another challenge is the efficient reuse of ontologies, and the knowledge they contain, in the organization of open, linked data that may be accessible through multiple public interfaces (SPARQL endpoints) from different data providers [58]. The main task is to balance the complexity of processing and querying ontologies, which commonly require the use of an automated reasoner, with the need to efficiently query large, linked datasets. In particular, when multiple ontologies are used to annotate datasets and automated reasoning over these ontologies provides the means for finding relations between their classes, an infrastructure is needed that combines queries over ontologies with SPARQL queries over linked data.

Recently, some applications have come forward in which automated reasoning is used to answer complex queries over ontologies and subsequently retrieve data [59–61]. At the same time, major providers of biological and biomedical data such as the European Bioinformatics Institute (https://www.ebi.ac.uk/rdf/) and UniProt (http://beta.sparql.uniprot.org/) provide access to their content through public SPARQL endpoints. In the future, we expect exciting applications that combine reasoning over ontologies in ontology repositories, such as the Ontology Lookup Service [62], BioPortal [63] or OntoBee [64], with (federated) SPARQL queries and provide a genuinely knowledge-driven way for exploring linked biomedical data.

Knowledge-based analysis of biomedical data

Integration of ontologies – and the knowledge they contain – in the analysis of biological and biomedical data is yet another challenge. Ontologies have been successfully integrated with biomedical analysis pipelines [35, 39, 65, 66]. However, these analysis methods mainly exploit the ontologies’ taxonomy and often make use of the axioms and constraints only implicitly.

Many ontologies contain a lot more information than taxonomic relationships, and some recent work has begun to exploit some additional information – disjointness between classes in an ontology – to improve computation of semantic similarity [67]. How the rich information that is further contained in formalized ontologies can be incorporated in the analysis of biomedical data remains a research question, and novel methods will likely appear as the infrastructure and tool support around ontologies evolves.

The JBMS thematic series on “Biomedical ontologies”

The JBMS thematic series on Biomedical Ontologies will provide the venue for publishing research about biomedical ontologies, their development, integration and quality assurance. On a regular basis, we will have open calls for papers on specific topics, and we welcome community input for important challenges to address.

The annual JBMS Ontology Issue will become a central part of the thematic series, in which we focus on ontologies that have already been demonstrated to be useful for scientific applications. The Ontology Issue is intended for a wide audience of readers; it does not specifically target researchers in ontology, but rather biological and biomedical researchers who may want to apply ontologies in their domain and require an overview of the currently available artifacts they can already use. In the Ontology Issue, new ontologies can be described as well as updates to existing ontologies. Updates at regular intervals provide a better understanding of the progress in developing an ontology and of the major changes to its content, structure and applications.

In the future, we aim to establish another regular call in which ontology-based tools and applications will be described and updates to these tools published. Additionally, the thematic series will provide a venue to publish conference and workshop papers, and interested working groups are encouraged to suggest special topics or to contribute to existing publication cycles.

We aim to make the JBMS thematic series on Biomedical Ontologies take a central role in the exploration of current research in biomedical ontologies, and we intend to work closely with the research community to achieve this aim. All researchers are invited to express ideas and demands, ask for feedback on topics, and provide suggestions for novel developments.