Ontologies in Cheminformatics

  • Janna Hastings
  • Christoph Steinbeck
Living reference work entry


Ontologies are structured controlled vocabularies that encode domain knowledge and are backed by logic-based computational tools. They enable knowledge-based applications that use automated reasoning for inference and knowledge discovery. They also enable the standardized semantic annotation of large-scale data, which is increasingly relevant as high-throughput data generation and data sharing grow in scientific research. Established chemical ontologies include ChEBI, which encodes the structural classification of chemical entities of biological interest together with their roles. More recently, the Chemical Information Ontology was created to standardize the annotation of cheminformatics software and descriptors. This chapter describes the technology, structure, and applications of ontologies within cheminformatics.
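As a rough illustration of how an ontology supports inference over annotated data, the following minimal Python sketch computes the transitive closure of an is_a hierarchy and uses it to retrieve records by class. It is not how ChEBI or any reasoner is implemented; the class names, the record ids, and the helper names `ancestors` and `annotated_with` are all assumptions made for this example.

```python
# Toy subsumption (is_a) hierarchy. Class names are illustrative only
# and are not copied from ChEBI or any other ontology.
IS_A = {
    "ethanol": {"primary alcohol"},
    "primary alcohol": {"alcohol"},
    "alcohol": {"organic hydroxy compound"},
    "organic hydroxy compound": {"chemical entity"},
}


def ancestors(term, is_a=IS_A):
    """Return all classes that `term` is inferred to belong to.

    This walks the is_a links upward, i.e. it computes the
    transitive closure of the subsumption relation for one term.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_with(annotations, query, is_a=IS_A):
    """Return sorted record ids annotated with `query` or any subclass of it."""
    return sorted(
        record_id
        for record_id, term in annotations.items()
        if term == query or query in ancestors(term, is_a)
    )


# Hypothetical data records, each annotated with one ontology term.
annotations = {"rec1": "ethanol", "rec2": "alcohol", "rec3": "chemical entity"}
alcohol_records = annotated_with(annotations, "alcohol")
```

In this sketch, a query for "alcohol" also returns the record annotated with "ethanol", even though that record never mentions "alcohol" directly. This is the simplest form of the inference that an annotation against a shared ontology makes possible.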


Keywords: Gene Ontology, Semantic Similarity, Chemical Entity, Biological Interest, Atomic Part



ChEBI is supported by the BBSRC under grant agreement number BB/K019783/1 within the “Bioinformatics and biological resources” fund.



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. European Bioinformatics Institute, Hinxton, UK
