Abstract
Ontologies are structured controlled vocabularies that encode domain knowledge and are backed by sophisticated logic-based computational tools. They enable knowledge-based applications that harness automated reasoning for inference and knowledge discovery. They also enable the standardized semantic annotation of large-scale data, which is increasingly relevant as high-throughput data generation and sharing grow in scientific research. Established chemical ontologies include ChEBI, which encodes the structural classification of chemical entities of biological interest together with their roles. More recently, the Chemical Information Ontology was created to standardize the annotation of cheminformatics software and descriptors. This chapter describes the technology, structure and applications of ontologies within cheminformatics.
Acknowledgements
ChEBI is supported by the BBSRC under grant agreement number BB/K019783/1 within the “Bioinformatics and biological resources” fund.
Copyright information
© 2016 Springer Science+Business Media Dordrecht
About this entry
Cite this entry
Hastings, J., Steinbeck, C. (2016). Ontologies in Cheminformatics. In: Leszczynski, J. (eds) Handbook of Computational Chemistry. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6169-8_55-1
DOI: https://doi.org/10.1007/978-94-007-6169-8_55-1
Publisher Name: Springer, Dordrecht
Online ISBN: 978-94-007-6169-8
eBook Packages: Springer Reference Chemistry and Mat. Science; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics