Datamining with Ontologies

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1415))

Abstract

The use of ontologies has increased rapidly over the past decade, and ontologies are now a key component of most major databases in biology and biomedicine. Consequently, datamining over these databases benefits from considering the specific structure and content of ontologies, and several methods have been developed to use ontologies in datamining applications. Here, we discuss the principles of ontology structure and the datamining methods that rely on ontologies. The impact of these methods in the biological and biomedical sciences has been profound and is likely to increase as more datasets that use common, shared ontologies become available.

Corresponding author

Correspondence to Robert Hoehndorf.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Hoehndorf, R., Gkoutos, G.V., Schofield, P.N. (2016). Datamining with Ontologies. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_19

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_19

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
