Semantic Web pp 101-119

Clinical Ontologies for Discovery Applications

  • Yves A. Lussier
  • Olivier Bodenreider


The recent achievements in the Human Genome Project have made possible a high-throughput “systems approach” for accelerating bioinformatics research. In addition, the NIH Whole Genome Association Studies will soon supply abundant clinical data annotated to clinical ontologies for mining. The elucidation of the molecular underpinnings of human diseases will require the use of genomic and ontology-anchored clinical databases. The objective of this chapter is to provide the background required to conduct biological discovery research with clinical ontologies. The first section describes the complexity of clinical information and the main characteristics of various clinical ontologies. The second section illustrates several methods used to integrate clinical ontologies and, through them, databases annotated with heterogeneous standards. Finally, the third section reviews a few genome-wide studies that leverage clinical ontologies. We conclude with the future opportunities and challenges offered by the Semantic Web and clinical ontologies for clinical data integration and mining. Discovery research faces the challenge of generating novel tools to help collect, access, integrate, organize and manage clinical information, and to enable genome-wide analyses that associate phenotypic information with genomic data at different scales of biology. Collaborations between bioinformaticians and clinical informaticians are poised to leverage the Semantic Web.

Key words

Clinical Terminology Clinical Ontology Clinical Phenotypes Discovery Phenomics 





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Yves A. Lussier (1, 2)
  • Olivier Bodenreider (3)
  1. Section of Genetic Medicine, The University of Chicago, USA
  2. Department of Biomedical Informatics and College of Physicians and Surgeons, Columbia University, USA
  3. National Library of Medicine, National Institutes of Health, Bethesda, USA
