Clinical Ontologies for Discovery Applications

Abstract

Recent achievements of the Human Genome Project have made possible a high-throughput “systems approach” for accelerating bioinformatics research. In addition, the NIH Whole Genome Association Studies will soon supply abundant clinical data, annotated to clinical ontologies, for mining. Elucidating the molecular underpinnings of human diseases will require both genomic databases and ontology-anchored clinical databases. The objective of this chapter is to provide the background required to conduct biological discovery research with clinical ontologies. The first section describes the complexity of clinical information and the main characteristics of various clinical ontologies. The second section illustrates several methods used to integrate clinical ontologies and, through them, databases annotated with heterogeneous standards. Finally, the third section reviews a few genome-wide studies that leverage clinical ontologies. We conclude with the future opportunities and challenges that the Semantic Web and clinical ontologies offer for clinical data integration and mining. Discovery research faces the challenge of generating novel tools that help collect, access, integrate, organize, and manage clinical information, and that enable genome-wide analyses associating phenotypic information with genomic data at different scales of biology. Collaborations between bioinformaticians and clinical informaticians are poised to leverage the Semantic Web.

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Lussier, Y.A., Bodenreider, O. (2007). Clinical Ontologies for Discovery Applications. In: Baker, C.J.O., Cheung, KH. (eds) Semantic Web. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48438-9_6
