Data Standards and Terminology Including Biomedical Ontologies

Clinical Applications of Artificial Intelligence in Real-World Data

Abstract

Electronic health records are routinely collected as part of care and vary in data type, quality and structure. As a result, clinical data from health records need to be standardized before they can be used in software applications for data mining, machine learning or other artificial intelligence approaches. Clinical terminologies and classification systems are available that can serve as standards to enable the harmonization of disparate data sources. In this chapter, we discuss different types of biomedical semantic standards, including medically relevant ontologies, together with their uses and limitations. We also discuss how semantic standards can be applied to provide features for machine learning, particularly with respect to phenotypes. Finally, we discuss potential areas for future improvement, such as coverage of genotypes, and the steps needed to achieve them.
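As an illustration of how a coded terminology can supply phenotype features for machine learning, the following is a minimal sketch rather than the chapter's own method. It assumes diagnoses arrive as raw ICD-10 code strings in inconsistent formats. The mapping table, the phenotype labels and the function names are hypothetical and chosen only for this example; a real phenotype definition would be far more detailed and clinically validated.

```python
# Hypothetical toy mapping from ICD-10 three-character categories to
# phenotype labels. Illustrative only, not a validated phenotype definition.
ICD10_TO_PHENOTYPE = {
    "I21": "acute_myocardial_infarction",
    "I48": "atrial_fibrillation_or_flutter",
    "E11": "type_2_diabetes",
}


def normalize_icd10(code: str) -> str:
    """Strip whitespace and dots, uppercase, and keep the 3-character category."""
    return code.strip().upper().replace(".", "")[:3]


def phenotype_features(records):
    """Build binary phenotype indicators per patient.

    records: iterable of (patient_id, raw_icd10_code) pairs.
    Returns a dict mapping patient_id to {phenotype_label: 0 or 1}.
    """
    phenotypes = sorted(set(ICD10_TO_PHENOTYPE.values()))
    features = {}
    for patient_id, raw_code in records:
        # Every patient gets a full row, even if no code maps to a phenotype.
        row = features.setdefault(patient_id, {p: 0 for p in phenotypes})
        label = ICD10_TO_PHENOTYPE.get(normalize_icd10(raw_code))
        if label is not None:
            row[label] = 1
    return features


# Codes recorded in different styles by different (hypothetical) source systems.
records = [("p1", "I21.0"), ("p1", "e11.9"), ("p2", " I48 "), ("p2", "Z00.0")]
features = phenotype_features(records)
```

Normalizing the code strings before lookup is the harmonization step: the same concept recorded as `I21.0`, `i210` or ` I21 ` ends up in one feature column. Truncating to the three-character category is a deliberate simplification and discards clinical detail that a real study may need.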



Author information

Corresponding author

Correspondence to Spiros Denaxas.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Denaxas, S., Stoeckert, C. (2023). Data Standards and Terminology Including Biomedical Ontologies. In: Asselbergs, F.W., Denaxas, S., Oberski, D.L., Moore, J.H. (eds) Clinical Applications of Artificial Intelligence in Real-World Data. Springer, Cham. https://doi.org/10.1007/978-3-031-36678-9_3

  • DOI: https://doi.org/10.1007/978-3-031-36678-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36677-2

  • Online ISBN: 978-3-031-36678-9

  • eBook Packages: Medicine, Medicine (R0)
