Abstract
Electronic health records (EHRs) contain patients’ health information in varied formats, such as clinical reports written in natural language, X-rays, MRI scans, and case or discharge summaries. One essential constituent is the clinical narrative, which records a patient’s significant clinical findings. Because clinical narratives are stored in natural language, the clinical evidence, significant findings, and other observations that doctors write in them remain locked. This free-text information is incomprehensible to machines when EHRs are processed in healthcare applications such as a Clinical Decision Support System. The proposed work, the Clinical Narratives to Knowledge Graph (N2K) Mapper algorithm, maps clinical narratives to a Knowledge Graph (KG) so that important clinical details, as recommended by doctors, can be used effectively by healthcare applications. The KG is defined by ontological semantic metadata called the Healthcare Ontology. The N2K Mapper algorithm uses natural language processing to parse the clinical narratives and recognizes medical vocabulary using named entity recognition and various existing biomedical ontologies. This semantically retrieved information is then mapped onto the KG. Besides enriching the KG from clinical narratives, the algorithm also augments the biomedical ontologies with new medical vocabulary, yielding a sustainable semantic structure for future reference. An experimental study was performed on chest X-ray radiology reports taken from the Medical Information Mart for Intensive Care (MIMIC)-III dataset. It showed that the N2K Mapper found and recognized approximately 80% of the total identified entities in the existing biomedical ontologies. The remaining 20% were added to the biomedical ontologies with an expert’s assistance. The N2K Mapper achieved an accuracy of 81.5%, a precision of 78.5%, a recall of 85.07%, and an F1-score of 78.97%.
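The lookup-and-augment loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' N2K Mapper: the toy ontology dictionary, the function names, the predicate naming, and the sample entities are all hypothetical, and the entity strings are assumed to come from an upstream NER step that is not shown.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the biomedical ontologies: term -> (source, class).
# Real ontologies such as RadLex or FMA are far richer than a flat dictionary.
ONTOLOGY = {
    "pleural effusion": ("RadLex", "Finding"),
    "pneumothorax": ("RadLex", "Finding"),
    "left lung": ("FMA", "AnatomicalStructure"),
}


@dataclass
class MappingResult:
    triples: list = field(default_factory=list)     # (subject, predicate, object)
    unresolved: list = field(default_factory=list)  # terms awaiting expert review


def map_entities(report_id, entities, ontology):
    """Map already-recognised entity strings onto KG triples.

    Terms found in the ontology become triples; the rest are queued
    for an expert, mirroring the augmentation step in the abstract.
    """
    result = MappingResult()
    for raw in entities:
        term = raw.strip().lower()
        if term in ontology:
            source, cls = ontology[term]
            result.triples.append((report_id, f"has{cls}", f"{source}:{term}"))
        else:
            result.unresolved.append(term)
    return result


def augment(ontology, term, source, cls):
    """Add an expert-approved term so that later reports can resolve it."""
    ontology[term.strip().lower()] = (source, cls)


# Entities assumed to have been produced by an NER step (not shown here).
entities = ["Pleural effusion", "left lung", "cardiomegaly"]

first = map_entities("report-1", entities, ONTOLOGY)
# The expert's decision below is assumed for illustration only.
augment(ONTOLOGY, "cardiomegaly", "RadLex", "Finding")
second = map_entities("report-2", entities, ONTOLOGY)
```

In this sketch the first pass leaves one term unresolved, and the second pass resolves it after the ontology has been augmented, which is the sense in which the enriched vocabulary is reusable for later narratives.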
Data Availability
The data that support the findings of this study are available from [35] (https://physionet.org/content/mimiciii/1.4/), but restrictions apply to the availability of these data, which were used under license for the current study.
References
Nurdiati S, Hoede C (2008) 25 years development of knowledge graph theory: the results and the challenge. Memorandum 1876(2):1–10
Singhal A (2012) Introducing the knowledge graph: things, not strings. Official Google Blog
Ji S, Pan S, Cambria E, Marttinen P, Yu PS (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3070843
Tian L, Zhou X, Wu YP, Zhou WT, Zhang JH, Zhang TS (2022) Knowledge graph and knowledge reasoning: a systematic review. J Electron Sci Technol. https://doi.org/10.1016/j.jnlest.2022.100159
Zhang Y et al (2020) HKGB: an inclusive, extensible, intelligent, semi-auto-constructed knowledge graph framework for healthcare with clinicians’ expertise incorporated. Inf Process Manag 57(6):102324. https://doi.org/10.1016/j.ipm.2020.102324
Fang Y, Wang H, Wang L, Di R, Song Y (2019) Diagnosis of COPD based on a knowledge graph and integrated model. IEEE Access 7:46004–46013. https://doi.org/10.1109/ACCESS.2019.2909069
Malik KM, Krishnamurthy M, Alobaidi M, Hussain M, Alam F, Malik G (2020) Automated domain-specific healthcare knowledge graph curation framework: subarachnoid hemorrhage as phenotype. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2019.113120
Spasić I, Zhao B, Jones CB, Button K (2015) KneeTex: an ontology-driven system for information extraction from MRI reports. J Biomed Semantics 6(1):1–26. https://doi.org/10.1186/s13326-015-0033-1
Yao L, Liu H, Liu Y, Li X, Anwar MW (2015) Biomedical named entity recognition based on deep neutral network. Int J Hybrid Inf Technol 8(8):279–288. https://doi.org/10.14257/ijhit.2015.8.8.29
Rotmensch M, Halpern Y, Tlimat A, Horng S, Sontag D (2017) Learning a health knowledge graph from electronic medical records. Sci Rep 7(1):1–11. https://doi.org/10.1038/s41598-017-05778-z
Harnoune A, Rhanoui M, Mikram M, Yousfi S, Elkaimbillah Z, El Asri B (2021) BERT based clinical knowledge extraction for biomedical knowledge graph construction and analysis. Comput Methods Progr Biomed Update 1:100042. https://doi.org/10.1016/j.cmpbup.2021.100042
Nath N, Lee SH, McDonnell M, Lee I (2021) The quest for better clinical word vectors: ontology based and lexical vector augmentation versus clinical contextual embeddings. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2021.104433
Kamdar MR et al (2020) Text snippets to corroborate medical relations: an unsupervised approach using a knowledge graph and embeddings. AMIA Summits Transl Sci Proc 2020:288–297
Li L et al (2020) Real-world data medical knowledge graph: construction and applications. Artif Intell Med 103:101817. https://doi.org/10.1016/j.artmed.2020.101817
Yuan H, Deng W (2021) Doctor recommendation on healthcare consultation platforms: an integrated framework of knowledge graph and deep learning. Internet Res. https://doi.org/10.1108/INTR-07-2020-0379
Ernst P, Siu A, Weikum G (2015) KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences. BMC Bioinformatics. https://doi.org/10.1186/s12859-015-0549-5
Shi L, Li S, Yang X, Qi J, Pan G, Zhou B (2017) Semantic health knowledge graph: semantic integration of heterogeneous medical knowledge and services. Biomed Res Int. https://doi.org/10.1155/2017/2858423
Yu T et al (2017) Knowledge graph for TCM health preservation: design, construction, and applications. Artif Intell Med 77:48–52. https://doi.org/10.1016/j.artmed.2017.04.001
Xia E, Sun W, Mei J, Xu E, Wang K, Qin Y (2018) Mining disease-symptom relation from massive biomedical literature and its application in severe disease diagnosis. AMIA Annu Symp Proc 2018:1118–1126
Tao X et al (2020) Mining health knowledge graph for health risk prediction. World Wide Web. https://doi.org/10.1007/s11280-020-00810-1
Xiu X, Qian Q, Wu S (2020) Construction of a digestive system tumor knowledge graph based on Chinese electronic medical records: development and usability study. JMIR Med Inform. https://doi.org/10.2196/18287
Smith B et al (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251–1255. https://doi.org/10.1038/nbt1346
Dhiman S, Thukral A, Bedi P (2022) OHF: an ontology based framework for healthcare. In: Dev A, Agrawal SS, Sharma A (eds) Artificial intelligence and speech technology: AIST 2021. Communications in computer and information science, vol 1546. Springer, Cham, pp 318–328
Vidhate DA, Kulkarni P (2018) Improved decision making in multiagent system for diagnostic application using cooperative learning algorithms. Int J Inf Technol. https://doi.org/10.1007/s41870-017-0079-7
Gruber T (2009) Definition of ontology. Database systems, pp 10–12
Mohammed O, Benlamri R, Fong S (2012) Building a diseases symptoms ontology for medical diagnosis: an integrative approach. In: 1st international conference on future generation communication technologies, FGCT 2012, pp 104–108. https://doi.org/10.1109/FGCT.2012.6476567
Schriml LM et al (2019) Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res 47(D1):D955–D962. https://doi.org/10.1093/nar/gky1032
Langlotz CP (2006) RadLex: a new method for indexing online educational materials. Radiographics. https://doi.org/10.1148/rg.266065168
Rosse C, Mejino JLV (2003) A reference ontology for biomedical informatics: the foundational model of anatomy. J Biomed Inform 36(6):478–500. https://doi.org/10.1016/j.jbi.2003.11.007
Cyganiak R, Wood D, Lanthaler M (2014) RDF 1.1 concepts and abstract syntax. W3C Recommendation
Tjong Kim Sang EF, Buchholz S (2000) Introduction to the CoNLL-2000 shared task: Chunking
Akhil KK, Rajimol R, Anoop VS (2020) Parts-of-Speech tagging for Malayalam using deep learning techniques. Int J Inf Technol. https://doi.org/10.1007/s41870-020-00491-z
Thanawala P, Pareek J (2018) MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology. Int J Inf Technol. https://doi.org/10.1007/s41870-018-0111-6
Sintayehu H, Lehal GS (2021) Named entity recognition: a semi-supervised learning approach. Int J Inf Technol. https://doi.org/10.1007/s41870-020-00470-4
Johnson AEW et al (2016) MIMIC-III, a freely accessible critical care database. Sci Data 3(1):1–9. https://doi.org/10.1038/sdata.2016.35
Sato K (2012) An inside look at Google BigQuery. Google Inc
Loper E, Bird S (2002) NLTK: the natural language Toolkit. Assoc. Comput. Linguist.
Lamy JB (2017) Owlready: Ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies. Artif Intell Med 80:11–28. https://doi.org/10.1016/j.artmed.2017.07.002
DuCharme B (2010) Learning SPARQL querying and updating with SPARQL 1.1
Zaki MJ, Meira W Jr (2014) Data mining and analysis: fundamental concepts and algorithms, 2nd edn. Cambridge University Press, Cambridge
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Thukral, A., Dhiman, S., Meher, R. et al. Knowledge graph enrichment from clinical narratives using NLP, NER, and biomedical ontologies for healthcare applications. Int. j. inf. tecnol. 15, 53–65 (2023). https://doi.org/10.1007/s41870-022-01145-y
DOI: https://doi.org/10.1007/s41870-022-01145-y