Abstract
Narrative clinical notes in Electronic Health Records (EHRs) are difficult to analyze because they are unstructured. Mining the associations between clinical concepts in these notes can support physicians in making decisions and give researchers evidence about disease development and treatment. In this paper, we present a concept association mining framework, based on word embeddings learned through neural networks, to model and analyze disease and symptom relationships in clinical notes. The approach is tested on 154,738 clinical notes from 500 patients, extracted from Indiana University Health’s Electronic Health Records system. All patients are diagnosed with more than one type of disease. The results show that the framework can identify related diseases and symptoms. We also propose a method to visualize a patient’s diseases and related symptoms in chronological order. This visualization gives physicians an overview of a patient’s medical history and supports decision making. The approach can also be extended to analyze associations among other clinical concepts, such as social history, family history, and medications.
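As a rough illustration of how embedding-based association mining can work, the sketch below ranks candidate symptoms by cosine similarity to a disease concept. It is not the authors' implementation: the three-dimensional vectors, the concept names, and the helper `most_related` are hypothetical and chosen only for illustration. In practice the vectors would be learned from the clinical notes.

```python
import math

# Hypothetical toy embeddings (assumption): real vectors would be learned
# from clinical notes by a neural word-embedding model.
EMBEDDINGS = {
    "diabetes mellitus": [0.9, 0.1, 0.0],
    "numbness": [0.8, 0.2, 0.1],
    "gait problem": [0.7, 0.3, 0.1],
    "cardiomyopathy": [0.1, 0.9, 0.2],
    "dyspnea": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_related(concept, candidates, embeddings, top_k=2):
    """Return the top_k candidates most similar to `concept`, best first."""
    query = embeddings[concept]
    scored = [
        (name, cosine(query, embeddings[name]))
        for name in candidates
        if name != concept
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical symptom vocabulary used as the candidate set.
SYMPTOMS = ["numbness", "gait problem", "dyspnea"]
related = most_related("diabetes mellitus", SYMPTOMS, EMBEDDINGS)
for name, score in related:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two symptoms placed closest to "diabetes mellitus" rank above "dyspnea". Cosine similarity is used here because it is a common choice for comparing embeddings. The abstract does not say which similarity measure the framework uses.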
Cite this article
Shah, S., Luo, X., Kanakasabai, S. et al. Neural networks for mining the associations between diseases and symptoms in clinical notes. Health Inf Sci Syst 7, 1 (2019). https://doi.org/10.1007/s13755-018-0062-0
DOI: https://doi.org/10.1007/s13755-018-0062-0