Neural networks for mining the associations between diseases and symptoms in clinical notes

  • Setu Shah
  • Xiao Luo
  • Saravanan Kanakasabai
  • Ricardo Tuason
  • Gregory Klopper
Part of the following topical collections:
  1. Special Issue on Application of Artificial Intelligence in Health Research


Analyzing the narrative clinical notes in Electronic Health Records (EHRs) is challenging because of their unstructured nature. Mining the associations between clinical concepts within these notes can support physicians in making decisions and provide researchers with evidence about disease development and treatment. In this paper, to model and analyze disease and symptom relationships in clinical notes, we present a concept association mining framework based on word embeddings learned through neural networks. The approach is tested on 154,738 clinical notes from 500 patients, extracted from Indiana University Health’s Electronic Health Records system. All patients are diagnosed with more than one type of disease. The results show that this concept association mining framework can identify related diseases and symptoms. We also propose a method to visualize a patient’s diseases and related symptoms in chronological order. This visualization can give physicians an overview of a patient’s medical history and support decision making. The presented approach can also be extended to analyze associations among other clinical concepts, such as social history, family history, and medications.
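The abstract does not spell out how associations are scored. One common way to compare concepts in an embedding space is cosine similarity, and the sketch below illustrates that idea only; it is not taken from the paper. The function names, the concept labels, and the three-dimensional vectors are all hypothetical and invented for illustration; real embeddings would be learned from the clinical notes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_symptoms(disease, symptoms, vectors):
    """Sort candidate symptoms by similarity to the disease vector, highest first."""
    scored = [(s, cosine(vectors[disease], vectors[s])) for s in symptoms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors (assumption): made up for this example,
# not learned from any data and not reported in the paper.
toy_vectors = {
    "diabetes": [0.9, 0.1, 0.2],
    "frequent_urination": [0.8, 0.2, 0.3],
    "cough": [0.1, 0.9, 0.1],
}

ranking = rank_symptoms("diabetes", ["cough", "frequent_urination"], toy_vectors)
for symptom, score in ranking:
    print(f"{symptom}: {score:.3f}")
```

With these toy vectors, the symptom whose vector points in nearly the same direction as the disease vector is ranked first. In practice, a similarity threshold or a top-k cutoff would decide which pairs count as associated, and that choice is left open here.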


Neural networks; Natural language processing; Concept association mining; Clinical notes; Electronic health records




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Purdue School of Engineering and Technology, IUPUI, Indianapolis, USA
  2. Indiana University Health South Campus, Indianapolis, USA
