Neural networks for mining the associations between diseases and symptoms in clinical notes


Analyzing the narrative clinical notes in Electronic Health Records (EHRs) is challenging because of their unstructured nature. Mining the associations between clinical concepts within these notes can support physicians in making decisions and provide researchers with evidence about disease development and treatment. In this paper, to model and analyze disease and symptom relationships in clinical notes, we present a concept association mining framework based on word embeddings learned through neural networks. The approach is tested on 154,738 clinical notes from 500 patients, extracted from the Indiana University Health Electronic Health Records system. All patients are diagnosed with more than one type of disease. The results show that this concept association mining framework can identify related diseases and symptoms. We also propose a method to visualize a patient's diseases and related symptoms in chronological order. This visualization can give physicians an overview of a patient's medical history and support decision making. The presented approach can also be extended to analyze the associations of other clinical concepts, such as social history, family history, and medications.
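As a rough illustration of what embedding-based association mining can look like, the sketch below ranks candidate symptom concepts by their similarity to a disease concept. It is not the authors' implementation: the abstract does not say how associations are scored, so cosine similarity is an assumption here, and the concept names (`disease_A`, `symptom_1`, ...), the three-dimensional vectors, and the helper functions `cosine` and `rank_symptoms` are all hypothetical. In a real setting the vectors would come from a word-embedding model trained on the clinical notes.

```python
import math

# Hypothetical toy concept vectors; the values are invented for illustration
# and do not come from the paper or from any trained model.
EMBEDDINGS = {
    "disease_A": [0.9, 0.1, 0.2],
    "symptom_1": [0.8, 0.2, 0.1],
    "symptom_2": [0.6, 0.3, 0.4],
    "symptom_3": [0.1, 0.9, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_symptoms(disease, symptoms, embeddings=EMBEDDINGS):
    """Return (symptom, score) pairs sorted from most to least similar to the disease."""
    target = embeddings[disease]
    scored = [(s, cosine(target, embeddings[s])) for s in symptoms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_symptoms("disease_A", ["symptom_3", "symptom_1", "symptom_2"])
for symptom, score in ranked:
    print(f"{symptom}: {score:.3f}")
```

With these toy vectors, `symptom_1` ranks first and `symptom_3` last. A similarity threshold or a top-k cutoff could then decide which pairs to report as associated; that choice is likewise an assumption and is not taken from the abstract.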




Author information



Corresponding author

Correspondence to Xiao Luo.


About this article


Cite this article

Shah, S., Luo, X., Kanakasabai, S. et al. Neural networks for mining the associations between diseases and symptoms in clinical notes. Health Inf Sci Syst 7, 1 (2019).



  • Neural networks
  • Natural language processing
  • Concept association mining
  • Clinical notes
  • Electronic health records