Identification of Congestive Heart Failure Patients Through Natural Language Processing

Data Science

Abstract

Research in the biomedical field requires technical infrastructure to deal with heterogeneous, multi-sourced big data. Biomedical informatics primarily uses data in electronic health records to better understand how diseases spread or to gather new insights from patient histories. One prominent use case of electronic health records is the identification of a patient cohort (group) with a specific disease or some common characteristics, so that useful inferences may be drawn from these records. This paper proposes a methodology for identifying and analyzing cohorts of patients with congestive heart failure among obesity patients. This may help doctors and medical researchers in predicting outcomes, survival analysis of patients, clinical trials, and other types of retrospective studies. The cTAKES tool was used to apply natural language processing techniques to identify patients belonging to a particular cohort. All clinical terms were identified and mapped to their matching terms in the UMLS Metathesaurus. Negated statements were also detected and removed from the final cohort. The method is reasonably automated and achieves accuracy, precision, recall, and F-score values of 0.970, 0.972, 0.958, and 0.965, respectively. Results were compared against the experts' annotations. Additionally, a manual review of clinical records was performed for further validation.
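The cohort-selection and evaluation steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes an upstream tool such as cTAKES has already produced, for each clinical mention, a UMLS concept identifier and a negation flag. The `Mention` record, the function names, the example patients, and the concept identifier `C0018802` are all illustrative assumptions and should be checked against the Metathesaurus and the actual pipeline output.

```python
from dataclasses import dataclass

# Assumed UMLS concept identifier for congestive heart failure (illustrative;
# verify against the Metathesaurus before relying on it).
CHF_CUI = "C0018802"


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one clinical term extracted from a note."""
    patient_id: str
    cui: str        # concept identifier the term was mapped to
    negated: bool   # True if the statement was detected as negated


def build_cohort(mentions, target_cui=CHF_CUI):
    """Return the ids of patients with at least one non-negated target mention."""
    return {m.patient_id for m in mentions
            if m.cui == target_cui and not m.negated}


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(predicted, gold, all_patients):
    """Compare a predicted cohort with expert annotations (sets of patient ids)."""
    tp = len(predicted & gold)                   # correctly included
    fp = len(predicted - gold)                   # wrongly included
    fn = len(gold - predicted)                   # wrongly excluded
    tn = len(all_patients - predicted - gold)    # correctly excluded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(all_patients),
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Made-up example data, for illustration only.
example_mentions = [
    Mention("p1", CHF_CUI, negated=False),
    Mention("p2", CHF_CUI, negated=True),     # e.g. "no evidence of CHF"
    Mention("p3", "C0000000", negated=False), # placeholder for another concept
    Mention("p4", CHF_CUI, negated=True),
    Mention("p4", CHF_CUI, negated=False),    # one affirmed mention suffices
]
example_cohort = build_cohort(example_mentions)
example_metrics = evaluate(example_cohort,
                           gold={"p1", "p3"},
                           all_patients={"p1", "p2", "p3", "p4"})
```

As a consistency check on the reported figures, `f_score(0.972, 0.958)` rounds to 0.965, which matches the F-score stated in the abstract. Treating a patient as a cohort member when any single mention is affirmed is a design choice of this sketch, not something the abstract specifies.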

Author information

Corresponding author

Correspondence to Niyati Baliyan.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Baliyan, N., Johar, A., Bhardwaj, P. (2021). Identification of Congestive Heart Failure Patients Through Natural Language Processing. In: Verma, G.K., Soni, B., Bourennane, S., Ramos, A.C.B. (eds) Data Science. Transactions on Computer Systems and Networks. Springer, Singapore. https://doi.org/10.1007/978-981-16-1681-5_26

  • DOI: https://doi.org/10.1007/978-981-16-1681-5_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1680-8

  • Online ISBN: 978-981-16-1681-5

  • eBook Packages: Computer Science, Computer Science (R0)
