Abstract
Research in the biomedical field requires technical infrastructure to deal with heterogeneous, multi-sourced big data. Biomedical informatics primarily uses data in electronic health records to better understand how diseases spread or to gather new insights from patient history. One prominent use case of electronic health records is the identification of a patient cohort (group) with a specific disease or some common characteristics, so that useful inferences may be drawn from these records. This paper proposes a methodology for identifying and analyzing cohorts of patients with congestive heart failure among obesity patients. This may help doctors and medical researchers in predicting outcomes, survival analysis of patients, clinical trials, and other types of retrospective studies. The cTAKES tool was used to apply natural language processing techniques in order to identify patients belonging to a particular cohort. All clinical terms were identified and mapped to their matching terms in the UMLS Metathesaurus. Negated statements were also detected and removed from the final cohort. The method is reasonably automated and achieves accuracy, precision, recall, and F-score values of 0.970, 0.972, 0.958, and 0.965, respectively. Results were compared against the experts' annotations. Additionally, a manual review of clinical records was performed for further validation.
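The cohort-selection and evaluation steps summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation: the `Mention` record, the `TARGET_CUI` placeholder, and the `build_cohort` and `evaluate` helpers are hypothetical names introduced here, and the input is assumed to be concept mentions that an NLP step has already mapped to UMLS concepts and flagged for negation. It only shows how negated mentions might be excluded and how accuracy, precision, recall, and F-score could be computed against expert annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One clinical concept mention (hypothetical structure, for illustration)."""
    patient_id: str
    cui: str       # concept identifier assigned by the NLP step
    negated: bool  # True when the mention occurs in a negated statement


# Placeholder only, NOT a real UMLS identifier.
TARGET_CUI = "CUI_CHF"


def build_cohort(mentions, target_cui=TARGET_CUI):
    """Return patients with at least one non-negated mention of the target concept."""
    return {m.patient_id for m in mentions
            if m.cui == target_cui and not m.negated}


def evaluate(predicted, gold, all_patients):
    """Compare a predicted cohort with expert (gold) annotations."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(all_patients - predicted - gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(all_patients) if all_patients else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Toy data, invented purely to exercise the functions.
mentions = [
    Mention("p1", TARGET_CUI, negated=False),
    Mention("p2", TARGET_CUI, negated=True),    # negated, so excluded
    Mention("p3", "CUI_OTHER", negated=False),  # different concept
    Mention("p4", TARGET_CUI, negated=False),
]
cohort = build_cohort(mentions)
metrics = evaluate(cohort, gold={"p1"}, all_patients={"p1", "p2", "p3", "p4"})
```

As a consistency check on the reported figures, the F-score is the harmonic mean of precision and recall: 2 × 0.972 × 0.958 / (0.972 + 0.958) ≈ 0.965, which matches the value stated above.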
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Baliyan, N., Johar, A., Bhardwaj, P. (2021). Identification of Congestive Heart Failure Patients Through Natural Language Processing. In: Verma, G.K., Soni, B., Bourennane, S., Ramos, A.C.B. (eds) Data Science. Transactions on Computer Systems and Networks. Springer, Singapore. https://doi.org/10.1007/978-981-16-1681-5_26
DOI: https://doi.org/10.1007/978-981-16-1681-5_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1680-8
Online ISBN: 978-981-16-1681-5
eBook Packages: Computer Science, Computer Science (R0)