A Deep Learning Based Approach to Automate Clinical Coding of Electronic Health Records

Kumar, Ashutosh; Rathore, Santosh Singh

doi:10.1007/978-3-031-24094-2_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13773)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

Medical records in different electronic formats, such as handwritten notes, diagnosis summaries, lab reports, and electronic PDFs, contain valuable information that can be used for various medical purposes. These health records are currently coded manually or semi-automatically to assign clinical codes (ICD codes) for clinical research and analytics, a process that is time-consuming, expensive, and error-prone. This paper presents a method for automated clinical coding of electronic health records (EHRs) given the patient diagnosis summary and other medical documents. The method uses natural language processing (NLP) techniques that capture knowledge from free-text diagnosis descriptions, perform text matching and semantic mapping, and translate diagnosis descriptions into clinical codes. We develop a baseline hybrid model combining Word2vec and cosine similarity, a transformer encoder model, and a BERT (Bidirectional Encoder Representations from Transformers) model for automated clinical coding. The models are evaluated on the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset, which consists of patient diagnosis descriptions and the corresponding ICD-9 codes. The experimental results show that the BlueBERT-based automated clinical coding model achieved an AUC (area under the ROC curve) of 98.9% for top-10 ICD code prediction. On the full MIMIC-III dataset, the transformer model achieved an accuracy of 76.8%, a precision of 61.02%, a recall of 47.22%, an F1-score of 53.2%, and an AUC of 92.1%. The hybrid baseline model and the other transformer encoder model also showed promising results.
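The text-matching idea behind the baseline can be illustrated with a minimal sketch. It is not the authors' implementation: a sparse word-count vector stands in for a Word2vec sentence embedding, the three-entry code table uses abbreviated ICD-9 descriptions chosen only for illustration, and the names `embed`, `cosine`, and `rank_codes` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy code table with abbreviated ICD-9 descriptions.
# A real system would load the full code descriptions from the dataset.
ICD9_DESCRIPTIONS = {
    "401.9": "unspecified essential hypertension",
    "428.0": "congestive heart failure unspecified",
    "250.00": "diabetes mellitus without complication type ii",
}


def embed(text):
    """Stand-in for a Word2vec embedding: a sparse word-count vector.

    Assumption: a trained Word2vec model would instead average dense
    word vectors, which also captures synonyms that share no tokens.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_codes(description, code_table, top_k=3):
    """Rank codes by similarity between the note and each code description."""
    query = embed(description)
    scored = [
        (code, cosine(query, embed(desc))) for code, desc in code_table.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for code, score in rank_codes(
        "patient with congestive heart failure", ICD9_DESCRIPTIONS
    ):
        print(code, round(score, 3))
```

With word counts, only exact token overlap contributes to the score; dense embeddings are what would let semantically related but differently worded descriptions match.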

Notes

1. https://physionet.org/content/mimiciii-demo/1.4/

Acknowledgement

This work is partially supported by a Research Grant under DST-Start up Research Grant (India), File number: SRG/2021/000173.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, ABV-Indian Institute of Information Technology and Management, Gwalior, India
Ashutosh Kumar & Santosh Singh Rathore

Authors

Ashutosh Kumar
Santosh Singh Rathore

Corresponding author

Correspondence to Santosh Singh Rathore.

Editor information

Editors and Affiliations

Indian Institute of Technology-Roorkee, Roorkee, India
Partha Pratim Roy
IBM Research, Gurugram, India
Arvind Agarwal
Southwest Jiaotong University, Chengdu, China
Tianrui Li
International Institute of Information Technology - Hyderabad, Hyderabad, India
P. Krishna Reddy
The University of Aizu, Fukushima, Japan
R. Uday Kiran

About this paper

Cite this paper

Kumar, A., Rathore, S.S. (2022). A Deep Learning Based Approach to Automate Clinical Coding of Electronic Health Records. In: Roy, P.P., Agarwal, A., Li, T., Krishna Reddy, P., Uday Kiran, R. (eds) Big Data Analytics. BDA 2022. Lecture Notes in Computer Science, vol 13773. Springer, Cham. https://doi.org/10.1007/978-3-031-24094-2_7

DOI: https://doi.org/10.1007/978-3-031-24094-2_7
Published: 29 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24093-5
Online ISBN: 978-3-031-24094-2
eBook Packages: Computer Science; Computer Science (R0)
