A Deep Learning Based Approach to Automate Clinical Coding of Electronic Health Records

  • Conference paper
Big Data Analytics (BDA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13773)

Abstract

Medical records in different electronic formats, such as handwritten notes, diagnosis summaries, lab reports, and electronic PDFs, contain valuable information that can be used for various medical purposes. These health records are currently coded manually or semi-automatically to assign clinical codes (ICD codes) for clinical research and analytics, a process that is time-consuming, expensive, and error-prone. This paper presents a method for automated clinical coding of electronic health records (EHRs) given the patient diagnosis summary and other medical documents. The method uses natural language processing (NLP) techniques that capture knowledge from free-text diagnosis descriptions, perform text matching and semantic mapping, and translate the descriptions into clinical codes. We develop three models for automated clinical coding: a baseline hybrid of Word2vec and cosine similarity, a transformer encoder model, and a BERT (Bidirectional Encoder Representations from Transformers) model. The models are evaluated on the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset, which consists of patient diagnosis descriptions and their corresponding ICD-9 codes. The experimental results show that the BlueBERT-based model achieved an AUC (area under the ROC curve) of 98.9% for top-10 ICD code prediction. On the full MIMIC-III dataset, the transformer model achieved an accuracy of 76.8%, a precision of 61.02%, a recall of 47.22%, an F1-score of 53.2%, and an AUC of 92.1%. The hybrid baseline model and the other transformer encoder model also showed promising results.
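The baseline idea named in the abstract, matching a diagnosis description to a code by cosine similarity over word vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration (a real system would load vectors trained with Word2vec), the two code/description pairs are illustrative, and the helper names `embed`, `cosine`, `best_code`, and `f1_score` are hypothetical. The last helper only checks that the reported F1-score is consistent with the reported precision and recall.

```python
import math

# Made-up 3-dimensional word vectors (assumption, for illustration only).
# A real pipeline would use vectors trained with Word2vec on clinical text.
TOY_VECTORS = {
    "acute": [0.9, 0.1, 0.0],
    "kidney": [0.1, 0.9, 0.1],
    "renal": [0.2, 0.8, 0.1],
    "failure": [0.0, 0.3, 0.9],
    "pneumonia": [0.8, 0.0, 0.5],
    "organism": [0.5, 0.1, 0.1],
    "unspecified": [0.3, 0.3, 0.3],
}

# Illustrative ICD-9 code/description pairs (not taken from the paper).
CODE_DESCRIPTIONS = {
    "584.9": "Acute kidney failure, unspecified",
    "486": "Pneumonia, organism unspecified",
}


def embed(text):
    """Average the vectors of all known tokens; zero vector if none are known."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def best_code(description, code_descriptions=CODE_DESCRIPTIONS):
    """Return the code whose description is most similar to the input text."""
    query = embed(description)
    return max(
        code_descriptions,
        key=lambda code: cosine(query, embed(code_descriptions[code])),
    )


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(best_code("acute renal failure"))
    # Consistency check on the abstract's numbers (precision 61.02%, recall 47.22%).
    print(round(100 * f1_score(0.6102, 0.4722), 1))
```

With the toy vectors above, "renal" and "kidney" lie close together, so a description that shares no exact wording with a code description can still be mapped to it. The harmonic mean of the reported precision and recall comes out at about 53.2%, which agrees with the F1-score stated in the abstract.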

Notes

  1. https://physionet.org/content/mimiciii-demo/1.4/

Acknowledgement

This work is partially supported by a Research Grant under DST-Start up Research Grant (India), File number: SRG/2021/000173.

Corresponding author

Correspondence to Santosh Singh Rathore.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, A., Rathore, S.S. (2022). A Deep Learning Based Approach to Automate Clinical Coding of Electronic Health Records. In: Roy, P.P., Agarwal, A., Li, T., Krishna Reddy, P., Uday Kiran, R. (eds) Big Data Analytics. BDA 2022. Lecture Notes in Computer Science, vol 13773. Springer, Cham. https://doi.org/10.1007/978-3-031-24094-2_7

  • DOI: https://doi.org/10.1007/978-3-031-24094-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24093-5

  • Online ISBN: 978-3-031-24094-2

  • eBook Packages: Computer Science, Computer Science (R0)