Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes

Virginio, Luiz; dos Reis, Julio Cesar

doi:10.1007/978-3-030-06016-9_12

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11371)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

Extracting codes from Electronic Health Record (EHR) data is an important task because the extracted codes serve purposes such as billing and reimbursement, quality control, epidemiological studies, and cohort identification for clinical trials. The codes are based on standardized vocabularies. Diagnoses, for example, are frequently coded using the International Classification of Diseases (ICD), a taxonomy of diagnosis codes organized in a hierarchical structure. Extracting codes from free-text medical notes in the EHR, such as discharge summaries, requires reviewing patient data in search of information that can be coded in a standardized manner. Manual code assignment is a complex and time-consuming process. The use of machine learning and natural language processing approaches to automate ICD coding has been receiving increasing attention. In this article, we investigate the use of Support Vector Machines (SVM) and the binary relevance method for multi-label classification in the task of automatic ICD coding from free-text discharge summaries. In particular, we explore the role of SVM parameter optimization and class weighting in addressing class imbalance. Experiments conducted with the Medical Information Mart for Intensive Care III (MIMIC III) database reached a macro-averaged F1 score of 49.86% for the 100 most frequent diagnoses. Our findings indicate that optimizing SVM parameters and using class weighting can improve the effectiveness of the classifier.
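
The three ingredients named in the abstract (binary relevance, class weighting, and macro-averaged F1) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy label sets, the predictions, and all function names are hypothetical, and the inverse-frequency weighting formula is an assumption, since the abstract does not state which weighting scheme was used. No SVM is trained here; the sketch only shows how the per-code binary problems, their class weights, and the evaluation metric are derived.

```python
from collections import Counter


def binary_relevance_targets(label_sets, codes):
    """Binary relevance: one independent 0/1 target vector per code."""
    return {
        code: [1 if code in labels else 0 for labels in label_sets]
        for code in codes
    }


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the abstract does not specify the weighting scheme.
    """
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_binary(y_true, y_pred):
    """F1 of the positive class for one binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(true_by_code, pred_by_code):
    """Unweighted mean of per-code F1, so rare codes count as much as common ones."""
    scores = [f1_binary(true_by_code[c], pred_by_code[c]) for c in true_by_code]
    return sum(scores) / len(scores)


# Hypothetical toy data: ICD-9 codes assigned to six discharge summaries.
codes = ["401.9", "428.0"]
label_sets = [{"401.9", "428.0"}, {"401.9"}, {"401.9"}, {"428.0"}, set(), {"401.9"}]

targets = binary_relevance_targets(label_sets, codes)
weights = {code: balanced_class_weights(y) for code, y in targets.items()}

# Hypothetical classifier output, one prediction vector per code.
predictions = {
    "401.9": [1, 1, 0, 0, 0, 1],
    "428.0": [1, 0, 0, 0, 0, 0],
}
macro_f1 = f1_macro(targets, predictions)

print(weights)
print(round(macro_f1, 4))
```

In this toy example the rarer code receives a larger weight on its positive class, which is the effect class weighting relies on: errors on the minority class cost more during training. In practice, each per-code target vector and its weights would be passed to a separately trained binary classifier.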

Notes

1. http://www.who.int/classifications/icd/en/
2. https://www.nlm.nih.gov/research/umls/about_umls.html
3. https://metamap.nlm.nih.gov/
4. https://mimic.physionet.org/
5. http://scikit-learn.org/stable/
6. https://www.nltk.org/
7. The opinions expressed in this work do not necessarily reflect those of the funding agencies.

Acknowledgements

This work is supported by the São Paulo Research Foundation (FAPESP), Grant #2017/02325-5 (see Note 7).

Author information

Authors and Affiliations

University of Campinas, Campinas, São Paulo, Brazil
Luiz Virginio & Julio Cesar dos Reis

Corresponding author

Correspondence to Luiz Virginio.

Editor information

Editors and Affiliations

TIB and Leibniz University, Hannover, Germany
Sören Auer
TIB and Leibniz University, Hannover, Germany
Maria-Esther Vidal

About this paper

Cite this paper

Virginio, L., dos Reis, J.C. (2019). Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes. In: Auer, S., Vidal, M.-E. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science, vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_12

DOI: https://doi.org/10.1007/978-3-030-06016-9_12
Published: 30 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06015-2
Online ISBN: 978-3-030-06016-9
eBook Packages: Computer Science, Computer Science (R0)