Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes
Extracting codes from Electronic Health Record (EHR) data is an important task because the extracted codes serve purposes such as billing and reimbursement, quality control, epidemiological studies, and cohort identification for clinical trials. The codes are based on standardized vocabularies. Diagnoses, for example, are frequently coded using the International Classification of Diseases (ICD), a taxonomy of diagnosis codes organized in a hierarchical structure. Extracting codes from free-text medical notes in an EHR, such as the discharge summary, requires reviewing patient data for information that can be coded in a standardized manner. Manual code assignment is a complex and time-consuming process, so machine learning and natural language processing approaches have received increasing attention as a way to automate ICD coding. In this article, we investigate the use of Support Vector Machines (SVM) and the binary relevance method for multi-label classification in the task of automatic ICD coding from free-text discharge summaries. In particular, we explore the role of SVM parameter optimization and of class weighting for addressing imbalanced classes. Experiments conducted with the Medical Information Mart for Intensive Care III (MIMIC III) database reached a macro-averaged F1 score of 49.86% for the 100 most frequent diagnoses. Our findings indicate that optimizing the SVM parameters and using class weighting can improve the effectiveness of the classifier.
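The approach summarized above (one binary SVM per code, class weighting, and a search over SVM parameters) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, TF-IDF features, a linear SVM, a small grid over the regularization parameter C, and a handful of invented notes and codes standing in for MIMIC III discharge summaries.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: invented notes and code sets, not MIMIC III records.
notes = [
    "patient admitted with congestive heart failure and hypertension",
    "history of hypertension, blood pressure elevated on admission",
    "type 2 diabetes mellitus with elevated glucose, hypertension",
    "acute exacerbation of congestive heart failure",
    "hypertension managed with lisinopril",
    "congestive heart failure with fluid overload, hypertension",
    "diabetes mellitus, insulin adjusted for elevated glucose",
    "heart failure follow-up, blood pressure stable",
]
codes = [
    ["428.0", "401.9"],
    ["401.9"],
    ["250.00", "401.9"],
    ["428.0"],
    ["401.9"],
    ["428.0", "401.9"],
    ["250.00"],
    ["428.0"],
]

# Binary relevance: turn the code sets into one 0/1 column per code.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(codes)

# OneVsRestClassifier trains an independent binary SVM for each column.
# class_weight="balanced" reweights each class inversely to its frequency,
# which is one common way to handle rare codes (an assumption here).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC(class_weight="balanced"))),
])

# Parameter optimization: pick C by cross-validated macro-averaged F1.
# The grid values and the 2-fold split are illustrative choices only.
param_grid = {"clf__estimator__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=2)
search.fit(notes, Y)

# Predict codes for a new (invented) note and map columns back to codes.
prediction = search.predict(["congestive heart failure with hypertension"])
predicted_codes = mlb.inverse_transform(prediction)
print(search.best_params_, predicted_codes)
```

Macro-averaged F1 is used as the selection metric in this sketch because it weights every code equally, so rare codes influence the choice of C as much as frequent ones.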
Keywords: Automated ICD coding, Multi-label classification, Imbalanced classes
This work is supported by the São Paulo Research Foundation (FAPESP) (Grant #2017/02325-5).