Abstract
The widespread adoption of medical document management has generated a large volume of unstructured data containing abbreviations, ambiguous terms, and typing errors. These factors make manual categorization an expensive, time-consuming, and error-prone task. Thus, the automatic classification of medical data into informative clinical categories can substantially reduce the cost of this task. In this context, this work aims to evaluate the use of an ensemble of classifiers of clinical texts to differentiate them into prescriptions, clinical notes, and exam requests. For this, we used the combination of N_gram+TF-IDF and BERTimbau to vectorize the text. Then, we used the classifiers Random Forest, Multilayer Perceptron, and Support Vector Machine to create the ensemble. After that, we predict the final ensemble label through a voting approach. The results are promising, reaching an accuracy of 0.99, kappa of 0.99, and F1-score of 0.99. Our approach allows automatic and accurate classification of clinical texts, achieving better categorization results than individual approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The source code and the dataset of our study are publicly available at https://github.com/pavic-ufpi/ISDA_Clinical_Text.
References
Burges, C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998)
Cusick, M., et al.: Using weak supervision and deep learning to classify clinical notes for identification of current suicidal ideation. J. Psychiatr. Res. 136, 95–102 (2021)
Dong, X., Yu, Z., Cao, W., Shi, Y., Ma, Q.: A survey on ensemble learning. Front. Comp. Sci. 14(2), 241–258 (2020)
Gupta, S., Belouali, A., Shah, N., Atkins, M., Madhavan, S.: Automated identification of patients with immune-related adverse events from clinical notes using word embedding and machine learning. JCO Clin. Cancer Inform. 5, 541–549 (2021)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1 1989)
Kambar, M.E.Z.N., Nahed, P., Cacho, J.R.F., Lee, G., Cummings, J., Taghva, K.: Clinical text classification of alzheimer’s drugs’ mechanism of action. In: Yang, X.-S., Sherratt, S., Dey, N., Joshi, A. (eds.) Proceedings of Sixth International Congress on Information and Communication Technology. LNNS, vol. 235, pp. 513–521. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-2377-6_48
Kausar, N., Abdullah, A., Samir, B.B., Palaniappan, S., AlGhamdi, B.S., Dey, N.: Ensemble clustering algorithm with supervised classification of clinical data for early diagnosis of coronary artery disease. J. Med. Imaging Health Inform. 6(1), 78–87 (2016)
Kim, D., Seo, D., Cho, S., Kang, P.: Multi-co-training for document classification using various document representations: Tf-idf, lda, and doc2vec. Inf. Sci. 477, 15–29 (2019)
Kumar, V., Recupero, D.R., Riboni, D., Helaoui, R.: Ensembling classical machine learning and deep learning approaches for morbidity identification from clinical notes. IEEE Access 9, 7107–7126 (2020)
Li, M.D., Deng, F., Chang, K., Kalpathy-Cramer, J., Huang, A.J.: Automated radiology-arthroscopy correlation of knee meniscal tears using natural language processing algorithms. Acad. Radiol. 29(4), 479–487 (2022)
Liu, J., Bai, R., Lu, Z., Ge, P., Aickelin, U., Liu, D.: Data-driven regular expressions evolution for medical text classification using genetic programming. In: IEEE CEC. pp. 1–8. IEEE (2020)
López-Úbeda, P., Díaz-Galiano, M.C., Martín-Noguerol, T., Luna, A., Ureña-López, L.A., Martín-Valdivia, M.T.: Automatic medical protocol classification using machine learning approaches. Comput. Methods Programs Biomed. 200, 105939 (2021)
Mujtaba, G., et al.: Clinical text classification research trends: Systematic literature review and open issues. Expert Syst. Appl. 116, 494–520 (2019)
Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S., Ré, C.: Snorkel: Rapid training data creation with weak supervision. In: International Conference on Very Large Data Bases. vol. 11, p. 269. NIH Public Access (2017)
Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S., Ré, C.: Snorkel: Rapid training data creation with weak supervision. VLDB J. 29(2), 709–730 (2020)
Santos, H., Ulbrich, A., Woloszyn, V., Vieira, R.: An initial investigation of the charlson comorbidity index regression based on clinical notes. In: International Symposium on Computer-Based Medical Systems, pp. 6–11. IEEE (2018)
da Silva, D.A., Ten Caten, C.S., Dos Santos, R.P., Fogliatto, F.S., Hsuan, J.: Predicting the occurrence of surgical site infections using text mining and machine learning. PLoS ONE 14(12), e0226272 (2019)
Souza, F., Nogueira, R., Lotufo, R.: BERTimbau: pretrained BERT models for brazilian portuguese. In: Cerri, R., Prati, R.C. (eds.) BRACIS 2020. LNCS (LNAI), vol. 12319, pp. 403–417. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61377-8_28
Swain, P., Hauska, H.: The decision tree classifier: design and potential. IEEE Trans. Geosci. Electron. 15(3), 142–147 (1977)
Tayefi, M., et al.: Challenges and opportunities beyond structured data in analysis of electronic health records. Wiley Interdisciplinary Reviews: Computational Statistics, p. e1549 (2021)
Wang, Y., et al.: A clinical text classification paradigm using weak supervision and deep representation. BMC Med. Inform. Decis. Mak. 19(1), 1–13 (2019)
Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemom. Intell. Lab. Syst. 2(1–3), 37–52 (1987)
Yun-tao, Z., Ling, G., Yong-cheng, W.: An improved tf-idf approach for text classification. Journal of Zhejiang University-SCIENCE A 2005 6:1 6, 49–55 (8 2005)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sousa, O.L.V., da Silva, D.P., Campelo, V.E.S., Silva, R.R.V.e., Magalhães, D.M.V. (2023). Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 715. Springer, Cham. https://doi.org/10.1007/978-3-031-35507-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-031-35507-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35506-6
Online ISBN: 978-3-031-35507-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)