Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese

Sousa, Orrana Lhaynher Veloso; da Silva, David Pereira; Campelo, Victor Eulalio Sousa; Silva, Romuere Rodrigues Veloso e; Magalhães, Deborah Maria Vieira

doi:10.1007/978-3-031-35507-3_5

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 715)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications

Abstract

The widespread adoption of medical document management has generated a large volume of unstructured data containing abbreviations, ambiguous terms, and typing errors. These factors make manual categorization expensive, time-consuming, and error-prone, so automatically classifying medical data into informative clinical categories can substantially reduce its cost. In this context, this work evaluates an ensemble of classifiers that separates clinical texts into prescriptions, clinical notes, and exam requests. We vectorize the text with a combination of n-gram + TF-IDF and BERTimbau, and build the ensemble from Random Forest, Multilayer Perceptron, and Support Vector Machine classifiers. The final ensemble label is then predicted through a voting approach. The results are promising: the ensemble reaches an accuracy of 0.99, a kappa of 0.99, and an F1-score of 0.99. Our approach enables automatic and accurate classification of clinical texts and achieves better categorization results than the individual approaches.
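
The voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the vote is hard or soft, nor how ties are resolved, so the code below assumes plain majority (hard) voting with ties broken by a fixed label order. The label names, the function names `majority_vote` and `ensemble_predict`, and the per-classifier predictions are all hypothetical and are not taken from the authors' repository.

```python
from collections import Counter

# Hypothetical label names for the three categories named in the abstract.
LABELS = ("prescription", "clinical_note", "exam_request")


def majority_vote(predictions, labels=LABELS):
    """Return the label predicted by most classifiers for one document.

    Ties are broken by the order of `labels` (an assumption of this sketch).
    """
    counts = Counter(predictions)
    # max() returns the first maximal item, so earlier labels win ties.
    return max(labels, key=lambda label: counts[label])


def ensemble_predict(per_model_predictions, labels=LABELS):
    """Combine one prediction list per model into a list of voted labels."""
    # zip(*...) regroups the predictions document by document.
    return [majority_vote(votes, labels) for votes in zip(*per_model_predictions)]


# Made-up outputs of three classifiers (e.g. RF, MLP, SVM) on four documents.
rf_pred = ["prescription", "clinical_note", "exam_request", "clinical_note"]
mlp_pred = ["prescription", "exam_request", "exam_request", "prescription"]
svm_pred = ["clinical_note", "clinical_note", "exam_request", "exam_request"]

final = ensemble_predict([rf_pred, mlp_pred, svm_pred])
print(final)
```

On the fourth document the three classifiers disagree completely, so the result depends only on the tie-breaking rule. A real pipeline would have to choose that rule deliberately, or use soft voting over class probabilities instead.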

Notes

1. The source code and the dataset of our study are publicly available at https://github.com/pavic-ufpi/ISDA_Clinical_Text.

Author information

Authors and Affiliations

Electrical Engineering Department, Federal University of Piaui, Picos, Brazil
Orrana Lhaynher Veloso Sousa, Romuere Rodrigues Veloso e Silva & Deborah Maria Vieira Magalhães
Information Systems Department, Federal University of Piaui, Picos, Brazil
David Pereira da Silva, Romuere Rodrigues Veloso e Silva & Deborah Maria Vieira Magalhães
Specialized Medicine Department, Federal University of Piaui, Teresina, Brazil
Victor Eulalio Sousa Campelo

Corresponding author

Correspondence to Romuere Rodrigues Veloso e Silva.

Editor information

Editors and Affiliations

Faculty of Computing and Data Science, FLAME University, Pune, Maharashtra, India
Ajith Abraham
Center for Smart Computing Continuum, Burgenland, Austria
Sabri Pllana
University of Bari, Bari, Italy
Gabriella Casalino
University of Jinan, Jinan, Shandong, China
Kun Ma
Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Anu Bajaj

Cite this paper

Sousa, O.L.V., da Silva, D.P., Campelo, V.E.S., Silva, R.R.V.e., Magalhães, D.M.V. (2023). Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 715. Springer, Cham. https://doi.org/10.1007/978-3-031-35507-3_5

DOI: https://doi.org/10.1007/978-3-031-35507-3_5
Published: 03 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35506-6
Online ISBN: 978-3-031-35507-3
eBook Packages: Intelligent Technologies and Robotics (R0)
