Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2022)

Abstract

The widespread adoption of medical document management has generated a large volume of unstructured data containing abbreviations, ambiguous terms, and typing errors. These factors make manual categorization expensive, time-consuming, and error-prone, so automatically classifying medical data into informative clinical categories can substantially reduce its cost. In this context, this work evaluates an ensemble of classifiers that sorts clinical texts into prescriptions, clinical notes, and exam requests. We vectorize the text with a combination of n-gram + TF-IDF and BERTimbau, train Random Forest, Multilayer Perceptron, and Support Vector Machine classifiers, and predict the final ensemble label through a voting approach. The results are promising: the ensemble reaches an accuracy of 0.99, a kappa of 0.99, and an F1-score of 0.99. Our approach allows automatic and accurate classification of clinical texts and achieves better categorization results than the individual approaches.
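
The voting step described in the abstract can be illustrated with a minimal sketch. It assumes hard (majority) voting over the three classifiers' predicted labels; the abstract only says "a voting approach", so the voting rule, the tie-break, the function names, and the toy predictions below are all illustrative assumptions rather than the authors' implementation or data. Accuracy and Cohen's kappa are computed from their standard definitions.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among one document's votes.

    Ties go to the earliest classifier in the list (an assumed rule).
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:  # degenerate case: a single class on both sides
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Invented per-classifier predictions for four documents (NOT paper results).
truth = ["prescription", "clinical_note", "exam_request", "exam_request"]
rf = ["prescription", "clinical_note", "exam_request", "clinical_note"]
mlp = ["prescription", "clinical_note", "exam_request", "exam_request"]
svm = ["prescription", "prescription", "exam_request", "exam_request"]

# Combine the three classifiers document by document.
ensemble = [majority_vote(list(votes)) for votes in zip(rf, mlp, svm)]

print("ensemble labels:", ensemble)
print("SVM accuracy:", accuracy(truth, svm))
print("ensemble accuracy:", accuracy(truth, ensemble))
print("ensemble kappa:", cohen_kappa(truth, ensemble))
```

In this toy case the Random Forest and the SVM each mislabel a different document, and the vote corrects both. That is the mechanism by which an ensemble can outperform its individual members, although the numbers here say nothing about the reported results.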

Notes

  1. The source code and the dataset of our study are publicly available at https://github.com/pavic-ufpi/ISDA_Clinical_Text.

Author information

Corresponding author

Correspondence to Romuere Rodrigues Veloso e Silva.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sousa, O.L.V., da Silva, D.P., Campelo, V.E.S., Silva, R.R.V.e., Magalhães, D.M.V. (2023). Ensemble of Classifiers for Multilabel Clinical Text Categorization in Portuguese. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 715. Springer, Cham. https://doi.org/10.1007/978-3-031-35507-3_5
