
Detection of Tumor Morphology Mentions in Clinical Reports in Spanish Using Transformers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12861)

Abstract

This study systematically examines the performance of transformer-based models for detecting tumor morphology mentions in clinical documents written in Spanish. We analyzed three transformer models that support Spanish: multilingual BERT, BETO and XLM-RoBERTa. Following a transfer-learning approach, the models were first pretrained on a collection of real-world oncology clinical cases in order to adapt them to the distinctive features of the Spanish oncology domain. The resulting models were then fine-tuned on the Cantemist-NER task, which frames the detection of tumor morphology mentions as a multi-class sequence-labeling problem. To evaluate the approach, we compared the results obtained by the domain-specific version of each transformer with those of its general-domain counterpart. For every transformer analyzed, the clinical version outperformed the corresponding general-domain model on the detection of tumor morphology mentions in Spanish clinical case reports. Additionally, combining the transfer-learning approach with an ensemble strategy that exploits the predictive capabilities of the distinct transformer architectures yielded the best results: a precision of 0.893, a recall of 0.887 and an F1-score of 0.89, surpassing the prior state-of-the-art performance on the Cantemist-NER task.
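To make the sequence-labeling and ensemble ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a plain B/I/O tagging scheme, a token-level majority vote as the ensemble rule (the abstract does not specify how the models are combined), and exact-match span scoring. The function names `majority_vote`, `extract_spans` and `precision_recall_f1`, as well as the toy sentence and tag sequences, are hypothetical. The last lines check only that the reported F1-score is consistent with the reported precision and recall.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tags from several models by majority vote.

    Assumption: this voting rule is illustrative; the paper's ensemble
    strategy is not described in the abstract.
    `predictions` holds one tag sequence per model, all of equal length.
    Ties go to the earliest model in the list.
    """
    ensemble = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # First tag, in model order, that reaches the top count.
        ensemble.append(next(t for t in token_tags if counts[t] == best))
    return ensemble


def extract_spans(tags):
    """Return the set of (start, end) token spans encoded by a B/I/O sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I"):
            spans.add((start, i))
            start = None
        if tag.startswith("B"):
            start = i
    return spans


def precision_recall_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over sets of spans."""
    tp = len(gold_spans & predicted_spans)
    precision = tp / len(predicted_spans) if predicted_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Se observa carcinoma ductal infiltrante de mama"
gold = ["O", "O", "B", "I", "I", "O", "O"]
model_a = ["O", "O", "B", "I", "I", "O", "O"]
model_b = ["O", "O", "B", "I", "O", "O", "O"]  # truncates the mention
model_c = ["O", "O", "B", "I", "I", "O", "O"]

voted = majority_vote([model_a, model_b, model_c])
scores = precision_recall_f1(extract_spans(gold), extract_spans(voted))

# Consistency check on the figures reported in the abstract.
reported_p, reported_r = 0.893, 0.887
reported_f1 = 2 * reported_p * reported_r / (reported_p + reported_r)
print(voted, scores, round(reported_f1, 2))
```

In this toy case the vote recovers the full mention even though one model truncates it, which is the kind of complementary behavior an ensemble is meant to exploit. A real evaluation would use the official Cantemist-NER scoring, which may differ from this exact-match sketch.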

Notes

  1. https://sklearn-crfsuite.readthedocs.io/

Acknowledgments

This work was partially supported by the project PID2020-116898RB-I00, Ministerio de Ciencia e Innovación, Plan Nacional de I+D+i, the project UMA-CEIATECH-01, Andalucía TECH, and the I Plan Propio de Investigación, Transferencia y Divulgación Científica of the Universidad de Málaga.

Author information

Correspondence to Guillermo López-García .

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

López-García, G., Jerez, J.M., Ribelles, N., Alba, E., Veredas, F.J. (2021). Detection of Tumor Morphology Mentions in Clinical Reports in Spanish Using Transformers. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_3

  • DOI: https://doi.org/10.1007/978-3-030-85030-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85029-6

  • Online ISBN: 978-3-030-85030-2

  • eBook Packages: Computer Science, Computer Science (R0)
