
To Enhance Full-Text Biomedical Document Classification Through Semantic Enrichment

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2023)

Abstract

The rapid growth of the scientific literature makes text classification essential, especially in biomedical research, where it helps researchers focus on the latest findings quickly and efficiently.

This study presents the potential benefits of semantic text enrichment for biomedical document classification. We show the importance of enriching corpora with semantic information to improve full-text classification.

The approach involves the semantic enrichment of a Medline corpus with a Semantic Repository (SemRep), which extracts semantic predications from biomedical text. The study also addresses the problem of handling high-dimensional data while maintaining the semantic structure of the corpus.
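As a rough illustration of the idea, and not the authors' pipeline, the sketch below appends one synthetic feature per semantic predication to a document's tokens and then applies a simple document-frequency filter to limit dimensionality. The predication triples, the function names (`enrich`, `select_features`, `vectorize`), and the `min_df` threshold are all assumptions made for this example; a real system would obtain predications from a tool such as SemRep rather than from hand-written tuples.

```python
from collections import Counter

def enrich(tokens, predications):
    """Append one synthetic token per (subject, predicate, object) triple.

    The triples are hypothetical stand-ins for predications extracted
    from biomedical text; they are not real SemRep output.
    """
    extra = [f"{s}|{p}|{o}".lower().replace(" ", "_") for s, p, o in predications]
    return tokens + extra

def select_features(docs, min_df=2):
    """Keep features that occur in at least min_df documents.

    This document-frequency filter is an assumed, deliberately simple
    way to reduce dimensionality; the study may use a different method.
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each feature once per document
    return sorted(f for f, n in df.items() if n >= min_df)

def vectorize(doc, vocab):
    """Turn a token list into a count vector over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[f] for f in vocab]

# Toy corpus: two documents share a predication, the third has none.
docs = [
    enrich(["aspirin", "reduces", "fever"], [("Aspirin", "TREATS", "Fever")]),
    enrich(["aspirin", "for", "fever"], [("Aspirin", "TREATS", "Fever")]),
    enrich(["gene", "expression"], []),
]
vocab = select_features(docs, min_df=2)
matrix = [vectorize(d, vocab) for d in docs]
```

Encoding each predication as a single token keeps the subject, relation, and object together, so the relation survives as one feature even after rare features are filtered out.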

Experimental results support the conclusion that classifying full texts gives better results than using only abstracts and titles. We also conclude that applying enrichment techniques to full texts significantly improves text classification, an important contribution to biomedical text mining research.

Acknowledgement

This work was financially supported by Base Funding - UIDB/00027/2020 of the Artificial Intelligence and Computer Science Laboratory - LIACC - funded by national funds through the FCT/MCTES (PIDDAC) and by Portuguese national funds through FCT - Fundação para a Ciência e Tecnologia, under the project UIDB/05422/2020.

We acknowledge the SoftCPS - Software Cyber-Physical Systems Group (ISEP) for their assistance and resources.

Author information

Corresponding author

Correspondence to E. L. Iglesias.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gonçalves, C.A., Seara Vieira, A., Gonçalves, C.T., Borrajo, L., Camacho, R., Iglesias, E.L. (2023). To Enhance Full-Text Biomedical Document Classification Through Semantic Enrichment. In: García Bringas, P., et al. Hybrid Artificial Intelligent Systems. HAIS 2023. Lecture Notes in Computer Science, vol 14001. Springer, Cham. https://doi.org/10.1007/978-3-031-40725-3_47

  • DOI: https://doi.org/10.1007/978-3-031-40725-3_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40724-6

  • Online ISBN: 978-3-031-40725-3

  • eBook Packages: Computer Science, Computer Science (R0)
