Abstract
Identifying the most relevant articles for a given task among a rapidly growing number of publications is highly time-consuming for researchers. To help with this task, we developed BioTMPy (https://github.com/BioSystemsUM/biotmpy), a package implementing a complete pipeline to classify biomedical literature with state-of-the-art deep learning models. The package is divided into distinct modules that can be combined across the steps of a pipeline or used independently. To validate BioTMPy, we compared several pre-trained embeddings on a dataset from a BioCreative challenge, where BioWordVec performed slightly better than GloVe, PubMed vectors and the “pubmed_ncbi” embeddings. Additionally, we implemented and compared several state-of-the-art deep learning models encompassing recurrent and convolutional layers, as well as transformers with attention mechanisms, including models from the BERT family. We obtained an improvement of over 7% in average precision and 3% in F1-score over the challenge’s best submission.
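The pipeline the abstract describes — training a classifier on labelled abstracts and ranking unseen documents by relevance — can be sketched in a few lines. This is only an illustrative baseline, not BioTMPy's actual API: the package uses deep learning models (recurrent, convolutional and BERT-family architectures), whereas the sketch below substitutes a TF-IDF plus logistic regression classifier so it stays self-contained, and the toy documents and labels are invented for the example.

```python
# Minimal document-triage sketch: fit a relevance classifier on labelled
# abstracts, then rank unseen abstracts by predicted relevance probability.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training corpus (labels: 1 = relevant to the task, 0 = irrelevant).
train_docs = [
    "protein kinase interaction detected in mutant cells",
    "novel mutation alters protein binding affinity",
    "annual report on laboratory equipment procurement",
    "conference schedule and registration information",
]
train_labels = [1, 1, 0, 0]

# Vectorize text and train the classifier as a single pipeline object.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_docs, train_labels)

# Score an unseen abstract; column 1 is the probability of relevance.
probs = clf.predict_proba(["protein mutation detected in cells"])[:, 1]
```

In BioTMPy the vectorization step would instead use pre-trained embeddings (e.g. BioWordVec) and the classifier a deep learning model, but the overall shape — preprocess, train, score, rank — is the same.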
Acknowledgements
This research has been supported by FCT - Fundação para a Ciência e Tecnologia through the DeepBio project - ref. NORTE-01-0247-FEDER-039831, funded by Lisboa 2020, Norte 2020, Portugal 2020 and FEDER - Fundo Europeu de Desenvolvimento Regional.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alves, N., Rodrigues, R., Rocha, M. (2022). BioTMPy: A Deep Learning-Based Tool to Classify Biomedical Literature. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86257-2
Online ISBN: 978-3-030-86258-9
eBook Packages: Intelligent Technologies and Robotics (R0)