Overview of BioASQ 2021: The Ninth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Nentidis, Anastasios; Katsimpras, Georgios; Vandorou, Eirini; Krithara, Anastasia; Gasco, Luis; Krallinger, Martin; Paliouras, Georgios

doi:10.1007/978-3-030-85251-1_18


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages


Abstract

Advancing the state of the art in large-scale biomedical semantic indexing and question answering is the main focus of the BioASQ challenge. BioASQ organizes tasks in which different teams develop systems that are evaluated on the same benchmark datasets, which represent the real information needs of experts in the biomedical domain. This paper presents an overview of the ninth edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2021. This year, a new question answering task, named Synergy, is introduced to support researchers studying COVID-19 and to measure the ability of the participating teams to discern information while the problem is still developing. In total, 42 teams with more than 170 systems registered to participate in the four tasks of the challenge. As in previous years, the evaluation results show a performance gain over the baselines, which indicates continuous improvement of the state of the art in this field.


Notes

1. https://pubmed.ncbi.nlm.nih.gov/.
2. https://plantl.mineco.gob.es.
3. https://www.ethnologue.com/guides/ethnologue200.
4. DeCS (Descriptores en Ciencias de la Salud, Health Science Descriptors) is a structured controlled vocabulary created by BIREME to index scientific publications on BvSalud (Biblioteca Virtual en Salud, Virtual Health Library).
5. IBECS includes bibliographic references to scientific articles in the health sciences published in Spanish medical journals: http://ibecs.isciii.es.
6. LILACS is a resource comprising scientific and technical literature from Latin American and Caribbean countries. It covers 26 countries, 882 journals and 878,285 records, 464,451 of which are full texts: https://lilacs.bvsalud.org.
7. Registro Español de Estudios Clínicos (Spanish Clinical Studies Registry), a database containing summaries of clinical trials: https://reec.aemps.es/reec/public/web.html.
8. https://cloud.google.com/blog/topics/public-datasets/google-patents-public-datasets-connecting-public-paid-and-private-patent-data.
9. http://participants-area.bioasq.org/results/9a/.
10. http://participants-area.bioasq.org/Tasks/b/eval_meas_2021/.
11. http://participants-area.bioasq.org/results/9b/phaseA/.
12. http://participants-area.bioasq.org/results/9b/phaseB/.
13. http://participants-area.bioasq.org/results/synergy/.


Acknowledgments

Google was a proud sponsor of the BioASQ Challenge in 2020. The ninth edition of BioASQ is also sponsored by Atypon Systems Inc. BioASQ is grateful to NLM for providing the baselines for task 9a and to the CMU team for providing the baselines for task 9b. The MESINESP task is sponsored by the Spanish Plan for the Advancement of Language Technologies (Plan TL). BioASQ would also like to thank LILACS, SciELO, Biblioteca Virtual en Salud, Instituto de Salud Carlos III, and BIREME for providing data and helping to organize the BioASQ MESINESP task.

Author information

Authors and Affiliations

National Center for Scientific Research “Demokritos”, Athens, Greece
Anastasios Nentidis, Georgios Katsimpras, Eirini Vandorou, Anastasia Krithara & Georgios Paliouras
Aristotle University of Thessaloniki, Thessaloniki, Greece
Anastasios Nentidis
Barcelona Supercomputing Center, Barcelona, Spain
Luis Gasco & Martin Krallinger


Corresponding author

Correspondence to Anastasios Nentidis.

Editor information

Editors and Affiliations

Arizona State University, Tempe, AZ, USA
K. Selçuk Candan
Politehnica University of Bucharest, Bucharest, Romania
Bogdan Ionescu
Université Grenoble Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Aalborg University Copenhagen, Copenhagen, Denmark
Birger Larsen
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Montpellier, Montpellier, France
Alexis Joly
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
TU Wien, Vienna, Austria
Florina Piroi
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro


About this paper

Cite this paper

Nentidis, A., et al. (2021). Overview of BioASQ 2021: The Ninth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol. 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_18


DOI: https://doi.org/10.1007/978-3-030-85251-1_18
Published: 14 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science, Computer Science (R0)
