Proposal of the First International Workshop on Semantic Indexing and Information Retrieval for Health from Heterogeneous Content Types and Languages (SIIRH)

Couto, Francisco M.; Krallinger, Martin

doi:10.1007/978-3-030-45442-5_87

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12036)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The application of Information Retrieval (IR) and deep learning strategies to explore the vast amount of rapidly growing health-related content is of utmost importance, but is also particularly challenging, due to the very specialized domain language, and implicit differences in language characteristics depending on the content type.

This workshop aims at presenting and discussing current and future directions for IR and machine learning approaches devoted to the retrieval and classification of different types of health-related documents, ranging from layman or patient-generated texts to highly specialized medical literature or clinical records. It includes a session on the MESINESP shared task, supported by the Spanish National Language Technology plan (Plan TL), in order to address the importance and impact of community evaluation efforts, in particular BioASQ, BioCreative, eHealth CLEF, MEDIQA and TREC, as scenarios for exploring evaluation settings and generating data collections of key importance for promoting the development and comparison of IR resources. Additionally, an open session will address IR technologies for heterogeneous health-related content open to multiple languages, with a particular interest in the exploitation of structured controlled vocabularies and entity linking, covering the following topics: multilingual and non-English health-related IR, concept indexing and text categorization; generation of evaluation resources for biomedical document IR strategies; scalability, robustness and reproducibility of health IR and text mining resources; use of specialized machine translation and advanced deep learning approaches for improving health-related search results; medical Question Answering search tools; retrieval of multilingual health-related web content; and other related topics.

Supported by FCT through funding of the DeST: Deep Semantic Tagger project, ref. PTDC/CCI-BIO/28685/2017, and LaSIGE Research Unit, ref. UIDB/00408/2020.

1 Introduction

There is an increasing interest in exploiting the vast amount of rapidly growing content related to health [7] by means of Information Retrieval (IR) [12] and deep learning strategies [14, 18]. Health-related content is particularly challenging, due to the highly specialized domain language and implicit differences in language characteristics depending on the content type (patient-generated content like discussion forums [15], blogs [8], social media [17] and other Internet sources, healthcare documentation and clinical records [6], professional or scientific publications [9], clinical practice guidelines, clinical trials documentation, medical questionnaires, medical informed consent documents, etc.). Moreover, it is also critical to provide search solutions for non-English content as well as cross-language or multilingual IR solutions [4, 10, 16].

Efficient retrieval of biomedical documents is key for evidence-based medicine, for preparing systematic reviews and for retrieving particular clinical case studies. Because of the particular search conditions of caregivers and healthcare professionals (a limited amount of time spent per patient), they also need more sophisticated retrieval approaches applied to electronic health records [11], a type of content that is highly challenging due to its telegraphic and domain-specific language and the presence of negations and abbreviations. There is also interest in processing patient-generated content like social media and patient fora, a key resource for rare disease research, clinical trials patient selection/stratification or for discovering new patient-reported symptoms and treatment-related adverse effects. In the health domain, indexing strategies relying on structured controlled vocabularies, like MeSH/DeCS or SNOMED CT, represent a critical component for efficient biomedical search engines, enabling query expansion and refinement [2] and the improvement of recommender systems [3].
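As a rough illustration of how a controlled vocabulary can support query expansion, the following minimal Python sketch adds the synonyms of any vocabulary concept mentioned in a query. It is only a sketch under stated assumptions: the vocabulary, the concept identifiers and the function name `expand_query` are hypothetical placeholders, not actual MeSH/DeCS or SNOMED CT records and not the method of any system discussed here.

```python
from typing import Dict, List, Set

# Hypothetical toy vocabulary: concept ID -> preferred term and synonyms.
# Identifiers and terms are illustrative placeholders, NOT real
# MeSH/DeCS or SNOMED CT entries.
TOY_VOCABULARY: Dict[str, List[str]] = {
    "C001": ["myocardial infarction", "heart attack"],
    "C002": ["hypertension", "high blood pressure"],
}


def expand_query(query: str, vocabulary: Dict[str, List[str]]) -> Set[str]:
    """Return the lower-cased query plus all terms of every matched concept."""
    normalized = query.lower()
    expanded: Set[str] = {normalized}
    for terms in vocabulary.values():
        # A concept matches if any of its terms occurs in the query text.
        if any(term in normalized for term in terms):
            expanded.update(terms)
    return expanded


if __name__ == "__main__":
    print(sorted(expand_query("Heart attack risk factors", TOY_VOCABULARY)))
```

The plain substring match keeps the sketch short; a realistic system would more likely rely on tokenization and entity linking to map query text to vocabulary concepts.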

1.1 BioASQ MESINESP Session

Currently, most biomedical NLP and IR research is being done on English documents [13], and only a few tasks have been carried out on non-English texts [5]. Many structured controlled vocabularies are also available only in English [19]. Nonetheless, it is important to note that there is also a considerable amount of medically relevant content published in languages other than English, and clinical texts in particular are, with a few exceptions, written entirely in the native language of each country. The critical importance of semantic indexing with medical vocabularies motivated several shared tasks in the past, in particular the BioASQ tracks (Note 1), with a considerable number of participants and impact in the field. Following the outline of previous medical indexing efforts, in particular the success of the BioASQ tracks centered on PubMed, the BioASQ MESINESP task (Note 2), supported by the Spanish National Language Technology plan (Plan TL), proposes to carry out the first task on semantic indexing of Spanish medical texts.

This workshop will be a forum where the community can present and discuss current and future directions for the area based on the experience of participating in the MESINESP shared task or other medical IR, QA or text categorization evaluation campaigns, as well as the exploitation of evaluation settings and data collections generated through this kind of community evaluation effort (both during and after the competition period).

1.2 Open Session

In addition to the MESINESP and shared task/evaluation campaign participation experience session, the workshop will include an Open Session covering IR technologies for heterogeneous health-related content open to multiple languages with a particular interest in the exploitation of structured controlled vocabularies and entity linking for document indexing and semantic search applications.

Among the proposed topics for the Open Session are: (1) multilingual and non-English health-related IR, concept indexing and text categorization strategies, (2) generation of evaluation resources for biomedical document IR strategies, (3) scalability, robustness, reproducibility, utility and usability [1] of health IR and text mining resources, (4) use of specialized machine translation and advanced deep learning approaches for improving health-related search results, (5) medical Question Answering search tools, (6) retrieval of multilingual health-related web content. Note that we will also consider other submissions related to innovative cutting-edge health and biomedical IR strategies, including evaluation and Gold Standard evaluation data set generation.

2 Planned Format and Structure

All the teams implementing systems for MESINESP will be invited to submit an article describing their participation strategy. The program committee will review the papers and select which of them will have a presentation slot at the workshop. For the Open Session we will invite researchers to submit novel IR approaches to process heterogeneous health-related content with particular interest in non-English content, novel content types as well as semantic indexing strategies exploiting structured controlled vocabularies and ontologies.

We expect that further investigation on the topics will continue after the workshop, based on new insights obtained through discussions during the event. As a venue to compile the results of the follow-up investigation, a journal special issue will be organized to be published a few months after the workshop.

3 People Involved

3.1 Organizers

Martin Krallinger: head of the Text Mining unit at the Barcelona Supercomputing Center (BSC), Spain
Francisco M. Couto: LASIGE member and associate professor at the University of Lisbon, Portugal

3.2 Programme Committee

Alberto Lavelli: FBK, Trento, Italy
Alfonso Valencia: Barcelona Supercomputing Center, Spain
Analia Lourenco: Universidade de Vigo, Spain
Anastasios Nentidis: National Center for Scientific Research Demokritos, Greece
André Lamurias: LASIGE, Portugal
Anne-Lyse Minard: University of Orleans, France
Aron Henriksson: Stockholm University, Sweden
Bruno Martins: INESC-ID, Portugal
Carsten Eickhoff: Brown University, USA
Chih-Hsuan Wei: NCBI/NIH, National Library of Medicine, USA
Cyril Grouin: LIMSI, CNRS, Université Paris-Saclay, Orsay, France
Diana Sousa: LASIGE, Portugal
Dimitrios Kokkinakis: University of Gothenburg, Sweden
Eben Holderness: McLean Hosp., Harvard Med. School & Brandeis University, USA
Ellen Voorhees: National Institute of Standards and Technology (NIST), USA
Fabio Rinaldi: IDSIA, University of Zurich, Switzerland & FBK, Trento, Italy
Fleur Mougin: University of Bordeaux, France
Georgeta Bordea: Université de Bordeaux, France
Georgios Paliouras: National Center for Scientific Research Demokritos, Greece
Goran Nenadic: University of Manchester, UK
Graciela Gonzalez-Hernandez: University of Pennsylvania, USA
Hanna Suominen: CSIRO, Australia
Henning Muller: University of Applied Sciences Western Switzerland, Switzerland
Hercules Dalianis: Stockholm University, Sweden
Hyeju Jang: University of British Columbia, Canada
James Pustejovsky: Brandeis University, USA
Jin-Dong Kim: Research Organization of Information and Systems, Japan
Jong C. Park: KAIST Computer Science, Korea
Kevin Bretonnel Cohen: University of Colorado School of Medicine, Colorado, USA
Maria Skeppstedt: Institute for Language and Folklore, Sweden
Marcia Barros: LASIGE, Portugal
Mariana Lara Neves: German Federal Institute for Risk Assessment, Germany
Marta Villegas: BSC, Spain
Pedro Ruas: LASIGE, Portugal
Rafael Berlanga Llavori: Universitat Jaume I, Spain
Rezarta Islamaj Dogan: NIH/NLM/NCBI, USA
Sérgio Matos: University of Aveiro, Portugal
Shyamasree Saha: Europe PubMed Central, EMBL-EBI, UK
Suzanne Tamang: Stanford University School of Medicine, USA
Thierry Hamon: LIMSI, CNRS, Université Paris-Saclay & Université Paris 13, France
Thomas Brox Røst: Norwegian University of Science and Technology, Norway
Yifan Peng: NCBI/NIH, National Library of Medicine, USA
Yonghui Wu: University of Florida, USA
Yoshinobu Kano: Shizuoka University, Japan
Zhiyong Lu: NCBI/NIH, National Library of Medicine, USA
Zita Marinho: Priberam, Portugal

Notes

1. http://bioasq.org/
2. http://temu.bsc.es/mesinesp

References

1. Arighi, C.N., et al.: BioCreative III interactive task: an overview. BMC Bioinformatics 12(8), S4 (2011). https://doi.org/10.1186/1471-2105-12-S8-S4
2. Barros, M., Couto, F.M.: Knowledge representation and management: a linked data perspective. Yearb. Med. Inform. 25(01), 178–183 (2016)
3. Barros, M., Moitinho, A., Couto, F.: Hybrid semantic recommender system for chemical compounds. In: European Conference on Information Retrieval. Springer (2020)
4. Bawden, R., et al.: Findings of the WMT 2019 biomedical translation shared task: evaluation for MEDLINE abstracts and biomedical terminologies. In: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pp. 29–53 (2019)
5. Campos, L., Pedro, V., Couto, F.: Impact of translation on named-entity recognition in radiology texts. Database 2017 (2017)
6. Costumero, R., García-Pedrero, Á., Gonzalo-Martín, C., Menasalvas, E., Millan, S.: Text analysis and information extraction from Spanish written documents. In: Ślȩzak, D., Tan, A.-H., Peters, J.F., Schwabe, L. (eds.) BIH 2014. LNCS (LNAI), vol. 8609, pp. 188–197. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09891-3_18
7. Couto, F.M.: Data and Text Processing for Health and Life Sciences. AEMB, vol. 1137. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13845-5
8. Denecke, K., Nejdl, W.: How valuable is medical social media data? Content analysis of the medical web. Inf. Sci. 179(12), 1870–1880 (2009)
9. Intxaurrondo, A., et al.: Finding mentions of abbreviations and their definitions in Spanish clinical cases: the BARR2 shared task evaluation results. In: IberEval@SEPLN, pp. 280–289 (2018)
10. Kelly, L., et al.: Overview of the CLEF eHealth evaluation lab 2019. In: Crestani, F., et al. (eds.) CLEF 2019. LNCS, vol. 11696, pp. 322–339. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28577-7_26
11. Koleck, T.A., Dreisbach, C., Bourne, P.E., Bakken, S.: Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J. Am. Med. Inform. Assoc. 26(4), 364–379 (2019)
12. Krallinger, M., Rabal, O., Lourenco, A., Oyarzabal, J., Valencia, A.: Information retrieval and text mining technologies for chemistry. Chem. Rev. 117(12), 7673–7761 (2017)
13. Lamurias, A., Couto, F.M.: Text mining for bioinformatics using biomedical literature. In: Encyclopedia of Bioinformatics and Computational Biology, vol. 1 (2019)
14. Lee, J., et al.: BioBERT: pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2019)
15. Liu, X., Chen, H.: AZDrugMiner: an information extraction system for mining patient-reported adverse drug events in online patient forums. In: Zeng, D., et al. (eds.) ICSH 2013. LNCS, vol. 8040, pp. 134–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39844-5_16
16. Marimon, M., et al.: Automatic de-identification of medical texts in Spanish: the meddocan track, corpus, guidelines, methods and evaluation of results. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), vol. TBA, p. TBA. CEUR Workshop Proceedings, Bilbao, Spain, September 2019, TBA. CEUR-WS.org (2019)
17. Segura-Bedmar, I., Revert, R., Martínez, P.: Detecting drugs and adverse events from Spanish social media streams. In: Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (LOUHI), pp. 106–115 (2014)
18. Sousa, D., Couto, F.: BiOnt: deep learning using multiple biomedical ontologies for relation extraction. In: European Conference on Information Retrieval. Springer (2020)
19. Villegas, M., Intxaurrondo, A., Gonzalez-Agirre, A., Marimon, M., Krallinger, M.: The MeSpEN resource for English-Spanish medical machine translation and terminologies: census of parallel corpora, glossaries and term translations. In: Proceedings of the LREC 2018 Workshop “MultilingualBIO: Multilingual Biomedical Text Processing”, Paris, France. European Language Resources Association (ELRA) (2018)

Author information

Authors and Affiliations

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
Francisco M. Couto
Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona 29-31, 08034, Barcelona, Spain
Martin Krallinger

Corresponding author

Correspondence to Francisco M. Couto.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Joemon M. Jose
University College London, London, UK
Emine Yilmaz
Universidade NOVA de Lisboa, Lisbon, Portugal
João Magalhães
Universidad Autónoma de Madrid, Madrid, Spain
Pablo Castells
University of Padua, Padua, Italy
Nicola Ferro
Universidade de Lisboa, Lisbon, Portugal
Mário J. Silva
Universidade NOVA de Lisboa, Lisbon, Portugal
Flávio Martins

Cite this paper

Couto, F.M., Krallinger, M. (2020). Proposal of the First International Workshop on Semantic Indexing and Information Retrieval for Health from Heterogeneous Content Types and Languages (SIIRH). In: Jose, J., et al. Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12036. Springer, Cham. https://doi.org/10.1007/978-3-030-45442-5_87

DOI: https://doi.org/10.1007/978-3-030-45442-5_87
Published: 08 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45441-8
Online ISBN: 978-3-030-45442-5
eBook Packages: Computer Science (R0)
