1 Introduction

There is increasing interest in exploiting the vast and rapidly growing amount of health-related content [7] by means of Information Retrieval (IR) [12] and deep learning strategies [14, 18]. Health-related content is particularly challenging because of its highly specialized domain language and the implicit differences in language characteristics across content types: patient-generated content such as discussion forums [15], blogs [8], social media [17] and other Internet sources; healthcare documentation and clinical records [6]; professional or scientific publications [9]; clinical practice guidelines; clinical trials documentation; medical questionnaires; medical informed consent documents; etc. Moreover, it is also critical to provide search solutions for non-English content as well as cross-language or multilingual IR solutions [4, 10, 16].

Efficient retrieval of biomedical documents is key for evidence-based medicine, for preparing systematic reviews, and for retrieving particular clinical case studies. Because caregivers and healthcare professionals search under particular conditions (a limited amount of time spent per patient), they also need more sophisticated retrieval approaches applied to electronic health records [11], a highly challenging type of content due to its telegraphic, domain-specific language and the presence of negations and abbreviations. There is also interest in processing patient-generated content such as social media and patient fora, a key resource for rare disease research, clinical trial patient selection/stratification, and the discovery of new patient-reported symptoms and treatment-related adverse effects. In the health domain, indexing strategies relying on structured controlled vocabularies, such as MeSH/DeCS or SNOMED CT, are a critical component of efficient biomedical search engines, enabling query expansion and refinement [2] and the improvement of recommender systems [3].

1.1 BioASQ MESINESP Session

Currently, most Biomedical NLP and IR research is done on English documents [13], and only a few tasks have been carried out on non-English texts [5]. Many structured controlled vocabularies are also available only in English [19]. Nonetheless, a considerable amount of medically relevant content is published in languages other than English, and clinical texts in particular are, with a few exceptions, written entirely in the native language of each country. The critical importance of semantic indexing with medical vocabularies motivated several shared tasks in the past, in particular the BioASQ tracks (footnote 1), which attracted a considerable number of participants and had an impact on the field. Following the outline of previous medical indexing efforts, in particular the success of the BioASQ tracks centered on PubMed, the BioASQ MESINESP task (footnote 2), supported by the Spanish National Language Technology plan (Plan TL), proposes to carry out the first task on semantic indexing of Spanish medical texts.

This workshop will be a forum where the community can present and discuss current and future directions for the area, based on the experience of participating in the MESINESP shared task or in other medical IR, QA or text categorization evaluation campaigns, as well as on the exploitation of evaluation settings and data collections generated through this kind of community evaluation effort (both during and after the competition period).

1.2 Open Session

In addition to the session on MESINESP and on shared task/evaluation campaign participation experience, the workshop will include an Open Session covering IR technologies for heterogeneous health-related content. This session is open to multiple languages, with a particular interest in the exploitation of structured controlled vocabularies and entity linking for document indexing and semantic search applications.

The proposed topics for the Open Session include: (1) multilingual and non-English health-related IR, concept indexing and text categorization strategies, (2) generation of evaluation resources for biomedical document IR strategies, (3) scalability, robustness, reproducibility, utility and usability [1] of health IR and text mining resources, (4) use of specialized machine translation and advanced deep learning approaches for improving health-related search results, (5) medical Question Answering search tools, and (6) retrieval of multilingual health-related web content. We will also consider other submissions related to innovative health and biomedical IR strategies, including evaluation and the generation of Gold Standard evaluation data sets.

2 Planned Format and Structure

All the teams implementing systems for MESINESP will be invited to submit an article describing their participation strategy. The program committee will review the papers and select which of them will have a presentation slot at the workshop. For the Open Session we will invite researchers to submit novel IR approaches to process heterogeneous health-related content with particular interest in non-English content, novel content types as well as semantic indexing strategies exploiting structured controlled vocabularies and ontologies.

We expect that further investigation on the topics will continue after the workshop, based on new insights obtained through discussions during the event. As a venue to compile the results of the follow-up investigation, a journal special issue will be organized to be published a few months after the workshop.

3 People Involved

3.1 Organizers

  • Martin Krallinger: head of the Text Mining unit at the Barcelona Supercomputing Center (BSC), Spain

  • Francisco M. Couto: LASIGE member and associate professor at the University of Lisbon, Portugal

3.2 Programme Committee

  • Alberto Lavelli: FBK, Trento, Italy

  • Alfonso Valencia: Barcelona Supercomputing Center, Spain

  • Analia Lourenco: Universidade de Vigo, Spain

  • Anastasios Nentidis: National Center for Scientific Research Demokritos, Greece

  • André Lamurias: LASIGE, Portugal

  • Anne-Lyse Minard: University of Orleans, France

  • Aron Henriksson: Stockholm University, Sweden

  • Bruno Martins: INESC-ID, Portugal

  • Carsten Eickhoff: Brown University, USA

  • Chih-Hsuan Wei: NCBI/NIH, National Library of Medicine, USA

  • Cyril Grouin: LIMSI, CNRS, Université Paris-Saclay, Orsay, France

  • Diana Sousa: LASIGE, Portugal

  • Dimitrios Kokkinakis: University of Gothenburg, Sweden

  • Eben Holderness: McLean Hosp., Harvard Med. School & Brandeis University, USA

  • Ellen Voorhees: National Institute of Standards and Technology (NIST), USA

  • Fabio Rinaldi: IDSIA, University of Zurich, Switzerland & FBK, Trento, Italy

  • Fleur Mougin: University of Bordeaux, France

  • Georgeta Bordea: Université de Bordeaux, France

  • Georgios Paliouras: National Center for Scientific Research Demokritos, Greece

  • Goran Nenadic: University of Manchester, UK

  • Graciela Gonzalez-Hernandez: University of Pennsylvania, USA

  • Hanna Suominen: CSIRO, Australia

  • Henning Muller: University of Applied Sciences Western Switzerland, Switzerland

  • Hercules Dalianis: Stockholm University, Sweden

  • Hyeju Jang: University of British Columbia, Canada

  • James Pustejovsky: Brandeis University, USA

  • Jin-Dong Kim: Research Organization of Information and Systems, Japan

  • Jong C. Park: KAIST Computer Science, Korea

  • Kevin Bretonnel Cohen: University of Colorado School of Medicine, Colorado, USA

  • Maria Skeppstedt: Institute for Language and Folklore, Sweden

  • Marcia Barros: LASIGE, Portugal

  • Mariana Lara Neves: German Federal Institute for Risk Assessment, Germany

  • Marta Villegas: BSC, Spain

  • Pedro Ruas: LASIGE, Portugal

  • Rafael Berlanga Llavori: Universitat Jaume I, Spain

  • Rezarta Islamaj Dogan: NIH/NLM/NCBI, USA

  • Sérgio Matos: University of Aveiro, Portugal

  • Shyamasree Saha: Europe PubMed Central, EMBL-EBI, UK

  • Suzanne Tamang: Stanford University School of Medicine, USA

  • Thierry Hamon: LIMSI, CNRS, Université Paris-Saclay & Université Paris 13, France

  • Thomas Brox Røst: Norwegian University of Science and Technology, Norway

  • Yifan Peng: NCBI/NIH, National Library of Medicine, USA

  • Yonghui Wu: University of Florida, USA

  • Yoshinobu Kano: Shizuoka University, Japan

  • Zhiyong Lu: NCBI/NIH, National Library of Medicine, USA

  • Zita Marinho: Priberam, Portugal