Information Retrieval Journal, Volume 21, Issue 6, pp 507–540

An analysis of evaluation campaigns in ad-hoc medical information retrieval: CLEF eHealth 2013 and 2014

  • Lorraine Goeuriot
  • Gareth J. F. Jones
  • Liadh Kelly
  • Johannes Leveling
  • Mihai Lupu
  • Joao Palotti
  • Guido Zuccon

Abstract

Since its inception in 2013, one of the key contributions of the CLEF eHealth evaluation campaign has been the organization of an ad-hoc information retrieval (IR) benchmarking task. This IR task evaluates systems intended to support laypeople searching for and understanding health information. Each year the task provides registered participants with standard IR test collections consisting of a document collection and a topic set. Participants then return the retrieval results obtained by their IR systems for each query, which are assessed using a pooling procedure. In this article we focus on the CLEF eHealth 2013 and 2014 retrieval tasks, in which topics were created based on patients’ information needs associated with their medical discharge summaries. We give an overview of the tasks and the datasets created, and of the results obtained by participating teams over these two years. We then provide a detailed comparative analysis of the results, and conduct an evaluation of the datasets in the light of these results. This twofold study of the evaluation campaign teaches us about technical aspects of medical IR, such as the effectiveness of query expansion; the quality and characteristics of CLEF eHealth IR datasets, such as their reliability; and how to run an IR evaluation campaign in the medical domain.

Keywords

eHealth, Evaluation, Benchmarking

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Lorraine Goeuriot (1)
  • Gareth J. F. Jones (2)
  • Liadh Kelly (3)
  • Johannes Leveling (2)
  • Mihai Lupu (4)
  • Joao Palotti (4)
  • Guido Zuccon (5)

  1. LIG, Université Grenoble Alpes, Grenoble, France
  2. Dublin City University, Dublin, Ireland
  3. Maynooth University, Maynooth, Ireland
  4. TU Wien, Vienna, Austria
  5. Queensland University of Technology, Brisbane, Australia
