
From Multilingual to Multimodal: The Evolution of CLEF over Two Decades

  • Nicola Ferro
  • Carol Peters
Chapter
Part of The Information Retrieval Series (INRE, volume 41)

Abstract

This introductory chapter begins by briefly explaining what is meant by experimental evaluation in information retrieval, in order to provide the necessary background for the rest of this volume. The major international evaluation initiatives that have adopted this common framework, implementing it in various ways, are then presented and their relationship to CLEF is indicated. The second part of the chapter describes how the experimental evaluation paradigm has been implemented in CLEF, giving a brief overview of the main activities and results obtained over the last two decades. The aim has been to build a strong multidisciplinary research community and to create a sustainable technical framework that would not simply support but would also empower both research and development and evaluation activities, while meeting, and at times anticipating, the demands of a rapidly evolving information society.


Notes

Acknowledgements

CLEF 2000 and 2001 were supported by the European Commission under the Information Society Technologies programme and within the framework of the DELOS Network of Excellence for Digital Libraries (contract no. IST-1999-12262).

CLEF 2002 and 2003 were funded as an independent project (contract no. IST-2000-31002) under the 5th Framework Programme of the European Commission.

CLEF 2004–2007 were sponsored by the DELOS Network of Excellence for Digital Libraries (contract no. G038-507618) under the 6th Framework Programme of the European Commission.

Under the 7th Framework Programme of the European Commission, CLEF 2008 and 2009 were supported by TrebleCLEF Coordination Action (contract no. 215231) and CLEF 2010 to 2013 were funded by the PROMISE Network of Excellence (contract no. 258191).

CLEF 2011–2015 also received support from the ELIAS network (contract no. 09-RNP-085) of the European Science Foundation (ESF), which funded student travel grants and invited speakers.

CLEF 2015, 2017, and 2018 received ACM SIGIR support for student travel grants through the SIGIR Friends program.

Over the years CLEF has also attracted industrial sponsorship: from 2010 onwards, CLEF has received the support of Google, Microsoft, Yandex, Xerox, and Celi, as well as of publishers in the field such as Springer and Now Publishers.

In addition to the support gratefully acknowledged above, CLEF tracks and labs have frequently received the assistance of other projects and organisations; unfortunately, it is impossible to list them all here.

It must be noted that, above all, CLEF would not be possible without the volunteer efforts, enthusiasm, and passion of its community: lab organizers, lab participants, and attendees are the core and the real success of CLEF.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Information Engineering, University of Padua, Padova, Italy
  2. Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” (ISTI), National Research Council (CNR), Pisa, Italy
