Are Test Collections “Real”? Mirroring Real-World Complexity in IR Test Collections

  • Melanie Imhof
  • Martin Braschler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9283)


Objective evaluation of effectiveness is a major topic in the field of information retrieval (IR), as emphasized by the numerous evaluation campaigns in this area. The increasing pervasiveness of information has led to a large variety of IR application scenarios that involve different information types (modalities), heterogeneous documents and context-enriched queries. In this paper, we argue that even though the complexity of academic test collections has increased over the years, they are still structurally too simple in comparison to the operational collections of real-world applications. Furthermore, research has produced retrieval methods for very specific modalities, such as ratings, geographical coordinates and timestamps. However, it remains unclear how to systematically incorporate new modalities into IR systems. We therefore propose a categorization of modalities that not only allows analyzing the complexity of a collection but also helps to generalize methods to entire modality categories rather than keeping them specific to a single modality. Moreover, we discuss how such a complex collection can methodically be built for use in an evaluation campaign.


Keywords: Collection complexity · Modality categorization · Evaluation campaigns





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Université de Neuchâtel, Neuchâtel, Switzerland
  2. Zurich University of Applied Sciences, Winterthur, Switzerland
