Building a Common Framework for IIR Evaluation

  • Mark Michael Hall
  • Elaine Toms
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8138)


Cranfield-style evaluations standardised Information Retrieval (IR) evaluation practices, enabling the creation of programmes such as TREC, CLEF, and INEX, and long-term comparability of IR systems. However, the methodology does not translate well into the Interactive IR (IIR) domain, where the inclusion of the user in the search process and the repeated interaction between user and system create more variability than Cranfield-style evaluations can support. As a result, IIR evaluations of various systems have tended to be non-comparable, not because the systems vary, but because the methodologies used are non-comparable. In this paper we describe a standardised IIR evaluation framework that ensures that IIR evaluations can share a standardised baseline methodology in much the same way that TREC, CLEF, and INEX imposed a process on IR evaluation. The framework provides a common baseline, derived by integrating existing, validated evaluation measures, that enables inter-study comparison, but is also flexible enough to support most kinds of IIR studies. This is achieved through the use of a “pluggable” system, into which any web-based IIR interface can be embedded. The framework has been implemented and the software will be made available to reduce the resource commitment required for IIR studies.


Keywords: evaluation, methodology, interactive information retrieval





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mark Michael Hall (1)
  • Elaine Toms (1)
  1. Information School, University of Sheffield, Sheffield, UK
