Information Retrieval, Volume 11, Issue 3, pp 209–228

Evaluating the effectiveness of relevance feedback based on a user simulation model: effects of a user scenario on cumulated gain value

  • Heikki Keskustalo
  • Kalervo Järvelin
  • Ari Pirkola


We propose a method for evaluating relevance feedback by simulating real users. The simulation applies a user model defining the user’s relevance threshold for accepting individual documents as feedback in a graded relevance environment, the user’s patience in browsing the initial list of retrieved documents, and his or her effort in providing the feedback. We evaluate the results using cumulated gain-based evaluation combined with freezing all documents seen by the user, in order to simulate the point of view of a user who browses the documents during the retrieval process. We demonstrate the method through a simulation in a laboratory setting and present the “branching” curve sets characteristic of the proposed evaluation method. Both the averaged and topic-by-topic results indicate that, when the freezing approach is adopted, giving feedback of mixed quality is worthwhile across various usage scenarios, even though the modeled users prefer finding the most relevant documents in particular.
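The procedure described in the abstract — browse the initial ranking up to a patience limit, accept documents at or above a graded relevance threshold as feedback, freeze everything already seen ahead of the feedback-based re-ranking, and score the result with cumulated gain — could be sketched roughly as follows. This is an illustrative reconstruction, not the authors’ implementation; all function names and the toy data are hypothetical.

```python
# Hypothetical sketch of the simulated-user feedback loop: graded relevance
# values (e.g. 0-3), a relevance threshold for accepting feedback documents,
# a patience limit on browsing, and "freezing" of all seen documents before
# evaluating the merged list with cumulated gain (CG).

def simulate_feedback(ranked_ids, relevance, threshold, patience):
    """Scan the initial ranking; accept documents whose relevance grade is
    >= threshold as feedback, stopping after `patience` documents browsed."""
    seen, feedback = [], []
    for doc_id in ranked_ids[:patience]:
        seen.append(doc_id)
        if relevance.get(doc_id, 0) >= threshold:
            feedback.append(doc_id)
    return seen, feedback

def freeze_and_merge(seen, reranked_ids):
    """Freeze the browsed prefix in place and append the feedback-based
    re-ranking, skipping documents the user has already seen."""
    seen_set = set(seen)
    return seen + [d for d in reranked_ids if d not in seen_set]

def cumulated_gain(ranked_ids, relevance, k):
    """Non-discounted cumulated gain at rank k: the sum of relevance
    grades of the first k documents."""
    return sum(relevance.get(d, 0) for d in ranked_ids[:k])
```

For example, a patient user with a strict threshold would browse two documents, give only the highly relevant one as feedback, and then be scored on the frozen prefix followed by the re-ranked remainder.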


Keywords: Evaluation · Relevance feedback · Simulation · User modeling


  1. Aalbersberg, I. J. (1992). Incremental relevance feedback. In N. J. Belkin, P. Ingwersen, & A. Mark Pejtersen (Eds.), Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 11–22). Copenhagen, Denmark.Google Scholar
  2. Belkin, N. J., Cool, C., Koenemann, J., Ng, K. B., & Park, S. (1995) Using relevance feedback and ranking in interactive searching. TREC 1995. Accessed 14 Aug 2007.
  3. Blair, D. C. (1984). The data-document distinction in information retrieval. Communications of the ACM, 4, 27, 369–374.Google Scholar
  4. Billerbeck, B. (2005). Efficient query expansion. Doctoral thesis. School of Computer Science and Information Technology, Portfolio of Science, Engineering and Technology, RMIT University. Melbourne, Victoria, Australia, 2005. Accessed 14 Aug 2007.
  5. Broglio, J., Callan, J. P., & Croft, W. B. (1994). INQUERY system overview. In Proceedings of the TIPSTER text program (Phase I) (pp. 47–67).Google Scholar
  6. Chang, Y. K., Cirillo, C., & Razon, J. (1971). Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 355–370). London: Prentice-Hall.Google Scholar
  7. Conover, W. J. (1999). Practical nonparametric statistics (3rd ed., p. 584). New York: Wiley.Google Scholar
  8. Efthimiadis, E. N. (1996). Query expansion. In M. E. Williams (Ed.), Annual Review of Information Science and Technology, vol. 31 (ARIST 31) (pp. 121–187). Medford, NJ: Learned Information for the American Society for Information Science. Accessed 14 Aug 2007.
  9. Jordan, C., Watters, C., & Gao, Q. (2006). Using controlled query generation to evaluate blind relevance feedback algorithms. In ACM/IEEE Joint Conference on Digital Libraries (JCDL’06) (pp. 286–295). Accessed 19 Dec 2007.
  10. Järvelin K., & Kekäläinen J. (2000). IR evaluation methods for retrieving highly relevant documents. In N. J. Belkin, P. Ingwersen, & M.-K. Leong (Eds.), Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 41–48). Athens, Greece.Google Scholar
  11. Kando, N. (2000). What shall we evaluate?—Preliminary discussion for the NTCIR patent IR challenge (PIC) based on the brainstorming with the specialized intermediaries in patent searching and patent attorneys. Proceedings of the ACM SIGIR 2000 Workshop on Patent Retrieval. Athens, Greece, July 28, 2000. Accessed 19 Dec 2007.
  12. Kekäläinen, J. (1999). The effects of query compexity, expansion and structure on retrieval performance in probalistic text retrieval. Doctoral thesis. Tampere, Finland: University of Tampere, Department of Information Studies. Acta Universitatis Tamperensis 678. p. 170.Google Scholar
  13. Kekäläinen, J., & Järvelin, K. (2002). Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science and Technology, 53(13), 1120–1129.CrossRefGoogle Scholar
  14. Keskustalo, H., Järvelin K, & Pirkola, A. (2006). The effects of relevance feedback quality and quantity in interactive relevance feedback: A simulation based on user modeling. In M. Lalmas, A. MacFarlane, S. Rüger, A. Tombros, T. Tsikrika, & A. Yavlinsky (Eds.), Proceedings of the 28th European Conference on IR Research (ECIR) (pp. 191–204), London, UK.Google Scholar
  15. Marchionini, G., Dwiggins, S., Katz, A., & Lin, X. (1993). Information seeking in full-text end-user-oriented search systems: The roles of domain and search expertise. Library & Information Science Research, 15(1), 35–70.Google Scholar
  16. Pirkola, A., Leppänen E, & Järvelin K. (2002). The RATF formula (Kwok’s Formula): Exploiting average term frequency in cross-language retrieval. Information Research, 7(2). Accessed 15 Aug 2007.
  17. Price, S. L., Nielsen, M. L., Delcambre, L. M. L., & Vedsted P. (2007). Semantic components enhance retrieval of domain-specific documents. Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management (pp. 429–438), Lisbon, Portugal.Google Scholar
  18. Rocchio, J. J. Jr. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313–323). Prentice-Hall: London.Google Scholar
  19. Ruthven, I., & Lalmas, M. (2003). A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review, 18(2), 95–145.CrossRefGoogle Scholar
  20. Salton, G. (1989). Automatic text processing: The transformation, analysis and retrieval of information by computer (p. 530). Reading, MA: Addison-Wesley.Google Scholar
  21. Sormunen, E. (2002). Liberal relevance criteria of TREC—counting on negligible documents? In M. Beaulieu, R. Baeza-Yates, S. H. Myaeng, & K. Järvelin (Eds.), Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 320–330). Tampere, Finland.Google Scholar
  22. Sormunen, E., Kekäläinen, J., Koivisto, J., & Järvelin, K. (2001). Document text characteristics affect ranking of the most relevant documents by expanded structured queries. Journal of Documentation, 57(3), 358–374.CrossRefGoogle Scholar
  23. Spink, A., & Saracevic, T. (1998). Interaction in information retrieval: Selection and effectiveness of search terms. Journal of the American Society for Information Science, 48(8), 741–761.CrossRefGoogle Scholar
  24. Vakkari, P., & Sormunen, E. (2004). The influence of relevance levels on the effectiveness of interactive information retrieval. Journal for the American Society for Information Science and Technology, 55(11), 963–969.CrossRefGoogle Scholar
  25. Voorhees, E. M. (2001). Evaluation by highly relevant documents. In W. B. Croft, D. J. Harper, D. H. Kraft, & J. Zobel (Eds.), Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 74–82). New Orleans, Louisiana, USA.Google Scholar
  26. White, R. W., Jose, J. M., van Rijsbergen, C. J., & Ruthven, I. (2004). A simulated study of implicit feedback models. In S. McDonald & J. Tait (Eds.), Proceedings of the 26th European Conference on IR Research (ECIR) (pp. 311–326). Sunderland, UK.Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Heikki Keskustalo (1)
  • Kalervo Järvelin (1)
  • Ari Pirkola (1)

  1. Department of Information Studies, University of Tampere, Tampere, Finland