Validating Query Simulators: An Experiment Using Commercial Searches and Purchases

  • Bouke Huurnink
  • Katja Hofmann
  • Maarten de Rijke
  • Marc Bron
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6360)

Abstract

We design and validate simulators for generating queries and relevance judgments for retrieval system evaluation. We develop a simulation framework that incorporates existing and new simulation strategies. To validate a simulator, we assess whether evaluation using its output data ranks retrieval systems in the same way as evaluation using real-world data. The real-world data is obtained using logged commercial searches and associated purchase decisions. While no simulator reproduces an ideal ranking, there is a large variation in simulator performance that allows us to distinguish those that are better suited to creating artificial testbeds for retrieval experiments. Incorporating knowledge about document structure in the query generation process helps create more realistic simulators.
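The abstract sketches two steps: generating artificial queries from (structured) documents, and checking whether evaluation on the simulated data orders retrieval systems like evaluation on logged searches and purchases. As a rough illustration only, and not the paper's actual procedure, the Python sketch below samples query terms from weighted document fields and compares two system rankings with a rank-correlation measure; the field weights, the toy document, and the use of Kendall's tau as the agreement measure are assumptions made here for illustration.

```python
import random
from itertools import combinations


def simulate_query(document, field_weights, query_length=3, seed=None):
    """Draw query terms from a structured document, preferring high-weight fields.

    `field_weights` is a hypothetical mapping, e.g. {"title": 3.0, "description": 1.0},
    expressing how strongly each field contributes terms to the simulated query.
    Sampling is with replacement, so repeated terms are possible.
    """
    rng = random.Random(seed)
    pool, weights = [], []
    for field, text in document.items():
        for term in text.lower().split():
            pool.append(term)
            weights.append(field_weights.get(field, 1.0))
    return rng.choices(pool, weights=weights, k=query_length)


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two system rankings (lists of system ids, best first)."""
    pos_a = {s: i for i, s in enumerate(ranking_a)}
    pos_b = {s: i for i, s in enumerate(ranking_b)}
    concordant = discordant = 0
    for s, t in combinations(ranking_a, 2):
        agreement = (pos_a[s] - pos_a[t]) * (pos_b[s] - pos_b[t])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(ranking_a) * (len(ranking_a) - 1) / 2
    return (concordant - discordant) / n_pairs


# Toy example (illustrative data, not from the paper): a simulated testbed ranks
# four systems almost like the ranking derived from real purchase decisions.
doc = {"title": "news broadcast royal visit",
       "description": "archive footage of the visit to amsterdam"}
print(simulate_query(doc, {"title": 3.0, "description": 1.0}, seed=42))
print(kendall_tau(["sysA", "sysB", "sysC", "sysD"],
                  ["sysA", "sysC", "sysB", "sysD"]))  # 0.67: close but not ideal
```

A tau near 1 would indicate that the simulator-based evaluation orders systems essentially as the real-world data does, which is the kind of agreement the validation in the paper looks for.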

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Bouke Huurnink (1)
  • Katja Hofmann (1)
  • Maarten de Rijke (1)
  • Marc Bron (1)
  1. ISLA, University of Amsterdam, The Netherlands