Validating Query Simulators: An Experiment Using Commercial Searches and Purchases

Huurnink, Bouke; Hofmann, Katja; de Rijke, Maarten; Bron, Marc

doi:10.1007/978-3-642-15998-5_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6360)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

We design and validate simulators for generating queries and relevance judgments for retrieval system evaluation. We develop a simulation framework that incorporates existing and new simulation strategies. To validate a simulator, we assess whether evaluation using its output data ranks retrieval systems in the same way as evaluation using real-world data. The real-world data is obtained using logged commercial searches and associated purchase decisions. While no simulator reproduces an ideal ranking, there is a large variation in simulator performance that allows us to distinguish those that are better suited to creating artificial testbeds for retrieval experiments. Incorporating knowledge about document structure in the query generation process helps create more realistic simulators.
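
The validation idea in the abstract, checking whether simulated data ranks retrieval systems the same way real-world data does, can be illustrated with a rank correlation between two sets of system scores. The sketch below is only an illustration under stated assumptions: the abstract does not name the agreement measure used, so Kendall's tau is swapped in here as one common choice, and the system names and scores are hypothetical, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings.

    Each argument maps a system name to its effectiveness score under one
    evaluation setup (for example, real purchases vs. simulated judgments).
    Tied pairs count as neither concordant nor discordant.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    if len(systems) < 2:
        raise ValueError("need at least two systems present in both inputs")

    concordant = 0
    discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: both setups order the pair the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four retrieval systems (illustrative values only).
real = {"sys_a": 0.42, "sys_b": 0.35, "sys_c": 0.30, "sys_d": 0.12}
simulated = {"sys_a": 0.60, "sys_b": 0.48, "sys_c": 0.51, "sys_d": 0.20}

tau = kendall_tau(real, simulated)
print(f"Kendall's tau between real and simulated rankings: {tau:.3f}")
```

A value of 1 would mean the simulator reproduces the real-world system ranking exactly; in this toy data the simulator swaps two systems, so the value falls below 1. Comparing such values across simulators is one way to separate better-suited simulators from weaker ones.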

Author information

Authors and Affiliations

ISLA, University of Amsterdam, The Netherlands
Bouke Huurnink, Katja Hofmann, Maarten de Rijke & Marc Bron

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
University of Padua, Padua, Italy
Nicola Ferro
ISTI-CNR, Area Ricerca CNR, Via Moruzzi, 1, 56124, Pisa, Italy
Carol Peters
ISLA, University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke
Dublin City University, Dublin, Ireland
Alan Smeaton

About this paper

Cite this paper

Huurnink, B., Hofmann, K., de Rijke, M., Bron, M. (2010). Validating Query Simulators: An Experiment Using Commercial Searches and Purchases. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_6

DOI: https://doi.org/10.1007/978-3-642-15998-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15997-8
Online ISBN: 978-3-642-15998-5
eBook Packages: Computer Science, Computer Science (R0)
