
Validating Simulations of User Query Variants

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13185)

Abstract

System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made towards query simulations, and it has not been investigated whether these can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections in reference to real user queries that were made for the corresponding topics. In addition, we introduce a simple yet effective method that reproduces real queries better than the established methods. Our evaluation framework validates the simulations with regard to retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness, the statistical properties of the topic score distributions, and the economic aspects are close to those of real queries, it is still challenging to simulate exact term matches and later query reformulations.
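As a minimal illustration of the score-distribution comparison described in the abstract (a plain-NumPy sketch, not the authors' exact pipeline; all score values below are hypothetical): given per-topic effectiveness scores obtained with real UQV queries and with simulated queries, the root-mean-square error (RMSE) quantifies how closely the simulation reproduces the real topic score distribution.

```python
import numpy as np

def rmse(real_scores, sim_scores):
    """Root-mean-square error between two per-topic score distributions."""
    real = np.asarray(real_scores, dtype=float)
    sim = np.asarray(sim_scores, dtype=float)
    return float(np.sqrt(np.mean((real - sim) ** 2)))

# Hypothetical per-topic nDCG scores: one value per TREC topic, for runs
# retrieved with real user query variants (UQV) and with simulated queries.
uqv_ndcg = [0.42, 0.55, 0.31, 0.68, 0.47]
sim_ndcg = [0.40, 0.51, 0.35, 0.70, 0.44]

print(f"RMSE: {rmse(uqv_ndcg, sim_ndcg):.4f}")  # lower = closer reproduction
```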


Notes

  1. https://culpepper.io/publications/robust-uqv.txt.gz

  2. https://github.com/castorini/anserini/blob/master/docs/regressions-core17.md

  3. https://github.com/irgroup/ecir2022-uqv-sim

  4. S1 and S3, as well as S2 and S\(3^\prime\), do not differ when averaging over the first queries.

  5. Applying the Bonferroni correction adjusts the alpha level to \(\alpha = \frac{0.05}{64} \approx 0.0008\) (considering eight users and eight query simulators for an alpha level of 0.05); a worked sketch follows these notes.
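A small worked sketch of this correction, using SciPy's paired t-test on hypothetical per-topic scores (the arrays and score values are illustrative, not the paper's data):

```python
from scipy import stats

ALPHA = 0.05
N_COMPARISONS = 8 * 8  # eight users x eight query simulators
alpha_bonferroni = ALPHA / N_COMPARISONS  # 0.05 / 64 = 0.00078125 ≈ 0.0008

# Hypothetical per-topic average precision for one user's real queries
# and one simulator's queries over the same topics.
real_queries = [0.31, 0.45, 0.27, 0.52, 0.38, 0.41, 0.29, 0.48]
sim_queries = [0.29, 0.47, 0.25, 0.55, 0.36, 0.40, 0.31, 0.46]

t_stat, p_value = stats.ttest_rel(real_queries, sim_queries)
print(f"adjusted alpha = {alpha_bonferroni:.5f}, p = {p_value:.4f}, "
      f"significant difference: {p_value < alpha_bonferroni}")
```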


Author information

Correspondence to Timo Breuer.


A Appendix

Table 1. Average retrieval performance over \(q\) queries
Fig. 4. RMSE between TTS\(_{\mathrm{S1}\text{-}\mathrm{S3}^{\prime}}\) and KIS\(_{\mathrm{S1}\text{-}\mathrm{S3}^{\prime}}\) queries and the UQV queries.

Fig. 5. p-values of paired t-tests between UQV and simulated queries.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Breuer, T., Fuhr, N., Schaer, P. (2022). Validating Simulations of User Query Variants. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_6


  • DOI: https://doi.org/10.1007/978-3-030-99736-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99735-9

  • Online ISBN: 978-3-030-99736-6

  • eBook Packages: Computer Science, Computer Science (R0)
