A Statistical Model of Query Log Generation

Dupret, Georges; Piwowarski, Benjamin; Hurtado, Carlos; Mendoza, Marcelo

doi:10.1007/11880561_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4209)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Query logs record past query sessions across a time span. A statistical model is proposed to explain the log generation process. Within a search engine's list of results, the model explains document selection (a user's click) by taking into account both a document's position and its popularity. We show that it is possible to quantify the influence of position and consequently estimate "un-biased" document popularities. Among other applications, this makes it possible to re-order the result list to match user preferences more closely and to use the logs as feedback to improve search engines.
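
To make the idea of position-corrected popularity concrete, here is a minimal sketch. It is not the model from the paper, whose details are not given in this abstract. It assumes a simple multiplicative form in which the probability of a click is the product of a position factor and a document popularity, and it assumes the position factors are already known. The names `debiased_popularity`, `reorder`, and `position_bias` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def debiased_popularity(log, position_bias):
    """Estimate position-corrected popularity for each document.

    log: iterable of (doc_id, position, clicked) records, position is 1-based.
    position_bias: assumed list where position_bias[p - 1] is the chance
        that a user examines rank p (an illustrative assumption, taken as
        known here rather than estimated from the log).
    """
    clicks = defaultdict(float)
    exposure = defaultdict(float)
    for doc_id, position, clicked in log:
        # Each impression counts only as much as its rank is likely examined.
        exposure[doc_id] += position_bias[position - 1]
        if clicked:
            clicks[doc_id] += 1.0
    # Ratio estimate: observed clicks divided by bias-weighted impressions.
    return {d: clicks[d] / exposure[d] for d in exposure if exposure[d] > 0}


def reorder(doc_ids, popularity):
    """Sort documents by estimated popularity, highest first."""
    return sorted(doc_ids, key=lambda d: popularity.get(d, 0.0), reverse=True)


if __name__ == "__main__":
    # Two documents with the same raw click rate (2 clicks in 4 impressions),
    # but "B" is always shown at the less visible second rank.
    sample_log = (
        [("A", 1, True)] * 2 + [("A", 1, False)] * 2
        + [("B", 2, True)] * 2 + [("B", 2, False)] * 2
    )
    estimates = debiased_popularity(sample_log, position_bias=[1.0, 0.5])
    print(estimates)
    print(reorder(["A", "B"], estimates))
```

In this toy log both documents have the same raw click rate, yet the document shown at the lower rank receives the higher corrected estimate and moves to the top of the re-ordered list. This is the kind of re-ordering the abstract refers to.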

Author information

Authors and Affiliations

Yahoo! Research Latin America: Georges Dupret & Benjamin Piwowarski
Departamento de Ciencias de la Computación, Universidad de Chile: Carlos Hurtado & Marcelo Mendoza

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Strathclyde, Scotland
Fabio Crestani
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Paolo Ferragina
Department of Information Studies, University of Sheffield, Sheffield, UK
Mark Sanderson

Cite this paper

Dupret, G., Piwowarski, B., Hurtado, C., Mendoza, M. (2006). A Statistical Model of Query Log Generation. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_18

DOI: https://doi.org/10.1007/11880561_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer Science, Computer Science (R0)
