Balancing Exploration and Exploitation in Learning to Rank Online

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6611)

Included in the conference series: European Conference on Information Retrieval (ECIR)

Abstract

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. In such an online setting, algorithms need to both explore new solutions to obtain feedback for effective learning, and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice.
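The abstract describes interleaving exploratory result choices with exploitative ones when presenting a ranked list. As a minimal illustrative sketch (not the paper's actual algorithm), the trade-off can be shown with a single mixing parameter `epsilon` that decides, per rank, whether to place the best remaining document under the current scoring model (exploit) or a randomly sampled one (explore); the function name and parameters here are hypothetical:

```python
import random


def balanced_ranking(doc_scores, k, epsilon, rng=None):
    """Build a result list of length k mixing exploitation and exploration.

    doc_scores: dict mapping document id -> current model's relevance score.
    epsilon: probability that each rank is filled with a randomly chosen
             (exploratory) document instead of the best remaining one.
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random(0)
    remaining = set(doc_scores)
    ranking = []
    while remaining and len(ranking) < k:
        if rng.random() < epsilon:
            # Explore: sample a document uniformly at random.
            doc = rng.choice(sorted(remaining))
        else:
            # Exploit: take the highest-scoring remaining document.
            doc = max(remaining, key=lambda d: doc_scores[d])
        ranking.append(doc)
        remaining.remove(doc)
    return ranking
```

With `epsilon = 0` the list is purely exploitative (ranked by score); with `epsilon = 1` it is a random permutation. The paper's point is that intermediate settings, which inject some exploration while keeping the list acceptable to users, yield better cumulative online performance than either extreme.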





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hofmann, K., Whiteson, S., de Rijke, M. (2011). Balancing Exploration and Exploitation in Learning to Rank Online. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_25

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science (R0)
