Abstract
Content displayed on web portals (e.g., news articles on Yahoo.com) is usually selected adaptively from a dynamic set of candidate items, and the attractiveness of each item decays over time. The goal of such websites is to maximize user engagement (usually measured by clicks) on the selected items. We formulate this kind of application as a new variant of the bandit problem in which new arms are dynamically added to the candidate set and the expected reward of each arm decays as the rounds proceed. For this new problem, directly applying algorithms designed for stochastic MAB (e.g., UCB) leads to over-estimation of the rewards of old arms and thus to misidentification of the optimal arm. To tackle this challenge, we propose a new algorithm that adaptively estimates the temporal dynamics of the arms' rewards and, on this basis, effectively identifies the best arm at a given time point. When the temporal dynamics are represented by a set of features, the proposed algorithm enjoys sub-linear regret. Our experiments verify the effectiveness of the proposed algorithm.
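The failure mode described above can be illustrated with a minimal simulation. The sketch below (not the paper's algorithm; the exponential-decay model, the decay rate, and all function names are illustrative assumptions) runs vanilla UCB1 on Bernoulli arms whose means decay each round, and shows that the arm's running empirical mean, which averages over its entire history, ends up far above its current decayed mean.

```python
import math
import random

random.seed(0)

def decayed_mean(mu0, decay, age):
    """Expected reward of an arm of the given age under exponential decay."""
    return mu0 * (decay ** age)

def ucb1(horizon, mu0s, decay):
    """Vanilla UCB1 on Bernoulli arms whose means decay each round.

    Returns the running empirical means, which average over each arm's
    whole history and hence lag behind the current (decayed) means.
    """
    k = len(mu0s)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # play each arm once to initialize
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        p = decayed_mean(mu0s[a], decay, t)  # reward probability decays over rounds
        r = 1.0 if random.random() < p else 0.0
        counts[a] += 1
        sums[a] += r
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

means = ucb1(horizon=5000, mu0s=[0.8, 0.5], decay=0.999)
current = decayed_mean(0.8, 0.999, 5000)
print(means[0], current)
```

Under these assumed parameters, the stale empirical mean of the initially best arm stays high while its true current mean has decayed to nearly zero, so a UCB-style index keeps favoring the old arm even when a freshly added arm would be better.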
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Komiyama, J., Qin, T. (2014). Time-Decaying Bandits for Non-stationary Systems. In: Liu, TY., Qi, Q., Ye, Y. (eds) Web and Internet Economics. WINE 2014. Lecture Notes in Computer Science, vol 8877. Springer, Cham. https://doi.org/10.1007/978-3-319-13129-0_40
Print ISBN: 978-3-319-13128-3
Online ISBN: 978-3-319-13129-0