
Time-Decaying Bandits for Non-stationary Systems

  • Conference paper
Web and Internet Economics (WINE 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8877)

Abstract

Content displayed on web portals (e.g., news articles on Yahoo.com) is usually selected adaptively from a dynamic set of candidate items, and the attractiveness of each item decays over time. The goal of such websites is to maximize user engagement (usually measured by clicks) on the selected items. We formulate this kind of application as a new variant of the bandit problem in which new arms are dynamically added to the candidate set and the expected reward of each arm decays as the rounds proceed. For this new problem, directly applying algorithms designed for stochastic MABs (e.g., UCB) leads to over-estimation of the rewards of old arms and thus to misidentification of the optimal arm. To tackle this challenge, we propose a new algorithm that adaptively estimates the temporal dynamics of the arms' rewards and, on this basis, effectively identifies the best arm at a given time point. When the temporal dynamics are represented by a set of features, the proposed algorithm achieves sub-linear regret. Our experiments verify the effectiveness of the proposed algorithm.
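The over-estimation problem described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it assumes a known geometric decay rate `gamma` (the paper instead estimates the temporal dynamics from features), and all names (`decayed_ucb`, the arm-dictionary format) are hypothetical. Each observed reward is rescaled back to the arm's birth round before averaging, so the estimate of an old arm's *initial* mean is projected forward to the current round rather than taken at face value, which is exactly where plain UCB would over-estimate.

```python
import math
import random

def decayed_ucb(horizon, arms, gamma=0.95, seed=0):
    """Sketch of a decay-aware UCB for arms whose expected reward
    decays geometrically with age, at an assumed-known rate `gamma`.

    `arms` maps arm id -> (birth_round, initial_mean); an arm is only
    available from its birth round onward.  Rewards are Bernoulli with
    mean initial_mean * gamma**age.
    """
    rng = random.Random(seed)
    stats = {a: [0.0, 0] for a in arms}   # arm -> [sum of rescaled rewards, pulls]
    total_reward = 0.0
    for t in range(1, horizon + 1):
        available = [a for a in arms if arms[a][0] <= t]

        def score(a):
            s, n = stats[a]
            if n == 0:
                return float("inf")       # pull every new arm at least once
            birth, _ = arms[a]
            est_init = s / n              # estimate of the arm's initial mean
            projected = est_init * gamma ** (t - birth)
            return projected + math.sqrt(2 * math.log(t) / n)

        a = max(available, key=score)
        birth, mu0 = arms[a]
        age = t - birth
        r = 1.0 if rng.random() < mu0 * gamma ** age else 0.0
        total_reward += r
        # rescale the observation back to "age 0" before averaging
        stats[a][0] += r / gamma ** age
        stats[a][1] += 1
    return total_reward, stats
```

Note that rescaling by `gamma**(-age)` inflates the variance of observations from very old arms; the paper's feature-based estimation avoids committing to a single known decay rate, so this sketch only conveys the shape of the problem, not its solution.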





Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Komiyama, J., Qin, T. (2014). Time-Decaying Bandits for Non-stationary Systems. In: Liu, TY., Qi, Q., Ye, Y. (eds) Web and Internet Economics. WINE 2014. Lecture Notes in Computer Science, vol 8877. Springer, Cham. https://doi.org/10.1007/978-3-319-13129-0_40


  • DOI: https://doi.org/10.1007/978-3-319-13129-0_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13128-3

  • Online ISBN: 978-3-319-13129-0

  • eBook Packages: Computer Science (R0)
