On the Prior Sensitivity of Thompson Sampling

  • Conference paper
Algorithmic Learning Theory (ALT 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9925)

Abstract

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm’s regret is low (high) when the prior is good (bad), little is known about the exact dependence. This paper is a first step towards answering this important question: focusing on a special yet representative case, we fully characterize the algorithm’s worst-case dependence of regret on the choice of prior. As a corollary, these results also provide useful insights into the general sensitivity of the algorithm to the choice of priors, when no structural assumptions are made. In particular, with \(p\) being the prior probability mass of the true reward-generating model, we prove \(O(\sqrt{T/p})\) and \(O(\sqrt{(1-p)T})\) regret upper bounds for the poor- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the Thompson Sampling literature and may be useful for studying other behavior of the algorithm.
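
The setting in the abstract, a prior that places mass \(p\) on the true reward-generating model, can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `thompson_sampling`, the two-armed Bernoulli models, the horizon, and the prior values are all assumptions chosen for illustration, and the sketch only simulates regret; it does not verify the bounds stated above.

```python
import random


def thompson_sampling(models, prior, true_index, horizon, rng):
    """Thompson Sampling over a finite model set (illustrative sketch).

    models:     list of tuples; models[i][a] is the Bernoulli mean of arm a
                under model i (hypothetical toy models, means strictly in (0, 1)).
    prior:      prior probabilities over models; prior[true_index] must be > 0.
    true_index: index of the model that actually generates rewards.
    Returns the cumulative pseudo-regret after `horizon` rounds.
    """
    posterior = list(prior)
    true_means = models[true_index]
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a model from the current posterior.
        sampled = rng.choices(range(len(models)), weights=posterior)[0]
        # Play the arm that is optimal under the sampled model.
        means = models[sampled]
        arm = max(range(len(means)), key=means.__getitem__)
        # Observe a Bernoulli reward drawn from the true model.
        reward = 1 if rng.random() < true_means[arm] else 0
        regret += best_mean - true_means[arm]
        # Bayes update: reweight each model by the likelihood of the reward.
        likelihoods = [m[arm] if reward else 1.0 - m[arm] for m in models]
        unnormalized = [w * l for w, l in zip(posterior, likelihoods)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
    return regret


if __name__ == "__main__":
    # Two hypothetical models that disagree about which arm is better.
    toy_models = [(0.6, 0.4), (0.4, 0.6)]
    for p in (0.9, 0.5, 0.01):
        runs = [
            thompson_sampling(toy_models, (p, 1.0 - p), 0, 2000, random.Random(seed))
            for seed in range(20)
        ]
        print(f"p = {p}: average regret over 20 runs = {sum(runs) / len(runs):.2f}")
```

Running the script prints the average simulated regret for a good prior (\(p = 0.9\)), a neutral prior, and a poor prior (\(p = 0.01\)), which gives a rough empirical feel for the dependence on \(p\) in this toy instance.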

Most of this work was done when C.Y. Liu was an intern at Microsoft.

Notes

  1. Note that in this paper, we do not impose any continuity structure on the reward distributions \(\nu (\theta )\) with respect to \(\theta \in \varTheta \). Therefore, it is easy to see that when \(\varTheta \) is uncountable, the (frequentist) regret of Thompson Sampling, as defined in Eq. 1, in the worst-case scenario is linear in time under most underlying models \(\theta \in \varTheta \).

Acknowledgments

We thank Sébastien Bubeck and the anonymous reviewers for helpful advice that improved the presentation of the paper.

Author information

Correspondence to Lihong Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, C.Y., Li, L. (2016). On the Prior Sensitivity of Thompson Sampling. In: Ortner, R., Simon, H., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2016. Lecture Notes in Computer Science, vol 9925. Springer, Cham. https://doi.org/10.1007/978-3-319-46379-7_22

  • DOI: https://doi.org/10.1007/978-3-319-46379-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46378-0

  • Online ISBN: 978-3-319-46379-7

  • eBook Packages: Computer Science, Computer Science (R0)
