
Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem

  • Conference paper
Learning Theory and Kernel Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play the arms a total of \(O\big((n/\epsilon^2)\log(1/\delta)\big)\) times to find an ε-optimal arm with probability at least 1 − δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
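The upper bound cited from Even-Dar et al. [5] is achieved by their median-elimination algorithm, which the lower bound of this paper shows to be optimal up to constant factors. A minimal Python sketch of that idea follows; the arm interface (a list of zero-argument callables returning rewards in [0, 1]), the exact constants in the per-round sample size, and the stopping rule are illustrative assumptions, not details taken from this paper.

```python
import math
import random

def median_elimination(arms, epsilon, delta):
    """Sketch of median elimination: return an arm whose mean is within
    epsilon of the best, with probability at least 1 - delta.

    `arms` is a list of zero-argument callables, each returning a reward
    in [0, 1] drawn from that arm's (unknown) distribution.
    """
    survivors = list(range(len(arms)))
    eps_l, delta_l = epsilon / 4.0, delta / 2.0
    while len(survivors) > 1:
        # Sample every surviving arm enough times to estimate its mean
        # to within eps_l / 2 with confidence 1 - delta_l.
        t = int(math.ceil((4.0 / eps_l ** 2) * math.log(3.0 / delta_l)))
        means = {a: sum(arms[a]() for _ in range(t)) / t for a in survivors}
        # Discard the half of the arms with empirical means below the median.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: (len(survivors) + 1) // 2]
        # Tighten the accuracy and confidence for the next round.
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
    return survivors[0]
```

The point of the recursion is that each round halves the number of surviving arms while the per-arm sampling budget grows only geometrically, so the total number of plays sums to \(O\big((n/\epsilon^2)\log(1/\delta)\big)\), matching the lower bound proved here.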



References

  1. Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)

  2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proc. 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)

  3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multi-armed bandit problem. To appear in SIAM Journal on Computing (2002)

  4. Berry, D.A., Fristedt, B.: Bandit Problems. Chapman and Hall, Boca Raton (1985)

  5. Even-Dar, E., Mannor, S., Mansour, Y.: PAC bounds for multi-armed bandit and Markov decision processes. In: Proc. 15th Annual Conference on Computational Learning Theory, pp. 255–270 (2002)

  6. Gittins, J., Jones, D.: A dynamic allocation index for the sequential design of experiments. In: Gani, J., Sarkadi, K., Vincze, I. (eds.) Progress in Statistics, pp. 241–266. North-Holland, Amsterdam (1974)

  7. Jennison, C., Johnstone, I.M., Turnbull, B.W.: Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. In: Gupta, S.S., Berger, J. (eds.) Statistical Decision Theory and Related Topics III, vol. 2, pp. 55–86. Academic Press, London (1982)

  8. Kulkarni, S.R., Lugosi, G.: Finite-time lower bounds for the two-armed bandit problem. IEEE Transactions on Automatic Control 45(4), 711–714 (2000)

  9. Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)

  10. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–535 (1952)

  11. Ross, S.M.: Stochastic Processes. Wiley, Chichester (1983)

  12. Siegmund, D.: Sequential Analysis: Tests and Confidence Intervals. Springer, Heidelberg (1985)



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannor, S., Tsitsiklis, J.N. (2003). Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_31


  • DOI: https://doi.org/10.1007/978-3-540-45167-9_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

