Abstract
We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
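For intuition about the upper bound, here is a minimal sketch of the naive uniform-sampling strategy: pull every arm the same number of times and return the empirical best. A Hoeffding bound shows roughly (2/ε²) ln(2n/δ) pulls per arm suffice, giving O((n/ε²) log(n/δ)) total pulls; the median-elimination algorithm of Even-Dar et al. [5] removes the log n factor to match the lower bound proved here. The function and parameter names below are illustrative, not from the paper.

```python
import math
import random


def naive_pac_best_arm(pull, n, eps, delta, rng):
    """Return the index of an arm that is eps-optimal with prob. >= 1 - delta.

    `pull(i, rng)` draws one reward in [0, 1] from arm i. Each arm is
    sampled m = ceil((2/eps^2) ln(2n/delta)) times (Hoeffding bound), so
    the total number of pulls is O((n/eps^2) log(n/delta)) -- a log n
    factor worse than the optimal bound discussed in the abstract.
    """
    m = math.ceil((2.0 / eps ** 2) * math.log(2 * n / delta))
    # Empirical mean reward of each arm after m uniform samples.
    means = [sum(pull(i, rng) for _ in range(m)) / m for i in range(n)]
    return max(range(n), key=lambda i: means[i])


# Usage on a toy instance with Bernoulli arms (illustrative parameters):
rng = random.Random(0)
probs = [0.5, 0.6, 0.8, 0.55]
pull = lambda i, r: 1.0 if r.random() < probs[i] else 0.0
best = naive_pac_best_arm(pull, len(probs), eps=0.1, delta=0.05, rng=rng)
```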
References
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proc. 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The non-stochastic multi-armed bandit problem. To appear in SIAM Journal on Computing (2002)
Berry, D.A., Fristedt, B.: Bandit Problems. Chapman and Hall, Boca Raton (1985)
Even-Dar, E., Mannor, S., Mansour, Y.: PAC bounds for multi-armed bandit and Markov decision processes. In: 15th Annual Conference on Computational Learning Theory, pp. 255–270 (2002)
Gittins, J., Jones, D.: A dynamic allocation index for the sequential design of experiments. In: Gani, J., Sarkadi, K., Vincze, I. (eds.) Progress in Statistics, pp. 241–266. North-Holland, Amsterdam (1974)
Jennison, C., Johnstone, I.M., Turnbull, B.W.: Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. In: Gupta, S.S., Berger, J. (eds.) Statistical decision theory and related topics III, vol. 2, pp. 55–86. Academic Press, London (1982)
Kulkarni, S.R., Lugosi, G.: Finite-time lower bounds for the two-armed bandit problem. IEEE Trans. Aut. Control 45(4), 711–714 (2000)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)
Robbins, H.: Some aspects of sequential design of experiments. Bull. Amer. Math. Soc. 55, 527–535 (1952)
Ross, S.M.: Stochastic Processes. Wiley, Chichester (1983)
Siegmund, D.: Sequential Analysis: Tests and Confidence Intervals. Springer, Heidelberg (1985)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Mannor, S., Tsitsiklis, J.N. (2003). Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40720-1
Online ISBN: 978-3-540-45167-9