Abstract
We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play the arms a total of O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
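For intuition about the upper bound, here is a minimal sketch of the naive uniform-sampling strategy: pull every arm the same number of times and return the empirical best. A Hoeffding bound shows roughly (2/ε²) ln(2n/δ) pulls per arm suffice, giving O((n/ε²) log(n/δ)) total pulls; the median-elimination algorithm of Even-Dar et al. [5] removes the log n factor to match the lower bound proved here. The function and parameter names below are illustrative, not from the paper.

```python
import math
import random


def naive_pac_best_arm(pull, n, eps, delta, rng):
    """Return the index of an arm that is eps-optimal with prob. >= 1 - delta.

    `pull(i, rng)` draws one reward in [0, 1] from arm i. Each arm is
    sampled m = ceil((2/eps^2) ln(2n/delta)) times (Hoeffding bound), so
    the total number of pulls is O((n/eps^2) log(n/delta)) -- a log n
    factor worse than the optimal bound discussed in the abstract.
    """
    m = math.ceil((2.0 / eps ** 2) * math.log(2 * n / delta))
    # Empirical mean reward of each arm after m uniform samples.
    means = [sum(pull(i, rng) for _ in range(m)) / m for i in range(n)]
    return max(range(n), key=lambda i: means[i])


# Usage on a toy instance with Bernoulli arms (illustrative parameters):
rng = random.Random(0)
probs = [0.5, 0.6, 0.8, 0.55]
pull = lambda i, r: 1.0 if r.random() < probs[i] else 0.0
best = naive_pac_best_arm(pull, len(probs), eps=0.1, delta=0.05, rng=rng)
```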
References
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proc. 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The non-stochastic multi-armed bandit problem. To appear in SIAM Journal on Computing (2002)
Berry, D.A., Fristedt, B.: Bandit Problems. Chapman and Hall, Boca Raton (1985)
Even-Dar, E., Mannor, S., Mansour, Y.: PAC bounds for multi-armed bandit and Markov decision processes. In: 15th Annual Conference on Computational Learning Theory, pp. 255–270 (2002)
Gittins, J., Jones, D.: A dynamic allocation index for the sequential design of experiments. In: Gani, J., Sarkadi, K., Vincze, I. (eds.) Progress in Statistics, pp. 241–266. North-Holland, Amsterdam (1974)
Jennison, C., Johnstone, I.M., Turnbull, B.W.: Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. In: Gupta, S.S., Berger, J. (eds.) Statistical decision theory and related topics III, vol. 2, pp. 55–86. Academic Press, London (1982)
Kulkarni, S.R., Lugosi, G.: Finite-time lower bounds for the two-armed bandit problem. IEEE Trans. Aut. Control 45(4), 711–714 (2000)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)
Robbins, H.: Some aspects of sequential design of experiments. Bull. Amer. Math. Soc. 55, 527–535 (1952)
Ross, S.M.: Stochastic Processes. Wiley, Chichester (1983)
Siegmund, D.: Sequential Analysis: Tests and Confidence Intervals. Springer, Heidelberg (1985)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Mannor, S., Tsitsiklis, J.N. (2003). Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40720-1
Online ISBN: 978-3-540-45167-9