Recommended Reading
Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: from regret to policy regret. In: Proceedings of the 29th international conference on machine learning, Edinburgh
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn J 47(2–3):235–256
Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77
Awerbuch B, Kleinberg R (2004) Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the 36th annual ACM symposium on theory of computing, Chicago. ACM, pp 45–53
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6:1–8
Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. In: Proceedings of the 20th international conference on algorithmic learning theory, Porto
Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the 16th annual ACM-SIAM symposium on discrete algorithms, Philadelphia. Society for Industrial and Applied Mathematics, pp 385–394
Gittins J, Glazebrook K, Weber R (2011) Multi-armed bandit allocation indices, 2nd edn. Wiley, Hoboken
Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139
Kocsis L, Szepesvári C (2006) Bandit based Monte-Carlo planning. In: Proceedings of the 17th European conference on machine learning, Berlin, pp 282–293
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22
Li L, Chu W, Langford J, Schapire R (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web, Raleigh
Robbins H (1952) Some aspects of the sequential design of experiments. Bull Am Math Soc 58:527–535
Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25:285–294
Wang CC, Kulkarni S, Poor H (2005) Bandit problems with side observations. IEEE Trans Autom Control 50(3):338–355
Copyright information
© 2014 Springer Science+Business Media New York
About this entry
Cite this entry
Cesa-Bianchi, N. (2014). Multi-armed Bandit Problem. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-3-642-27848-8_768-1
DOI: https://doi.org/10.1007/978-3-642-27848-8_768-1
Publisher Name: Springer, Boston, MA
Online ISBN: 978-3-642-27848-8
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering