Years and Authors of Summarized Original Work
2002; Auer, Cesa-Bianchi, Freund, Schapire
2002; Auer, Cesa-Bianchi, Fischer
Problem Definition
A multi-armed bandit is a sequential decision problem defined on a set of actions. At each time step, the decision maker selects an action from the set and obtains an observable payoff. The goal is to maximize the total payoff obtained over a sequence of decisions. The name bandit refers to the colloquial term for a slot machine (“one-armed bandit” in American slang) and to the decision problem, faced by a casino gambler, of choosing which slot machine to play next. Bandit problems naturally capture the fundamental trade-off between exploration and exploitation in sequential experiments: the decision maker must use a strategy (called an allocation policy) that balances the exploitation of actions that did well in the past with the exploration of actions that might give higher payoffs in the future. Although the original motivation came from...
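The two summarized works introduce, respectively, the UCB1 policy for stochastic payoffs (Auer, Cesa-Bianchi, Fischer 2002) and the Exp3 policy for nonstochastic payoffs (Auer, Cesa-Bianchi, Freund, Schapire 2002). As a concrete illustration of an allocation policy balancing exploration and exploitation, the following is a minimal Python sketch of UCB1, assuming payoffs in [0, 1]; the Bernoulli environment and its success probabilities are purely illustrative and not part of the original entry.

```python
import math
import random


def ucb1(pull, num_arms, horizon):
    """UCB1 allocation policy: play each arm once, then repeatedly pick the
    arm maximizing its empirical mean payoff plus an exploration bonus."""
    counts = [0] * num_arms      # number of times each arm has been played
    means = [0.0] * num_arms     # empirical mean payoff of each arm
    total_payoff = 0.0
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1          # initialization: try every arm once
        else:
            # The bonus sqrt(2 ln t / n_j) favors under-sampled arms,
            # trading off exploration against exploiting the best-looking arm.
            arm = max(
                range(num_arms),
                key=lambda j: means[j] + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_payoff += reward
    return total_payoff


# Illustrative environment: Bernoulli arms with hypothetical success probabilities.
probs = [0.3, 0.5, 0.7]
payoff = ucb1(
    pull=lambda arm: 1.0 if random.random() < probs[arm] else 0.0,
    num_arms=len(probs),
    horizon=10000,
)
print(f"Total payoff over 10000 rounds: {payoff:.0f}")
```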
Recommended Reading
Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: from regret to policy regret. In: Proceedings of the 29th international conference on machine learning, Edinburgh
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn J 47(2–3):235–256
Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77
Awerbuch B, Kleinberg R (2004) Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the 36th annual ACM symposium on theory of computing, Chicago. ACM, pp 45–53
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6:1–8
Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. In: Proceedings of the 20th international conference on algorithmic learning theory, Porto
Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the 16th annual ACM-SIAM symposium on discrete algorithms, Vancouver. Society for Industrial and Applied Mathematics, Philadelphia, pp 385–394
Gittins J, Glazebrook K, Weber R (2011) Multi-armed bandit allocation indices, 2nd edn. Wiley, Hoboken
Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139
Kocsis L, Szepesvari C (2006) Bandit based Monte-Carlo planning. In: Proceedings of the 17th European conference on machine learning, Berlin, pp 282–293
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22
Li L, Chu W, Langford J, Schapire R (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web, Raleigh
Robbins H (1952) Some aspects of the sequential design of experiments. Bull Am Math Soc 58:527–535
Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25:285–294
Wang CC, Kulkarni S, Poor H (2005) Bandit problems with side observations. IEEE Trans Autom Control 50(3):338–355