Multi-armed Bandit Problem

Living reference work entry in the Encyclopedia of Algorithms.

Recommended Reading

  1. Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: from regret to policy regret. In: Proceedings of the 29th international conference on machine learning, Edinburgh

  2. Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256

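The UCB1 policy analyzed in [2] is compact enough to sketch directly. The following is a minimal illustration rather than the paper's pseudocode verbatim; the `pull` callable, the horizon `n`, and the Bernoulli demo arms are assumptions made for the example.

    import math
    import random

    def ucb1(pull, K, n):
        """Minimal UCB1 sketch [2]: pull each of the K arms once, then always
        pull the arm maximizing empirical mean + sqrt(2 ln t / pulls)."""
        counts = [0] * K    # times each arm has been pulled
        sums = [0.0] * K    # cumulative reward per arm
        for t in range(1, n + 1):
            if t <= K:
                arm = t - 1  # initialization: try every arm once
            else:
                arm = max(range(K), key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            r = pull(arm)    # observe a reward in [0, 1]
            counts[arm] += 1
            sums[arm] += r
        return counts

    # Demo: three Bernoulli arms with unknown means.
    means = [0.2, 0.5, 0.7]
    counts = ucb1(lambda i: float(random.random() < means[i]), K=3, n=10000)
    print(counts)  # pulls of the best arm should dominate

The exploration bonus sqrt(2 ln t / pulls) is exactly the term that yields the logarithmic finite-time regret bound proved in [2].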

  3. Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77

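Exp3, the central algorithm of [3], is similarly short. This is a simplified sketch assuming rewards in [0, 1]; the fixed `gamma` is a placeholder, whereas the paper tunes it as a function of the number of arms and the horizon.

    import math
    import random

    def exp3(pull, K, n, gamma=0.1):
        """Minimal Exp3 sketch [3]: exponential weights over arms with
        importance-weighted reward estimates and uniform exploration."""
        w = [1.0] * K
        for _ in range(n):
            total = sum(w)
            # mix the exponential-weights distribution with uniform exploration
            p = [(1 - gamma) * wi / total + gamma / K for wi in w]
            arm = random.choices(range(K), weights=p)[0]
            x = pull(arm)        # possibly adversarial reward in [0, 1]
            xhat = x / p[arm]    # unbiased importance-weighted estimate
            w[arm] *= math.exp(gamma * xhat / K)
        return w

Because p[arm] >= gamma / K, the estimate xhat stays bounded by K / gamma, so the exponential weight update never overflows.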

  4. Awerbuch B, Kleinberg R (2004) Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the 36th annual ACM symposium on theory of computing, Chicago. ACM, pp 45–53

  5. Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6:1–8

  6. Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. In: Proceedings of the 20th international conference on algorithmic learning theory, Porto

  7. Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the 16th annual ACM-SIAM symposium on discrete algorithms, Philadelphia. Society for Industrial and Applied Mathematics, pp 385–394

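The key device in [7] is a one-point gradient estimator: a single bandit evaluation of f at a randomly perturbed point gives an unbiased estimate of the gradient of a smoothed version of f. Below is a simplified, unconstrained sketch (the paper additionally projects iterates back onto the feasible set); `f`, the step size `eta`, and the perturbation radius `delta` are demo parameters.

    import math
    import random

    def bandit_gradient_descent(f, x0, n, delta=0.05, eta=0.01):
        """Sketch of one-point bandit gradient descent in the spirit of [7]."""
        d = len(x0)
        x = list(x0)
        for _ in range(n):
            # draw u uniformly from the unit sphere (normalized Gaussian)
            u = [random.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(ui * ui for ui in u))
            u = [ui / norm for ui in u]
            # a single function evaluation is the only feedback received
            val = f([xi + delta * ui for xi, ui in zip(x, u)])
            g = [(d / delta) * val * ui for ui in u]  # one-point gradient estimate
            x = [xi - eta * gi for xi, gi in zip(x, g)]
        return x

In expectation, (d/delta) f(x + delta u) u equals the gradient of the delta-smoothed version of f, so descending along it minimizes f while only ever observing single function values.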

  8. Gittins J, Glazebrook K, Weber R (2011) Multi-armed bandit allocation indices, 2nd edn. Wiley, Hoboken

  9. Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139

  10. Kocsis L, Szepesvári C (2006) Bandit based Monte-Carlo planning. In: Proceedings of the 17th European conference on machine learning, Berlin, pp 282–293

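The UCT rule of [10] is UCB1 applied at each internal node of a search tree. A minimal sketch of the child-selection step, assuming visit counts and cumulative values maintained by an outer Monte-Carlo tree search loop (not shown here):

    import math

    def uct_select(child_counts, child_values, parent_count, c=math.sqrt(2)):
        """UCT child selection [10]: UCB1 over the children of a tree node.
        sqrt(2) is a common exploration constant for values in [0, 1]; the
        paper leaves the constant as a tunable parameter."""
        best, best_score = 0, float('-inf')
        for i, cnt in enumerate(child_counts):
            if cnt == 0:
                return i  # expand any unvisited child first
            score = child_values[i] / cnt + c * math.sqrt(math.log(parent_count) / cnt)
            if score > best_score:
                best, best_score = i, score
        return best

Applying this rule recursively from the root and backing up the value of a random playout is the basis of Monte-Carlo tree search as used in game-playing programs.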

  11. Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22

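The main result of [11] is an asymptotic lower bound that every uniformly good allocation rule must obey. Writing T_i(n) for the number of times arm i is pulled in n rounds and D(p_i || p*) for the Kullback-Leibler divergence between the reward distribution of arm i and that of an optimal arm, the bound reads, in LaTeX:

    \liminf_{n \to \infty} \frac{\mathbb{E}[T_i(n)]}{\ln n}
        \;\ge\; \frac{1}{D(p_i \,\|\, p^*)}
        \qquad \text{for every suboptimal arm } i .

Hence the cumulative regret of any such policy grows at least logarithmically in n, which is what makes the O(ln n) finite-time upper bounds of [2] order-optimal.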

  12. Li L, Chu W, Langford J, Schapire R (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web, Raleigh

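The disjoint LinUCB algorithm of [12] maintains one ridge-regression model per arm and adds an upper-confidence bonus to each predicted payoff. A minimal sketch follows; `get_contexts` and `pull` are placeholder callables, and the per-round matrix inversion is kept for clarity rather than updated incrementally.

    import numpy as np

    def linucb(get_contexts, pull, K, d, n, alpha=1.0):
        """Sketch of disjoint LinUCB [12]: per-arm ridge regression with an
        upper-confidence bonus on the predicted payoff."""
        A = [np.eye(d) for _ in range(K)]    # per-arm regularized design matrices
        b = [np.zeros(d) for _ in range(K)]  # per-arm response vectors
        for _ in range(n):
            x = get_contexts()               # x[a]: feature vector of arm a, shape (d,)
            scores = []
            for a in range(K):
                Ainv = np.linalg.inv(A[a])
                theta = Ainv @ b[a]          # ridge-regression estimate
                scores.append(theta @ x[a] + alpha * np.sqrt(x[a] @ Ainv @ x[a]))
            arm = int(np.argmax(scores))
            r = pull(arm, x[arm])            # observed payoff of the chosen arm
            A[arm] += np.outer(x[arm], x[arm])
            b[arm] += r * x[arm]
        return A, b

In the news-recommendation setting of [12], the contexts are user and article features and the payoff is a click indicator.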

  13. Robbins H (1952) Some aspects of the sequential design of experiments. Bull Am Math Soc 58:527–535

  14. Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25:285–294

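The idea in [14], now called Thompson sampling, is perhaps both the oldest and the shortest bandit algorithm: sample from the posterior of each arm's mean and pull the arm with the largest sample. A minimal sketch for Bernoulli arms; the Beta(1, 1) conjugate priors are the standard modern formulation rather than the paper's notation.

    import random

    def thompson(pull, K, n):
        """Minimal Thompson-sampling sketch in the spirit of [14]:
        Beta posterior per Bernoulli arm, probability matching via sampling."""
        a = [1] * K  # Beta parameters: prior successes + 1
        b = [1] * K  # Beta parameters: prior failures + 1
        for _ in range(n):
            arm = max(range(K), key=lambda i: random.betavariate(a[i], b[i]))
            r = int(pull(arm))  # Bernoulli reward in {0, 1}
            a[arm] += r
            b[arm] += 1 - r
        return a, b

Each round, an arm is pulled with exactly the posterior probability that it is the best one, which is the probability-matching principle of [14].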

  15. Wang CC, Kulkarni S, Poor H (2005) Bandit problems with side observations. IEEE Trans Autom Control 50(3):338–355

Author information

Correspondence to Nicolò Cesa-Bianchi.


Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Cesa-Bianchi, N. (2014). Multi-armed Bandit Problem. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-3-642-27848-8_768-1

  • DOI: https://doi.org/10.1007/978-3-642-27848-8_768-1

  • Publisher Name: Springer, Boston, MA

  • Online ISBN: 978-3-642-27848-8
