Multi-armed Bandit Problem

Living reference work entry in the Encyclopedia of Algorithms.

Recommended Reading

  1. Arora R, Dekel O, Tewari A (2012) Online bandit learning against an adaptive adversary: from regret to policy regret. In: Proceedings of the 29th international conference on machine learning, Edinburgh

  2. Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256

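The UCB1 policy analyzed in [2] is compact enough to sketch directly. The following is a minimal illustration rather than the paper's pseudocode verbatim; the `pull` callable, the horizon `n`, and the Bernoulli demo arms are assumptions made for the example.

    import math
    import random

    def ucb1(pull, K, n):
        """Minimal UCB1 sketch [2]: pull each of the K arms once, then always
        pull the arm maximizing empirical mean + sqrt(2 ln t / pulls)."""
        counts = [0] * K    # times each arm has been pulled
        sums = [0.0] * K    # cumulative reward per arm
        for t in range(1, n + 1):
            if t <= K:
                arm = t - 1  # initialization: try every arm once
            else:
                arm = max(range(K), key=lambda i: sums[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            r = pull(arm)    # observe a reward in [0, 1]
            counts[arm] += 1
            sums[arm] += r
        return counts

    # Demo: three Bernoulli arms with unknown means.
    means = [0.2, 0.5, 0.7]
    counts = ucb1(lambda i: float(random.random() < means[i]), K=3, n=10000)
    print(counts)  # pulls of the best arm should dominate

The exploration bonus sqrt(2 ln t / pulls) is exactly the term that yields the logarithmic finite-time regret bound proved in [2].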

  3. Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77

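Exp3, the central algorithm of [3], is similarly short. This is a simplified sketch assuming rewards in [0, 1]; the fixed `gamma` is a placeholder, whereas the paper tunes it as a function of the number of arms and the horizon.

    import math
    import random

    def exp3(pull, K, n, gamma=0.1):
        """Minimal Exp3 sketch [3]: exponential weights over arms with
        importance-weighted reward estimates and uniform exploration."""
        w = [1.0] * K
        for _ in range(n):
            total = sum(w)
            # mix the exponential-weights distribution with uniform exploration
            p = [(1 - gamma) * wi / total + gamma / K for wi in w]
            arm = random.choices(range(K), weights=p)[0]
            x = pull(arm)        # possibly adversarial reward in [0, 1]
            xhat = x / p[arm]    # unbiased importance-weighted estimate
            w[arm] *= math.exp(gamma * xhat / K)
        return w

Because p[arm] >= gamma / K, the estimate xhat stays bounded by K / gamma, so the exponential weight update never overflows.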

  4. Awerbuch B, Kleinberg R (2004) Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In: Proceedings of the 36th annual ACM symposium on theory of computing, Chicago. ACM, pp 45–53

  5. Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6:1–8

  6. Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. In: Proceedings of the 20th international conference on algorithmic learning theory, Porto

  7. Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the 16th annual ACM-SIAM symposium on discrete algorithms, Philadelphia. Society for Industrial and Applied Mathematics, pp 385–394

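The key device in [7] is a one-point gradient estimator: a single bandit evaluation of f at a randomly perturbed point gives an unbiased estimate of the gradient of a smoothed version of f. Below is a simplified, unconstrained sketch (the paper additionally projects iterates back onto the feasible set); `f`, the step size `eta`, and the perturbation radius `delta` are demo parameters.

    import math
    import random

    def bandit_gradient_descent(f, x0, n, delta=0.05, eta=0.01):
        """Sketch of one-point bandit gradient descent in the spirit of [7]."""
        d = len(x0)
        x = list(x0)
        for _ in range(n):
            # draw u uniformly from the unit sphere (normalized Gaussian)
            u = [random.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(ui * ui for ui in u))
            u = [ui / norm for ui in u]
            # a single function evaluation is the only feedback received
            val = f([xi + delta * ui for xi, ui in zip(x, u)])
            g = [(d / delta) * val * ui for ui in u]  # one-point gradient estimate
            x = [xi - eta * gi for xi, gi in zip(x, g)]
        return x

In expectation, (d/delta) f(x + delta u) u equals the gradient of the delta-smoothed version of f, so descending along it minimizes f while only ever observing single function values.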

  8. Gittins J, Glazebrook K, Weber R (2011) Multi-armed bandit allocation indices, 2nd edn. Wiley, Hoboken

  9. Hannan J (1957) Approximation to Bayes risk in repeated play. Contrib Theory Games 3:97–139

  10. Kocsis L, Szepesvári C (2006) Bandit based Monte-Carlo planning. In: Proceedings of the 17th European conference on machine learning, Berlin, pp 282–293

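The UCT rule of [10] is UCB1 applied at each internal node of a search tree. A minimal sketch of the child-selection step, assuming visit counts and cumulative values maintained by an outer Monte-Carlo tree search loop (not shown here):

    import math

    def uct_select(child_counts, child_values, parent_count, c=math.sqrt(2)):
        """UCT child selection [10]: UCB1 over the children of a tree node.
        sqrt(2) is a common exploration constant for values in [0, 1]; the
        paper leaves the constant as a tunable parameter."""
        best, best_score = 0, float('-inf')
        for i, cnt in enumerate(child_counts):
            if cnt == 0:
                return i  # expand any unvisited child first
            score = child_values[i] / cnt + c * math.sqrt(math.log(parent_count) / cnt)
            if score > best_score:
                best, best_score = i, score
        return best

Applying this rule recursively from the root and backing up the value of a random playout is the basis of Monte-Carlo tree search as used in game-playing programs.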

  11. Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv Appl Math 6:4–22

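The main result of [11] is an asymptotic lower bound that every uniformly good allocation rule must obey. Writing T_i(n) for the number of times arm i is pulled in n rounds and D(p_i || p*) for the Kullback-Leibler divergence between the reward distribution of arm i and that of an optimal arm, the bound reads, in LaTeX:

    \liminf_{n \to \infty} \frac{\mathbb{E}[T_i(n)]}{\ln n}
        \;\ge\; \frac{1}{D(p_i \,\|\, p^*)}
        \qquad \text{for every suboptimal arm } i .

Hence the cumulative regret of any such policy grows at least logarithmically in n, which is what makes the O(ln n) finite-time upper bounds of [2] order-optimal.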

  12. Li L, Chu W, Langford J, Schapire R (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web, Raleigh

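The disjoint LinUCB algorithm of [12] maintains one ridge-regression model per arm and adds an upper-confidence bonus to each predicted payoff. A minimal sketch follows; `get_contexts` and `pull` are placeholder callables, and the per-round matrix inversion is kept for clarity rather than updated incrementally.

    import numpy as np

    def linucb(get_contexts, pull, K, d, n, alpha=1.0):
        """Sketch of disjoint LinUCB [12]: per-arm ridge regression with an
        upper-confidence bonus on the predicted payoff."""
        A = [np.eye(d) for _ in range(K)]    # per-arm regularized design matrices
        b = [np.zeros(d) for _ in range(K)]  # per-arm response vectors
        for _ in range(n):
            x = get_contexts()               # x[a]: feature vector of arm a, shape (d,)
            scores = []
            for a in range(K):
                Ainv = np.linalg.inv(A[a])
                theta = Ainv @ b[a]          # ridge-regression estimate
                scores.append(theta @ x[a] + alpha * np.sqrt(x[a] @ Ainv @ x[a]))
            arm = int(np.argmax(scores))
            r = pull(arm, x[arm])            # observed payoff of the chosen arm
            A[arm] += np.outer(x[arm], x[arm])
            b[arm] += r * x[arm]
        return A, b

In the news-recommendation setting of [12], the contexts are user and article features and the payoff is a click indicator.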

  13. Robbins H (1952) Some aspects of the sequential design of experiments. Bull Am Math Soc 58:527–535

  14. Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25:285–294

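The idea in [14], now called Thompson sampling, is perhaps both the oldest and the shortest bandit algorithm: sample from the posterior of each arm's mean and pull the arm with the largest sample. A minimal sketch for Bernoulli arms; the Beta(1, 1) conjugate priors are the standard modern formulation rather than the paper's notation.

    import random

    def thompson(pull, K, n):
        """Minimal Thompson-sampling sketch in the spirit of [14]:
        Beta posterior per Bernoulli arm, probability matching via sampling."""
        a = [1] * K  # Beta parameters: prior successes + 1
        b = [1] * K  # Beta parameters: prior failures + 1
        for _ in range(n):
            arm = max(range(K), key=lambda i: random.betavariate(a[i], b[i]))
            r = int(pull(arm))  # Bernoulli reward in {0, 1}
            a[arm] += r
            b[arm] += 1 - r
        return a, b

Each round, an arm is pulled with exactly the posterior probability that it is the best one, which is the probability-matching principle of [14].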

  15. Wang CC, Kulkarni S, Poor H (2005) Bandit problems with side observations. IEEE Trans Autom Control 50(3):338–355

Author information

Correspondence to Nicolò Cesa-Bianchi.


Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Cesa-Bianchi, N. (2014). Multi-armed Bandit Problem. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-3-642-27848-8_768-1

  • DOI: https://doi.org/10.1007/978-3-642-27848-8_768-1

  • Publisher Name: Springer, Boston, MA

  • Online ISBN: 978-3-642-27848-8
