Abstract
In the multi-armed bandit problem, a gambler must decide which arm of a K-slot machine to pull to maximize the total reward over a series of trials. Many real-world learning and optimization problems can be modeled this way. Several strategies, or algorithms, have been proposed as solutions to this problem over the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.
This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
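The ε-greedy strategy mentioned above can be sketched in a few lines: with probability ε the player pulls a uniformly random arm (exploration), and otherwise pulls the arm with the highest empirical mean reward so far (exploitation). The sketch below is illustrative only and is not the paper's implementation; the function and arm names are hypothetical, and the Bernoulli arms at the end are an assumed toy example.

```python
import random

def epsilon_greedy(arms, pulls, epsilon=0.1, seed=0):
    """Play a K-armed bandit for `pulls` rounds with the epsilon-greedy rule.

    `arms` is a list of callables; arms[i](rng) returns one stochastic reward
    for pulling arm i. Returns the total reward and the empirical means.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k      # number of times each arm was pulled
    means = [0.0] * k     # empirical mean reward of each arm
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            i = rng.randrange(k)                        # explore: random arm
        else:
            i = max(range(k), key=lambda j: means[j])   # exploit: best arm so far
        r = arms[i](rng)                                # observe the reward
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]          # incremental mean update
        total += r
    return total, means

# Toy instance: two Bernoulli arms with success probabilities 0.3 and 0.7.
arms = [lambda rng: 1.0 if rng.random() < 0.3 else 0.0,
        lambda rng: 1.0 if rng.random() < 0.7 else 0.0]
total, means = epsilon_greedy(arms, pulls=10_000)
```

With enough pulls, the empirical mean of the better arm dominates and the strategy concentrates its pulls there, which is why this naive baseline is often hard to beat in practice.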
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Vermorel, J., Mohri, M. (2005). Multi-armed Bandit Algorithms and Empirical Evaluation. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science(), vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_42
DOI: https://doi.org/10.1007/11564096_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science (R0)