Abstract
In this chapter, we present the formulation, theoretical bounds, and algorithms for the stochastic MAB problem. We also discuss several important variants of the stochastic MAB problem and their algorithms, including multi-play MAB, MAB with switching costs, and pure-exploration MAB.
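As a concrete illustration of the kind of algorithm the chapter covers, below is a minimal sketch of the classic UCB1 index policy of Auer, Cesa-Bianchi, and Fischer (2002) for the stochastic MAB problem. The Bernoulli-arm simulation, function name, and parameters are our own illustrative choices, not code from the chapter.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 index policy on Bernoulli arms with the given
    success probabilities; return total reward and per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    sums = [0.0] * k      # cumulative reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            # UCB1 index = empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

On a two-armed instance with a large gap, e.g. `ucb1([0.2, 0.8], 2000)`, the policy concentrates almost all pulls on the better arm, which is exactly the behavior the logarithmic regret bound formalizes.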
Notes
- 1.
Strategies of this type are also called no-regret policies, but since this term can be confusing, we do not use it here.
- 2.
MAB with switching costs can be cast as a restless bandit problem discussed in Chap. 3.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Zheng, R., Hua, C. (2016). Stochastic Multi-armed Bandit. In: Sequential Learning and Decision-Making in Wireless Resource Management. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-50502-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50501-5
Online ISBN: 978-3-319-50502-2