Abstract
In this chapter, we present the formulation, theoretical bounds, and algorithms for the stochastic MAB problem. We also discuss several important variants of the stochastic MAB problem and their algorithms, including multi-play MAB, MAB with switching costs, and pure-exploration MAB.
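As a concrete illustration of the kind of algorithm the chapter covers, below is a minimal sketch of the classic UCB1 index policy of Auer, Cesa-Bianchi, and Fischer (2002) for the stochastic MAB problem. The Bernoulli-arm simulation, function name, and parameters are our own illustrative choices, not code from the chapter.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 index policy on Bernoulli arms with the given
    success probabilities; return total reward and per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    sums = [0.0] * k      # cumulative reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            # UCB1 index = empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

On a two-armed instance with a large gap, e.g. `ucb1([0.2, 0.8], 2000)`, the policy concentrates almost all pulls on the better arm, which is exactly the behavior the logarithmic regret bound formalizes.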
Notes
- 1.
Strategies of this type are also called no-regret policies, but since this term can be confusing, we do not use it here.
- 2.
MAB with switching costs can be cast as a restless bandit problem discussed in Chap. 3.
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Zheng, R., Hua, C. (2016). Stochastic Multi-armed Bandit. In: Sequential Learning and Decision-Making in Wireless Resource Management. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-50502-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50501-5
Online ISBN: 978-3-319-50502-2