Multi-Armed Bandit Processes

Cai, Xiaoqiang; Wu, Xianyi; Zhou, Xian

doi:10.1007/978-1-4899-7405-1_6


Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 207)


Abstract

This chapter studies a powerful tool for stochastic scheduling: the theoretically elegant multi-armed bandit process, used to maximize the expected total discounted reward. Multi-armed bandit models form a particular class of optimal resource allocation problems, in which a number of machines or processors are allocated to serve a set of competing projects (arms). We introduce the classical theory of multi-armed bandit processes in Section 6.1, and consider open bandit processes, in which infinitely many arms are allowed, in Section 6.2. An extension to generalized open bandit processes is given in Section 6.3. Finally, a concise account of closed bandit processes in continuous time is presented in Section 6.4.



Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR
Xiaoqiang Cai
Department of Statistics and Actuarial Science, East China Normal University, Shanghai, People’s Republic of China
Xianyi Wu
Department of Applied Finance and Actuarial Studies, Macquarie University, North Ryde, Sydney, Australia
Xian Zhou


About this chapter

Cite this chapter

Cai, X., Wu, X., Zhou, X. (2014). Multi-Armed Bandit Processes. In: Optimal Stochastic Scheduling. International Series in Operations Research & Management Science, vol 207. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7405-1_6


DOI: https://doi.org/10.1007/978-1-4899-7405-1_6
Published: 23 January 2014
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7404-4
Online ISBN: 978-1-4899-7405-1
eBook Packages: Business and Economics, Business and Management (R0)
