Abstract
This chapter studies multi-armed bandit processes, a powerful and theoretically elegant tool for stochastic scheduling, used to maximize expected total discounted rewards. Multi-armed bandit models form a particular type of optimal resource allocation problem, in which a number of machines or processors are to be allocated to serve a set of competing projects (arms). We introduce the classical theory of multi-armed bandit processes in Section 6.1 and, in Section 6.2, consider open bandit processes, in which infinitely many arms are allowed. An extension to generalized open bandit processes is given in Section 6.3. Finally, a concise account of closed bandit processes in continuous time is presented in Section 6.4.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Cai, X., Wu, X., Zhou, X. (2014). Multi-Armed Bandit Processes. In: Optimal Stochastic Scheduling. International Series in Operations Research & Management Science, vol 207. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7405-1_6
DOI: https://doi.org/10.1007/978-1-4899-7405-1_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7404-4
Online ISBN: 978-1-4899-7405-1
eBook Packages: Business and Economics, Business and Management (R0)