Multi-Armed Bandit Processes

Optimal Stochastic Scheduling

Part of the book series: International Series in Operations Research & Management Science ((ISOR,volume 207))

Abstract

This chapter studies a powerful tool for stochastic scheduling: theoretically elegant multi-armed bandit processes, used to maximize expected total discounted rewards. Multi-armed bandit models form a particular type of optimal resource allocation problem, in which a number of machines or processors are to be allocated to serve a set of competing projects (arms). We introduce the classical theory of multi-armed bandit processes in Section 6.1 and consider open bandit processes, in which infinitely many arms are allowed, in Section 6.2. An extension to generalized open bandit processes is given in Section 6.3. Finally, a concise account of closed bandit processes in continuous time is presented in Section 6.4.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Cai, X., Wu, X., Zhou, X. (2014). Multi-Armed Bandit Processes. In: Optimal Stochastic Scheduling. International Series in Operations Research & Management Science, vol 207. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7405-1_6
