Part of the book series: Studies in Choice and Welfare (WELFARE)

Summary

This chapter examines several extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, give conditions under which the Gittins index strategy is well-defined, and examine the optimality of that strategy. We then consider some difficulties arising from “parallel search,” in which a decision-maker may pull more than one arm per period, and from the introduction of a cost of switching between arms.
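
For readers who want the central object the summary refers to, the Gittins index of an arm is standardly defined, in the discounted formulation, as the largest reward per unit of discounted time obtainable by operating that arm alone up to a stopping time. The notation below (β for the discount factor, r_i and X_i for arm i's reward function and state process, τ for a stopping time) is generic rather than taken from the chapter:

\[
\nu_i(x) \;=\; \sup_{\tau \ge 1}\;
\frac{E\!\left[\,\sum_{t=0}^{\tau-1} \beta^{t}\, r_i\bigl(X_i(t)\bigr) \;\Big|\; X_i(0)=x\right]}
     {E\!\left[\,\sum_{t=0}^{\tau-1} \beta^{t} \;\Big|\; X_i(0)=x\right]}.
\]

The Gittins index strategy pulls, in each period, an arm whose current index is highest; the extensions discussed in the chapter concern when this index and the associated strategy remain well-defined and optimal.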

The questions addressed in this paper grew out of my work with Jeff Banks on bandit problems and their applications (Banks and Sundaram [3, 4, 5]) and owe much to the many discussions I had with him on this subject. I also had the benefit of several discussions with Andy McLennan, especially regarding the material in Sections 4 and 6 of this paper.

References

  1. Agrawal, R., M. V. Hegde, and D. Teneketzis (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching costs. IEEE Transactions on Automatic Control, 33(10): 899–906.

  2. Berry, D. and B. Fristedt (1985) Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall.

  3. Banks, J. S. and R. K. Sundaram (1992a) Denumerable-armed bandits. Econometrica, 60(5): 1071–1096.

  4. Banks, J. S. and R. K. Sundaram (1992b) A class of bandit problems yielding myopic optimal strategies. Journal of Applied Probability, 625–632.

  5. Banks, J. S. and R. K. Sundaram (1994) Switching costs and the Gittins index. Econometrica, 62(3): 687–694.

  6. Basu, A., A. Bose, and J. K. Ghosh (1990) An expository review of sequential design and allocation rules. Mimeo. Purdue University.

  7. Blackwell, D. (1965) Discounted dynamic programming. Annals of Mathematical Statistics, 36: 225–235.

  8. Feldman, D. (1962) Contributions to the “two-armed bandit” problem. Annals of Mathematical Statistics, 33: 847–856.

  9. Feldman, M. and M. Spagat (1993), Optimal learning with costly adjustment. Mimeo. Brown University.

  10. Gittins, J. (1979) Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41: 148–164.

  11. Gittins, J. (1989) Allocation Indices for Multi-Armed Bandits. London: Wiley.

  12. Gittins, J. and D. Jones (1974) A dynamic allocation index for the sequential allocation of experiments. In J. Gani et al. (eds) Progress in Statistics. Amsterdam: North Holland.

  13. Kolonko, M. and H. Benzing (1983) The sequential design of Bernoulli experiments including switching costs. Mimeo.

  14. Lai, T. L. and H. Robbins (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6: 4–22.

  15. Mortensen, D. (1985) Job search and labor market analysis. In O. Ashenfelter and R. Layard (eds) Handbook of Labor Economics, Vol. II. New York: North Holland.

  16. Pressman, E. L. and I. M. Sonin (1990) Sequential Control with Partial Information. New York: Academic Press.

  17. Rieder, U. (1975) Bayesian dynamic programming. Advances in Applied Probability, 7: 330–348.

  18. Rothschild, M. (1974) A two-armed bandit theory of market pricing. Journal of Economic Theory, 9: 185–202.

  19. Schäl, M. (1979) Dynamic programming and statistical decision theory. Annals of Statistics, 7(2): 432–445.

  20. Viscusi, W. (1979) Job hazards and worker quit rates: an analysis of adaptive worker behavior. International Economic Review, 20: 29–58.

  21. Weitzman, M. L. (1979) Optimal search for the best alternative. Econometrica, 47: 641–654.

  22. Whittle, P. (1981) Arm-acquiring bandits. Annals of Probability, 9(2): 284–292.

  23. Whittle, P. (1982) Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. I. New York: Wiley.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

Cite this chapter

Sundaram, R.K. (2005). Generalized Bandit Problems. In: Austen-Smith, D., Duggan, J. (eds) Social Choice and Strategic Decisions. Studies in Choice and Welfare. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27295-X_6
