Generalized Bandit Problems

Part of the Studies in Choice and Welfare book series (WELFARE)


This chapter examines a number of extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, we give conditions under which the Gittins index strategy is well-defined, and we examine the optimality of that strategy. We then consider some difficulties arising from “parallel search,” in which a decision-maker may pull more than one arm per period, and from the introduction of a cost of switching between arms.
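As a rough illustration of what an index strategy computes, the sketch below evaluates the Gittins index for one arm in a far simpler setting than the one treated in this chapter. It assumes the arm is a finite-state Markov chain with a known transition matrix `P`, per-state rewards `r`, and a discount factor `beta`; these names and the function `gittins_indices` are hypothetical. It uses the standard retirement (calibration) characterization, not a method taken from the chapter: the index of a state is the smallest per-period retirement payment at which stopping immediately is optimal.

```python
import numpy as np


def gittins_indices(P, r, beta, tol=1e-9):
    """Gittins index of every state of one finite-state Markov arm.

    Assumed inputs (illustrative only): P is an n x n transition matrix,
    r a length-n reward vector, beta a discount factor in (0, 1).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)

    def value(m):
        # Value iteration for the stopping problem
        #   V(x) = max(m / (1 - beta), r(x) + beta * sum_y P(x, y) V(y)),
        # where m / (1 - beta) is the value of retiring on payment m forever.
        retire = m / (1.0 - beta)
        V = np.full(n, retire)
        while True:
            V_new = np.maximum(retire, r + beta * (P @ V))
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

    indices = np.empty(n)
    for x in range(n):
        lo, hi = r.min(), r.max()  # the index lies between the extreme rewards
        while hi - lo > 1e-7:
            m = 0.5 * (lo + hi)
            continue_value = r[x] + beta * (P[x] @ value(m))
            if continue_value > m / (1.0 - beta) + tol:
                lo = m  # continuing strictly beats retiring, so the index exceeds m
            else:
                hi = m
        indices[x] = 0.5 * (lo + hi)
    return indices


# Two-state arm: state 0 pays 0 and moves to state 1; state 1 pays 1 forever.
example = gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], beta=0.9)
```

With several independent arms, an index strategy would compute such an index for each arm's current state and pull an arm whose index is largest. Bisection is used here only for transparency; it is not the most efficient way to compute the index.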


Keywords: Optimal Strategy; Switching Cost; Index Strategy; Bandit Problem; Dynamic Programming Problem
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  1. New York University, USA
