
The Role of Dynamic Allocation Indices in the Evaluation of Suboptimal Strategies for Families of Bandit Processes

  • Conference paper
Mathematical Learning Models — Theory and Algorithms

Part of the book series: Lecture Notes in Statistics (LNS, volume 20)


Abstract

A family of $N$ alternative bandit processes $\{(\Omega_j, P_j, C_j, \alpha);\ j = 1, 2, \ldots, N\}$ is a cost-discounted Markov decision process with the following special features:

  (a) Its state at time $t \in \mathbb{N}$ is $x(t) = \{x_1(t), x_2(t), \ldots, x_N(t)\}$, where $x_j(t) \in \Omega_j$, $1 \le j \le N$. The state space $\Omega_j$ may be finite, countable or continuous.

  (b) The action space is $A = \{a_1, a_2, \ldots, a_N\}$. Action $a_j$ denotes the choice of bandit process $j$. An action is taken at each time $t \in \mathbb{N}$.

  (c) If action $a_j$ is taken at time $t \in \mathbb{N}$, only the $j$th component of $x(t)$ changes. Hence $x_i(t+1) = x_i(t)$ for $i \ne j$, and $x_j(t+1)$ is determined by the probabilistic law of motion $P_j\{x_j(t)\}$.

  (d) The transition of the process under action $a_j$ described in (c) incurs a cost $\alpha^t C_j\{x_j(t), x_j(t+1)\}$, where $0 \le \alpha \le 1$. The costs are assumed to be bounded.

  (e) An optimal strategy is a rule for choosing actions which minimises the total expected cost.
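Features (a) to (e) can be made concrete with a small brute-force sketch. This is not the author's method: it assumes every $\Omega_j$ is finite, that $\alpha < 1$ strictly (the abstract allows $\alpha = 1$, for which this iteration need not converge), and that strategies are stationary and deterministic. It works directly on the joint state space, whose size grows exponentially in $N$, and it does not compute dynamic allocation indices. The names `BanditProcess`, `evaluate` and `optimal`, and the two-process example, are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class BanditProcess:
    """A finite-state bandit process (Omega_j, P_j, C_j).

    Assumption for illustration: Omega_j = {0, ..., n-1}.
    """
    P: list  # P[x][y]: probability that state x moves to y when this process is chosen
    C: list  # C[x][y]: undiscounted cost of the transition x -> y


def joint_states(family):
    """All joint states x = (x_1, ..., x_N) of the family (feature (a))."""
    return list(itertools.product(*(range(len(b.P)) for b in family)))


def transitions(family, x, j):
    """Yield (prob, next joint state, cost) under action a_j.

    Only component j moves; all other components are frozen (feature (c)).
    """
    b = family[j]
    for y, p in enumerate(b.P[x[j]]):
        if p > 0:
            yield p, x[:j] + (y,) + x[j + 1:], b.C[x[j]][y]


def q_value(family, V, x, j, alpha):
    """Expected cost of taking a_j in state x, plus alpha times V of the next state."""
    return sum(p * (c + alpha * V[y]) for p, y, c in transitions(family, x, j))


def evaluate(family, strategy, alpha, tol=1e-12):
    """Expected total discounted cost of a stationary strategy (joint state -> j).

    Iterates V <- c + alpha * P V under the strategy; a contraction when alpha < 1.
    """
    states = joint_states(family)
    V = {x: 0.0 for x in states}
    while True:
        new = {x: q_value(family, V, x, strategy(x), alpha) for x in states}
        if max(abs(new[x] - V[x]) for x in states) < tol:
            return new
        V = new


def optimal(family, alpha, tol=1e-12):
    """Minimal expected total discounted cost (feature (e)) by value iteration."""
    states = joint_states(family)
    V = {x: 0.0 for x in states}
    while True:
        new = {x: min(q_value(family, V, x, j, alpha) for j in range(len(family)))
               for x in states}
        if max(abs(new[x] - V[x]) for x in states) < tol:
            return new
        V = new


# Hypothetical example: process 0 costs 1 per choice until it reaches its absorbing
# state 1 (probability 0.5 per choice), after which it costs nothing; process 1 has a
# single state and costs 2 per choice forever.
family = [
    BanditProcess(P=[[0.5, 0.5], [0.0, 1.0]], C=[[1.0, 1.0], [0.0, 0.0]]),
    BanditProcess(P=[[1.0]], C=[[2.0]]),
]
alpha = 0.5
V_opt = optimal(family, alpha)
V_sub = evaluate(family, lambda x: 1, alpha)  # suboptimal: always choose process 1
print(V_opt[(0, 0)], V_sub[(0, 0)], V_sub[(0, 0)] - V_opt[(0, 0)])  # close to 4/3, 4, 8/3
```

The difference `V_sub - V_opt` is the kind of quantity one wants to bound when judging a suboptimal strategy; the brute-force computation above is only feasible for very small families.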





Copyright information

© 1983 Springer-Verlag New York Inc.

About this paper

Cite this paper

Glazebrook, K.D. (1983). The Role of Dynamic Allocation Indices in the Evaluation of Suboptimal Strategies for Families of Bandit Processes. In: Herkenrath, U., Kalin, D., Vogel, W. (eds) Mathematical Learning Models — Theory and Algorithms. Lecture Notes in Statistics, vol 20. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-5612-0_7


  • DOI: https://doi.org/10.1007/978-1-4612-5612-0_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-90913-4

  • Online ISBN: 978-1-4612-5612-0

  • eBook Packages: Springer Book Archive
