Abstract
A family of N alternative bandit processes {(Ω_j, P_j, C_j, α); j = 1, 2, ..., N} is a cost-discounted Markov decision process with the following special features:
(a) Its state at time t∈ℕ is x(t) = {x_1(t), x_2(t), ..., x_N(t)}, where x_j(t)∈Ω_j, 1 ≤ j ≤ N. The state space Ω_j may be finite, countable or continuous.
(b) The action space A is {a_1, a_2, ..., a_N}. Action a_j denotes the choice of bandit process j. An action is taken at each time t∈ℕ.
(c) If action a_j is taken at time t∈ℕ, only the jth component of x(t) changes. Hence x_i(t+1) = x_i(t), i ≠ j, and x_j(t+1) is determined according to a probabilistic law of motion P_j{x_j(t)}.
(d) The transition of the process under action a_j described in (c) incurs a cost α^t C_j{x_j(t), x_j(t+1)}, where 0 ≤ α ≤ 1. The costs are assumed to be bounded.
(e) An optimal strategy is a rule for choosing actions which minimises the total expected cost.
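Features (a) to (e) can be illustrated with a small simulation. The following is a minimal sketch, not the chapter's own method: the names `BanditProcess`, `simulate`, `step`, `cost` and `strategy` are hypothetical, time is assumed to start at t = 0, and the infinite-horizon cost is truncated at a finite `horizon` purely for illustration. It evaluates the discounted cost of one sample path under a given strategy; it does not compute an optimal strategy.

```python
import random
from typing import Callable, List, Sequence, Tuple


class BanditProcess:
    """One bandit process j (hypothetical container, not from the chapter).

    step(x, rng)     -> next state, sampled from the law of motion P_j{x}
    cost(x, x_next)  -> bounded transition cost C_j{x, x_next}
    """

    def __init__(self, step: Callable, cost: Callable):
        self.step = step
        self.cost = cost


def simulate(
    bandits: Sequence[BanditProcess],
    x0: Sequence,
    strategy: Callable[[List, int], int],
    alpha: float,
    horizon: int,
    rng: random.Random,
) -> Tuple[float, List]:
    """Discounted cost of one sample path, truncated after `horizon` steps."""
    x = list(x0)                      # (a) joint state x(t)
    total = 0.0
    for t in range(horizon):
        j = strategy(x, t)            # (b) action a_j: choose bandit j
        x_next = bandits[j].step(x[j], rng)
        # (d) cost alpha^t * C_j{x_j(t), x_j(t+1)}
        total += alpha ** t * bandits[j].cost(x[j], x_next)
        x[j] = x_next                 # (c) only component j changes
    return total, x


# Illustrative deterministic bandits: the state is a counter, the cost is constant.
cheap = BanditProcess(step=lambda x, rng: x + 1, cost=lambda x, y: 1.0)
dear = BanditProcess(step=lambda x, rng: x + 1, cost=lambda x, y: 2.0)

# Strategy that always selects bandit 0.
total, final_state = simulate(
    [cheap, dear], [0, 0], lambda x, t: 0, alpha=0.5, horizon=3,
    rng=random.Random(0),
)
```

With these illustrative inputs the three discounted costs are 1, 0.5 and 0.25, and the state of the unselected bandit is left unchanged, as feature (c) requires. Averaging `total` over many sample paths would give a Monte Carlo estimate of the (truncated) expected cost of a strategy when the laws of motion are genuinely random.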
© 1983 Springer-Verlag New York Inc.
About this paper
Cite this paper
Glazebrook, K.D. (1983). The Role of Dynamic Allocation Indices in the Evaluation of Suboptimal Strategies for Families of Bandit Processes. In: Herkenrath, U., Kalin, D., Vogel, W. (eds) Mathematical Learning Models — Theory and Algorithms. Lecture Notes in Statistics, vol 20. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-5612-0_7
DOI: https://doi.org/10.1007/978-1-4612-5612-0_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-90913-4
Online ISBN: 978-1-4612-5612-0
eBook Packages: Springer Book Archive