The Role of Dynamic Allocation Indices in the Evaluation of Suboptimal Strategies for Families of Bandit Processes

Glazebrook, K. D.

doi:10.1007/978-1-4612-5612-0_7


Part of the book series: Lecture Notes in Statistics (LNS, volume 20)


Abstract

A family of N alternative bandit processes {(Ω_j, P_j, C_j, α); j = 1, 2, ..., N} is a cost-discounted Markov decision process with the following special features:

(a) Its state at time t∈ℕ is x(t)={x₁(t), x₂(t),..., x_N(t)}, where x_j(t)∈Ω_j, 1 ≤ j ≤ N. The state space Ω_j may be finite, countable or continuous.
(b) The action space A is {a₁, a₂,...,a_N}. Action a_j denotes the choice of bandit process j. An action is taken at each time t∈ℕ.
(c) If action a_j is taken at time t∈ℕ, only the jth component of x(t) changes. Hence x_i(t+1)=x_i(t), i ≠ j, and x_j(t+1) is determined according to a probabilistic law of motion P_j{x_j(t)}.
(d) The transition of the process under action a_j described in (c) incurs a cost α^t C_j{x_j(t), x_j(t+1)}, where 0≤α≤1. The costs are assumed to be bounded.
(e) An optimal strategy is a rule for choosing actions which minimises the total expected cost.
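
Features (a) to (e) can be illustrated with a small simulation. The following is a minimal sketch, not the chapter's own method: it assumes finite state spaces, truncates the cost sum at a finite horizon, and uses hypothetical names (BanditProcess, simulate_cost, the two example strategies) and an illustrative value α = 0.5 that do not appear in the source.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

State = int


@dataclass
class BanditProcess:
    """One bandit process (Omega_j, P_j, C_j); finite state space assumed."""
    # P_j: state -> list of (next_state, probability)
    transitions: Dict[State, List[Tuple[State, float]]]
    # C_j(x, x_next): bounded cost of the transition x -> x_next
    cost: Callable[[State, State], float]

    def step(self, state: State, rng: random.Random) -> State:
        next_states, probs = zip(*self.transitions[state])
        return rng.choices(next_states, weights=probs, k=1)[0]


def simulate_cost(
    bandits: Sequence[BanditProcess],
    initial_state: Sequence[State],
    strategy: Callable[[Tuple[State, ...], int], int],
    alpha: float,
    horizon: int,
    rng: random.Random,
) -> float:
    """Discounted cost of one sample path, truncated after `horizon` steps."""
    x = list(initial_state)
    total = 0.0
    for t in range(horizon):
        j = strategy(tuple(x), t)             # (b) action a_j chooses bandit j
        x_next = bandits[j].step(x[j], rng)   # (c) only component j changes
        total += alpha ** t * bandits[j].cost(x[j], x_next)  # (d) discounted cost
        x[j] = x_next
    return total


# Hypothetical example: two jobs, state 0 = unfinished, state 1 = finished.
# Processing an unfinished job finishes it with certainty and incurs a fixed cost.
def make_job(c: float) -> BanditProcess:
    return BanditProcess(
        transitions={0: [(1, 1.0)], 1: [(1, 1.0)]},
        cost=lambda x, x_next: c if x == 0 else 0.0,
    )


def first_unfinished(x: Tuple[State, ...], t: int) -> int:
    return next((j for j, s in enumerate(x) if s == 0), 0)


def last_unfinished(x: Tuple[State, ...], t: int) -> int:
    return next((j for j in reversed(range(len(x))) if x[j] == 0), 0)


jobs = [make_job(4.0), make_job(1.0)]
rng = random.Random(0)
cost_a = simulate_cost(jobs, (0, 0), first_unfinished, 0.5, 5, rng)
cost_b = simulate_cost(jobs, (0, 0), last_unfinished, 0.5, 5, rng)
print(cost_a, cost_b)  # 4.5 3.0
```

In this toy instance the transitions are deterministic, so a single sample path gives the exact total cost of each strategy; with random transitions, the expected cost in (e) would have to be estimated by averaging over many paths.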

Author information

Authors and Affiliations

Department of Statistics, University of Newcastle upon Tyne, Newcastle upon Tyne, UK
K. D. Glazebrook


Editor information

Editors and Affiliations

Institut für Angewandte Mathematik der Universität Bonn, Wegelerstrasse 6, 5300, Bonn, Federal Republic of Germany
Ulrich Herkenrath & Walter Vogel
Abt. Mathematik VII, Universität Ulm, Oberer Eselsberg, 7900, Ulm, Federal Republic of Germany
Dieter Kalin

Cite this paper

Glazebrook, K.D. (1983). The Role of Dynamic Allocation Indices in the Evaluation of Suboptimal Strategies for Families of Bandit Processes. In: Herkenrath, U., Kalin, D., Vogel, W. (eds) Mathematical Learning Models — Theory and Algorithms. Lecture Notes in Statistics, vol 20. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-5612-0_7

DOI: https://doi.org/10.1007/978-1-4612-5612-0_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-90913-4
Online ISBN: 978-1-4612-5612-0
eBook Packages: Springer Book Archive
