Abstract
In recent years, considerable effort has been devoted to the development of a theory for multi-parameter processes. These are stochastic processes that evolve in a “time” that is only partially ordered. The multi-parameter theory provides a natural way to formulate problems in dynamic allocation of resources, including discrete- and continuous-time multi-armed bandits as special cases. Multi-parameter processes that describe a game played by a gambler against a multi-armed bandit are called bandit processes. My talk will focus on two control problems for bandit processes. The first problem, the optimal stopping problem, is that of a gambler who can stop playing at any time. The reward from the game depends only on the state of affairs at the time of stopping, and the gambler’s problem is to choose an optimal stopping time. In the second problem, the optimal navigation problem, the gambler plays forever and seeks to maximize the total discounted reward over an infinite horizon.
Research partially supported by NSF grant ECS 8603857.
A substantial part of this manuscript is based on [8,9], which have been published, and on [10], which, hopefully, will be published.
AMS 1980 subject classification. Primary 62L99, 60G40, 93E20; Secondary 60J60, 60K10, 60G17, 60J55.
References
[1] Bellman, R. (1957) “Dynamic Programming”, Princeton University Press.
[2] Berry, D.A. and Fristedt, B. (1985) “Bandit Problems”, Chapman and Hall.
[3] Dynkin, E.B. (1963) “The Optimum Choice of the instant of stopping a Markov process”, Soviet Math. Dokl. 4, 627–629.
[4] Dynkin, E.B. and Yushkevich, A.A. (1969) “Markov Processes, Theorems and Problems”, Plenum Press.
[5] Gittins, J.C. (1979) “Bandit processes and dynamic allocation indices”, J. Roy. Statist. Soc. Ser. B 41, 148–177.
[6] Grigelionis, B.I. and Shiryayev, A.N. (1968) “Controllable Markov Processes and Stefan’s Problem”, Problemi Peredachi Informatsii 4, 60–72 (English translation).
[7] Karatzas, I. (1984) “Gittins indices in the dynamic allocation problem for diffusion processes”, Ann. Prob. 12, 173–192.
[8] Mandelbaum, A. and Vanderbei, R.J. (1981) “Optimal Stopping and supermartingales over partially ordered sets”, Z. Wahrsch. verw. Gebiete 57, 253–264.
[9] Mandelbaum, A. (1986) “Discrete multi-armed bandits and multi-parameter processes”, Prob. Th. Rel. Fields (previously Z.W.) 71, 129–147.
[10] Mandelbaum, A. (1986) “Continuous multi-armed bandits and multi-parameter processes”, under revision for the Ann. Prob.
[11] Mazziotto, G. (1985) “Two parameter optimal stopping and Bi-Markov Processes”, Z. Wahrsch. verw. Gebiete 69, 99–135.
[12] Neveu, J. (1975) “Discrete Parameter Martingales”, North-Holland.
[13] Snell, J.L. (1952) “Applications of Martingale System theorems”, Trans. Am. Math. Soc. 73, 293–312.
[14] Varaiya, P., Walrand, J. and Buyukkoc, C. (1985) “Extensions of the multi-armed bandit problem. The discounted case”, IEEE Trans. Autom. Control.
[15] Walsh, J.B. (1981) “Optional increasing paths”, Colloque ENST-CNET; Lect. Notes in Maths 863, 172–201, Springer-Verlag.
© 1988 Springer-Verlag New York Inc.
About this paper
Cite this paper
Mandelbaum, A. (1988). Navigating and Stopping Multi-Parameter Bandit Processes. In: Fleming, W., Lions, PL. (eds) Stochastic Differential Systems, Stochastic Control Theory and Applications. The IMA Volumes in Mathematics and Its Applications, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8762-6_22
DOI: https://doi.org/10.1007/978-1-4613-8762-6_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8764-0
Online ISBN: 978-1-4613-8762-6
eBook Packages: Springer Book Archive