Navigating and Stopping Multi-Parameter Bandit Processes

Mandelbaum, Avi

doi:10.1007/978-1-4613-8762-6_22

Part of the book series: The IMA Volumes in Mathematics and Its Applications (IMA, volume 10)

Abstract

In recent years, considerable effort has been devoted to the development of a theory for multi-parameter processes. These are stochastic processes that evolve in a “time” that is only partially ordered. The multi-parameter theory provides a natural way to formulate problems in dynamic allocation of resources, including discrete and continuous time multi-armed bandits as special cases. Multi-parameter processes that describe a game played by a gambler against a multi-armed bandit are called bandit processes. My talk will focus on two control problems for bandit processes. The first problem, the optimal stopping problem, is that of a gambler who can stop playing at any time. The reward from the game depends only on the state of affairs at the time of stopping, and the gambler’s problem is to choose an optimal stopping time. In the second problem, the optimal navigation problem, the gambler plays forever and seeks to maximize total discounted reward over an infinite horizon.
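To make the optimal stopping problem concrete, the following is a minimal sketch of the classical one-parameter case only, where time is totally ordered. It is not the multi-parameter construction of the chapter. It assumes a simple random walk started at 0, a finite horizon, and a reward that depends only on the state at the time of stopping. The function name `snell_envelope` and the parameters `reward`, `horizon`, `p_up`, and `discount` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple


def snell_envelope(
    reward: Callable[[int], float],
    horizon: int,
    p_up: float = 0.5,
    discount: float = 1.0,
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Backward induction for stopping a simple random walk started at 0.

    values[t][k] is the optimal expected reward at time t when the walk has
    taken k up-steps so far (so its state is 2*k - t).
    stop[t][k] is True when stopping immediately is optimal there.
    """
    values = [[0.0] * (t + 1) for t in range(horizon + 1)]
    stop = [[True] * (t + 1) for t in range(horizon + 1)]

    # At the horizon the gambler must stop and collect the reward.
    for k in range(horizon + 1):
        values[horizon][k] = reward(2 * k - horizon)

    # Work backwards: compare stopping now with the expected value of continuing.
    for t in range(horizon - 1, -1, -1):
        for k in range(t + 1):
            stop_now = reward(2 * k - t)
            keep_playing = discount * (
                p_up * values[t + 1][k + 1] + (1.0 - p_up) * values[t + 1][k]
            )
            values[t][k] = max(stop_now, keep_playing)
            stop[t][k] = stop_now >= keep_playing  # ties resolved in favour of stopping

    return values, stop


if __name__ == "__main__":
    # Illustrative reward: the positive part of the state.
    values, stop = snell_envelope(lambda x: max(x, 0), horizon=2)
    print(values[0][0], stop[0][0])
```

The table `values` is the Snell envelope of the reward sequence in this toy setting, and an optimal rule is to stop the first time `stop[t][k]` is True. In the multi-parameter setting described above, the gambler would also have to choose which arm to advance at each step, which this sketch does not attempt to model.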

Research partially supported by NSF grant ECS 8603857.

A substantial part of this manuscript is based on [8,9], which have been published, and on [10], which, hopefully, will be published.

AMS 1980 subject classification. Primary 62L99, 60G40, 93E20; Secondary 60J60, 60K10, 60G17, 60J55.

References

[1] Bellman, R. (1957) “Dynamic Programming”, Princeton University Press.
[2] Berry, D.A. and Fristedt, B. (1985) “Bandit Problems”, Chapman and Hall.
[3] Dynkin, E.B. (1963) “The optimum choice of the instant of stopping a Markov process”, Soviet Math. Dokl. 4, 627–629.
[4] Dynkin, E.B. and Yushkevich, A.A. (1969) “Markov Processes, Theorems and Problems”, Plenum Press.
[5] Gittins, J.C. (1979) “Bandit processes and dynamic allocation indices”, J. Roy. Statist. Soc. Ser. B 41, 148–177.
[6] Grigelionis, B.I. and Shiryayev, A.N. (1968) “Controllable Markov processes and Stefan’s problem”, Problemy Peredachi Informatsii 4, 60–72 (English translation).
[7] Karatzas, I. (1984) “Gittins indices in the dynamic allocation problem for diffusion processes”, Ann. Prob. 12, 173–192.
[8] Mandelbaum, A. and Vanderbei, R.J. (1981) “Optimal stopping and supermartingales over partially ordered sets”, Z. Wahrsch. verw. Gebiete 57, 253–264.
[9] Mandelbaum, A. (1986) “Discrete multi-armed bandits and multi-parameter processes”, Prob. Th. Rel. Fields (previously Z.W.) 71, 129–147.
[10] Mandelbaum, A. (1986) “Continuous multi-armed bandits and multi-parameter processes”, under revision for the Ann. Prob.
[11] Mazziotto, G. (1985) “Two parameter optimal stopping and bi-Markov processes”, Z. Wahrsch. verw. Gebiete 69, 99–135.
[12] Neveu, J. (1975) “Discrete Parameter Martingales”, North-Holland.
[13] Snell, J.L. (1952) “Applications of martingale systems theorems”, Trans. Am. Math. Soc. 73, 293–312.
[14] Varaiya, P., Walrand, J. and Buyukkoc, C. (1985) “Extensions of the multi-armed bandit problem. The discounted case”, IEEE Trans. Autom. Control.
[15] Walsh, J.B. (1981) “Optional increasing paths”, Colloque ENST-CNET; Lect. Notes in Math. 863, 172–201, Springer-Verlag.

Author information

Authors and Affiliations

Graduate School of Business, Stanford University, Stanford, CA, 94305-5015, USA
Avi Mandelbaum

Editor information

Editors and Affiliations

Division of Applied Mathematics, Brown University, 02912, Providence, Rhode Island, USA
Wendell Fleming
Ceremade, Université Paris-Dauphine, Place de Lattre de Tassigny, 75775, Paris Cedex 16, France
Pierre-Louis Lions

Cite this paper

Mandelbaum, A. (1988). Navigating and Stopping Multi-Parameter Bandit Processes. In: Fleming, W., Lions, P.-L. (eds) Stochastic Differential Systems, Stochastic Control Theory and Applications. The IMA Volumes in Mathematics and Its Applications, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8762-6_22

DOI: https://doi.org/10.1007/978-1-4613-8762-6_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4613-8764-0
Online ISBN: 978-1-4613-8762-6
eBook Packages: Springer Book Archive
