Summary
The general multi-armed bandit problem is reformulated and solved as a control problem over a partially ordered set. This approach provides a technically convenient framework for bandit-like problems and also yields insight into the structure of strategies over partially ordered sets.
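The following is a minimal sketch of the reformulation. It is an illustration only, not the paper's construction. A state records how many times each of the d arms has been pulled, so states live in N^d under the coordinatewise partial order. A strategy then picks one coordinate to increment at each step, which makes it a chain through that lattice. The sketch rests on these assumptions:

- Each arm is a finite, deterministic, nonnegative reward sequence that pays 0 forever once it is exhausted.
- Rewards are discounted by a factor beta.
- The function names (path_value, optimal_value, gittins_index, index_policy) are hypothetical.

The sketch computes the optimal value by dynamic programming over the lattice. It compares that value with the well-known Gittins index rule, which for a deterministic arm reduces to a maximal ratio of discounted partial sums.

```python
from functools import lru_cache

def path_value(arms, path, beta):
    """Discounted reward of a strategy given as a sequence of arm choices.
    The visited pull-count vectors form a chain in N^d (coordinatewise order)."""
    counts = [0] * len(arms)
    total = 0.0
    for t, i in enumerate(path):
        total += beta ** t * arms[i][counts[i]]
        counts[i] += 1
    return total

def optimal_value(arms, beta):
    """Dynamic programming over the lattice of pull counts.
    Assumes nonnegative rewards, so pulling an exhausted arm never helps."""
    lengths = tuple(len(a) for a in arms)

    @lru_cache(maxsize=None)
    def value(state):
        best = 0.0  # value once every arm is exhausted
        for i, n in enumerate(state):
            if n < lengths[i]:
                successor = state[:i] + (n + 1,) + state[i + 1:]
                best = max(best, arms[i][n] + beta * value(successor))
        return best

    return value((0,) * len(arms))

def gittins_index(rewards, k, beta):
    """Index of a deterministic arm whose next reward is rewards[k]:
    the largest ratio of discounted reward to discounted time over stopping points."""
    best, num, den = 0.0, 0.0, 0.0
    for s, r in enumerate(rewards[k:]):
        num += beta ** s * r
        den += beta ** s
        best = max(best, num / den)
    return best

def index_policy(arms, beta):
    """Repeatedly pull a non-exhausted arm with the largest index (ties: lowest arm number)."""
    counts = [0] * len(arms)
    path = []
    while True:
        live = [i for i in range(len(arms)) if counts[i] < len(arms[i])]
        if not live:
            return path
        i = max(live, key=lambda j: gittins_index(arms[j], counts[j], beta))
        path.append(i)
        counts[i] += 1
```

For example, `index_policy([[1.0, 0.0, 5.0], [2.0, 1.0]], 0.9)` yields a chain whose discounted value, computed with `path_value`, matches `optimal_value` for the same arms. This is what the index theorem predicts in the discounted case. The lattice recursion is exponential in the number of arms, so it is only suitable for checking small examples.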
About this article
Cite this article
Mandelbaum, A. Discrete multi-armed bandits and multi-parameter processes. Probab. Th. Rel. Fields 71, 129–147 (1986). https://doi.org/10.1007/BF00366276
DOI: https://doi.org/10.1007/BF00366276
Keywords
- Stochastic Process
- Control Problem
- Probability Theory
- Mathematical Biology
- Bandit Problem