Abstract
We consider the minimax setting for the one-armed bandit problem, i.e., the two-armed bandit problem in which the distribution function of incomes corresponding to the first action is known. Incomes corresponding to the second action are normally distributed with unit variance and an unknown mathematical expectation. According to the main theorem of game theory, the minimax strategy and minimax risk are sought as the Bayesian strategy and Bayesian risk corresponding to the worst-case prior distribution. The results can be applied to parallel data processing systems when two processing methods are available and the efficiency of the first is known a priori.
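To make the setting concrete, the following minimal sketch simulates a one-armed bandit in which the first action has a known mean income and the second action yields normal incomes with unit variance and an unknown mean. The threshold rule, the function names (simulate_loss, worst_case_loss), and all parameter values are illustrative assumptions; this is not the minimax strategy of the paper, only a simple rule whose worst-case loss over a grid of means can be estimated.

```python
import random


def simulate_loss(m2, horizon, m1=0.0, threshold=-2.0, runs=500, seed=0):
    """Monte Carlo estimate of the expected loss of a simple threshold rule.

    Assumed rule (hypothetical, for illustration only): apply the second
    action until the cumulative sum of (income - m1) drops below
    `threshold`, then apply the first action for the rest of the horizon.
    """
    rng = random.Random(seed)
    best = max(m1, m2)  # mean income of the better action
    total = 0.0
    for _ in range(runs):
        cum = 0.0    # cumulative advantage of action 2 over the known m1
        loss = 0.0   # accumulated expected loss in this run
        n = 0        # number of steps taken so far
        while n < horizon:
            income = rng.gauss(m2, 1.0)  # unit-variance normal income
            loss += best - m2            # expected loss of using action 2
            cum += income - m1
            n += 1
            if cum < threshold:
                break                    # abandon action 2 for good
        loss += (horizon - n) * (best - m1)  # remaining steps use action 1
        total += loss
    return total / runs


def worst_case_loss(means, horizon, **kwargs):
    """Largest estimated loss over a grid of candidate means of action 2."""
    return max(simulate_loss(m2, horizon, **kwargs) for m2 in means)


if __name__ == "__main__":
    grid = [-1.0, -0.5, -0.25, 0.25, 0.5, 1.0]
    print(worst_case_loss(grid, horizon=50))
```

A minimax strategy would minimize this worst-case quantity over all strategies; the sketch only evaluates it for one fixed rule and a finite grid of means.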
Additional information
Original Russian Text © A.V. Kolnogorov, 2015, published in Problemy Peredachi Informatsii, 2015, Vol. 51, No. 2, pp. 99–113.
Supported in part by the Russian Foundation for Basic Research, project no. 13-01-00334-a, and the Project Part of the State Assignment in the Field of Scientific Activity by the Ministry of Education and Science of the Russian Federation, project no. 1.949.2014/K.
About this article
Cite this article
Kolnogorov, A.V. One-armed bandit problem for parallel data processing systems. Probl Inf Transm 51, 177–191 (2015). https://doi.org/10.1134/S0032946015020088
DOI: https://doi.org/10.1134/S0032946015020088