Problems of Information Transmission, Volume 54, Issue 1, pp 84–100

Gaussian Two-Armed Bandit and Optimization of Batch Data Processing

  • A. V. Kolnogorov
Large Systems


We consider the minimax setting for the two-armed bandit problem with normally distributed incomes whose mathematical expectations and variances are unknown a priori. This setting arises naturally in the optimization of batch data processing, where two alternative processing methods with different, a priori unknown efficiencies are available. During the control process, the more efficient method must be identified and applied predominantly. We use the main theorem of game theory to find the minimax strategy and minimax risk as the Bayesian ones corresponding to the worst-case prior distribution. To find them, we derive a recursive integro-difference equation. We show that batch data processing increases the minimax risk only slightly if the number of batches is large enough.
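The batch setting described in the abstract can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a simple greedy rule (try each method for one batch, then apply the method with the higher sample mean to each subsequent batch), which is not the minimax strategy studied in the paper. The function name `batch_two_armed_bandit` and all parameter names are hypothetical. The returned loss is the expected income lost by not always applying the better method.

```python
import random


def batch_two_armed_bandit(means, stds, horizon, batch_size, seed=0):
    """Simulate a greedy batch rule on a Gaussian two-armed bandit.

    Illustrative only: this greedy rule is an assumption of the sketch,
    not the minimax strategy from the paper.
    Returns (expected_loss, pulls), where pulls[i] counts uses of method i.
    """
    rng = random.Random(seed)
    sums = [0.0, 0.0]   # total observed income per method
    pulls = [0, 0]      # number of items processed by each method
    best = max(means)
    loss = 0.0
    t = 0
    batch_index = 0
    while t < horizon:
        size = min(batch_size, horizon - t)
        if batch_index < 2:
            # First two batches: try each method once.
            arm = batch_index
        else:
            # Later batches: apply the method with the higher sample mean.
            avg = [sums[i] / pulls[i] for i in range(2)]
            arm = 0 if avg[0] >= avg[1] else 1
        # The whole batch is processed by one method; the choice is
        # revised only between batches.
        for _ in range(size):
            sums[arm] += rng.gauss(means[arm], stds[arm])
        pulls[arm] += size
        loss += size * (best - means[arm])
        t += size
        batch_index += 1
    return loss, pulls


if __name__ == "__main__":
    loss, pulls = batch_two_armed_bandit(
        means=(0.0, 1.0), stds=(1.0, 1.0), horizon=1000, batch_size=50
    )
    print("expected loss:", loss, "pulls:", pulls)
```

Varying `batch_size` for a fixed `horizon` gives a rough feel for how the number of batches affects the loss of this particular rule; it says nothing quantitative about the minimax risk itself.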





Copyright information

© Pleiades Publishing, Inc. 2018

Authors and Affiliations

  1. Department of Applied Mathematics and Information Science, Yaroslav-the-Wise Novgorod State University, Moscow, Russia
