One-armed bandit problem for parallel data processing systems

  • Large Systems
Problems of Information Transmission

Abstract

We consider the minimax setting for the one-armed bandit problem, i.e., the two-armed bandit problem in which the distribution function of incomes for the first action is known. Incomes for the second action are normally distributed with unit variance and an unknown mathematical expectation. By the main theorem of game theory, the minimax strategy and minimax risk are sought as the Bayesian strategy and Bayesian risk corresponding to the worst-case prior distribution. The results can be applied to parallel data processing systems in which two processing methods are available and the efficiency of the first is known a priori.
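To make the setting concrete, the following is a minimal simulation sketch, not the strategy derived in the article. It assumes that the first method has a known mean income (`known_mean`), that the second method yields normal incomes with unit variance and unknown mean `m`, and that data are processed in parallel batches, so the choice of method can change only between batches. The threshold rule, the function name `threshold_strategy_regret`, and the parameters `horizon`, `batch`, and `c` are all hypothetical and chosen only for illustration.

```python
import random


def threshold_strategy_regret(m, horizon=100, batch=10, c=1.0,
                              known_mean=0.0, seed=0):
    """Pseudo-regret of a hypothetical threshold rule on a one-armed bandit.

    The rule keeps using the second (unknown) method until its cumulative
    income, measured relative to the known mean, drops below
    -c * sqrt(horizon); after that it uses the first method to the end.
    This is an illustrative assumption, not the minimax strategy.
    """
    rng = random.Random(seed)
    threshold = -c * horizon ** 0.5
    best = max(known_mean, m)      # mean income of the better method
    excess = 0.0                   # cumulative income of method 2 minus known_mean
    regret = 0.0                   # expected loss relative to always using the best
    use_unknown = True
    t = 0
    while t < horizon:
        n = min(batch, horizon - t)
        if use_unknown:
            # The whole batch is processed by the second method in parallel.
            excess += sum(rng.gauss(m, 1.0) - known_mean for _ in range(n))
            regret += n * (best - m)
            # The decision to switch is taken only at a batch boundary.
            if excess < threshold:
                use_unknown = False
        else:
            regret += n * (best - known_mean)
        t += n
    return regret


if __name__ == "__main__":
    for m in (-5.0, -0.5, 0.0, 0.5, 5.0):
        print(m, threshold_strategy_regret(m))
```

In this sketch the regret is zero whenever the second method is at least as good as the first and is never abandoned, and it grows with the number of batches spent on the second method when that method is worse. The minimax risk discussed in the abstract corresponds to the largest expected regret over all values of `m`, minimized over strategies; the loop in the usage example only probes a few values of `m` for one fixed rule.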

Author information

Corresponding author

Correspondence to A. V. Kolnogorov.

Additional information

Original Russian Text © A.V. Kolnogorov, 2015, published in Problemy Peredachi Informatsii, 2015, Vol. 51, No. 2, pp. 99–113.

Supported in part by the Russian Foundation for Basic Research, project no. 13-01-00334-a, and the Project Part of the State Assignment in the Field of Scientific Activity by the Ministry of Education and Science of the Russian Federation, project no. 1.949.2014/K.

About this article

Cite this article

Kolnogorov, A.V. One-armed bandit problem for parallel data processing systems. Probl Inf Transm 51, 177–191 (2015). https://doi.org/10.1134/S0032946015020088

  • DOI: https://doi.org/10.1134/S0032946015020088
