Poissonian Two-Armed Bandit: A New Approach

Kolnogorov, A. V.

doi:10.1134/S0032946022020065

AUTOMATA THEORY
Published: 11 July 2022

Volume 58, pages 160–183 (2022)

Problems of Information Transmission

A. V. Kolnogorov¹

Abstract

We consider a new approach to the continuous-time two-armed bandit problem in which incomes are described by Poisson processes. First, the control horizon is divided into equal consecutive half-intervals in which the strategy remains constant, and the incomes arrive in batches corresponding to these half-intervals. A recursive difference equation is derived for finding the optimal piecewise constant Bayesian strategy and the corresponding Bayesian risk. The existence of a limiting value of the Bayesian risk as the number of half-intervals grows without bound is established, and a partial differential equation for finding it is derived. Second, unlike previously considered settings of this problem, we analyze the strategy as a function of the current history of the controlled process rather than of the evolution of the posterior distribution. This removes the requirement, imposed in previous settings, that the set of admissible parameters be finite. Simulations show that, in practice, partitioning the arriving incomes into 30 batches is sufficient for finding the Bayesian and minimax strategies and risks. In the minimax setting, it is shown that optimal one-by-one processing of arriving incomes is not more efficient than optimal batch processing as the control horizon grows without bound.
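The batch setting described in the abstract can be illustrated with a small simulation. The sketch below is not the author's optimal Bayesian or minimax strategy: it assumes a simple, hypothetical rule (`choose_arm`) that tries each arm once and then plays the arm with the higher empirical income rate. All function names, the arm rates, and the horizon are illustrative assumptions; only the partition of the horizon into equal batches with a constant action inside each batch, the Poisson incomes, the dependence of the decision on the current history (accumulated incomes and times), and the figure of 30 batches come from the abstract.

```python
import random


def poisson_count(rng, rate, duration):
    """Count arrivals of a Poisson process with intensity `rate` (> 0) over `duration`."""
    count = 0
    t = rng.expovariate(rate)  # time of the first arrival
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)  # exponential inter-arrival times
    return count


def choose_arm(income, time_used):
    """Hypothetical rule (an assumption, not the paper's strategy).

    The decision depends only on the current history: the income collected
    from each arm and the time spent on each arm.
    """
    for arm in (0, 1):
        if time_used[arm] == 0.0:
            return arm  # each arm is tried once first
    rate0 = income[0] / time_used[0]
    rate1 = income[1] / time_used[1]
    return 0 if rate0 >= rate1 else 1


def run_batched_bandit(rates, horizon, n_batches, seed=0):
    """Simulate one run in which the chosen arm is constant within each batch."""
    rng = random.Random(seed)
    dt = horizon / n_batches  # equal consecutive intervals
    income = [0, 0]
    time_used = [0.0, 0.0]
    for _ in range(n_batches):
        arm = choose_arm(income, time_used)
        income[arm] += poisson_count(rng, rates[arm], dt)
        time_used[arm] += dt
    # Loss of this run: expected income of always playing the better arm
    # minus expected income under the realized time allocation.
    expected_income = sum(r * t for r, t in zip(rates, time_used))
    loss = horizon * max(rates) - expected_income
    return income, time_used, loss


if __name__ == "__main__":
    income, time_used, loss = run_batched_bandit((1.0, 1.5), horizon=100.0, n_batches=30)
    print("incomes:", income, "times:", time_used, "loss:", loss)
```

Averaging `loss` over many seeds gives a Monte Carlo estimate of the expected loss of this particular rule for fixed rates. A Bayesian risk would additionally average over a prior on the rates, and a minimax risk would take the worst case over them.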

Acknowledgments

The author is grateful to a reviewer for his/her attention to the paper and valuable remarks.

Funding

Supported in part by the Russian Foundation for Basic Research, project no. 20-01-00062.

Author information

Authors and Affiliations

Department of Applied Mathematics and Information Science, Yaroslav-the-Wise Novgorod State University, Novgorod, Russia
A. V. Kolnogorov

Additional information

Translated from Problemy Peredachi Informatsii, 2022, Vol. 58, No. 2, pp. 66–91 https://doi.org/10.31857/S0555292322020065.

About this article

Cite this article

Kolnogorov, A. Poissonian Two-Armed Bandit: A New Approach. Probl Inf Transm 58, 160–183 (2022). https://doi.org/10.1134/S0032946022020065

Received: 31 May 2021
Revised: 09 April 2022
Accepted: 18 April 2022
Published: 11 July 2022
Issue Date: April 2022
DOI: https://doi.org/10.1134/S0032946022020065
