
Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario

  • Conference paper
Mathematical Optimization Theory and Operations Research: Recent Trends (MOTOR 2023)

Abstract

We consider a Gaussian multi-armed bandit problem in which both the reward means and the variances are unknown. The Gaussian setting is chosen because, in the case of batch processing, the cumulative rewards of the batches are approximately normally distributed. A batch version of the UCB strategy is proposed, and a description of the strategy that is invariant with respect to the horizon size is obtained. We consider different approaches to estimating the unknown reward variances and study their effect on the normalized regret. A set of Monte Carlo simulations is performed to study the batch strategy and to illustrate the results for the two-armed bandit.

Supported by the Russian Science Foundation, project number 23-21-00447, https://rscf.ru/en/project/23-21-00447/.
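To make the setting concrete, the following is a minimal illustrative sketch (in Python/NumPy) of a batch UCB rule for a Gaussian bandit with unknown means and variances. The index form mean + sqrt(2 * var_hat * ln(N) / n), the equal-size batch schedule, and the function name batch_ucb_regret are assumptions made for illustration only; the exact strategy, its invariant description, and the variance estimators studied in the paper may differ.

import numpy as np


def batch_ucb_regret(true_means, true_vars, horizon, batch_size, rng):
    """One Monte Carlo run of a batch UCB rule for a Gaussian bandit with
    unknown means and variances (illustrative sketch, not the paper's exact rule)."""
    k = len(true_means)
    n_batches = horizon // batch_size
    counts = np.zeros(k)      # batches played per arm
    means = np.zeros(k)       # sample mean of per-batch cumulative reward
    m2 = np.zeros(k)          # sum of squared deviations (for variance estimate)
    regret = 0.0
    best_batch_mean = max(true_means) * batch_size

    for t in range(n_batches):
        if t < 2 * k:
            arm = t % k       # play each arm twice so variance estimates exist
        else:
            var_hat = m2 / (counts - 1)          # sample variance per arm
            bonus = np.sqrt(2.0 * var_hat * np.log(n_batches) / counts)
            arm = int(np.argmax(means + bonus))  # assumed UCB index form

        # the cumulative reward of a batch is approximately normally distributed
        reward = rng.normal(true_means[arm] * batch_size,
                            np.sqrt(true_vars[arm] * batch_size))
        regret += best_batch_mean - true_means[arm] * batch_size

        # online (Welford) update of the mean and variance estimates
        counts[arm] += 1
        delta = reward - means[arm]
        means[arm] += delta / counts[arm]
        m2[arm] += delta * (reward - means[arm])

    return regret


# Monte Carlo estimate of the expected regret for a two-armed bandit
rng = np.random.default_rng(0)
runs = [batch_ucb_regret([0.0, 0.2], [1.0, 1.0], horizon=5000, batch_size=50, rng=rng)
        for _ in range(200)]
print("mean regret:", np.mean(runs))

In this sketch the regret is measured against the expected reward of the best arm; the paper additionally studies the normalized regret and compares several ways of estimating the unknown variances.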



Author information

Corresponding author

Correspondence to Sergey Garbar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Garbar, S. (2023). Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario. In: Khachay, M., Kochetov, Y., Eremeev, A., Khamisov, O., Mazalov, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research: Recent Trends. MOTOR 2023. Communications in Computer and Information Science, vol 1881. Springer, Cham. https://doi.org/10.1007/978-3-031-43257-6_7


  • DOI: https://doi.org/10.1007/978-3-031-43257-6_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43256-9

  • Online ISBN: 978-3-031-43257-6

  • eBook Packages: Computer Science, Computer Science (R0)
