
Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario

  • Conference paper
Mathematical Optimization Theory and Operations Research: Recent Trends (MOTOR 2023)

Abstract

We consider a Gaussian multi-armed bandit problem in which both the reward means and the variances are unknown. The Gaussian setting is chosen because, in the case of batch processing, the cumulative rewards of the batches are approximately normally distributed. A batch version of the UCB strategy is proposed, and a description of the strategy that is invariant with respect to the horizon size is obtained. We consider different approaches to estimating the unknown reward variances and study their effect on the normalized regret. A set of Monte Carlo simulations is performed to study the batch strategy and to illustrate the results for the two-armed bandit.

Supported by the Russian Science Foundation, project number 23-21-00447, https://rscf.ru/en/project/23-21-00447/.
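To make the setting concrete, the following is a minimal illustrative sketch (in Python/NumPy) of a batch UCB rule for a Gaussian bandit with unknown means and variances. The index form mean + sqrt(2 * var_hat * ln(N) / n), the equal-size batch schedule, and the function name batch_ucb_regret are assumptions made for illustration only; the exact strategy, its invariant description, and the variance estimators studied in the paper may differ.

import numpy as np


def batch_ucb_regret(true_means, true_vars, horizon, batch_size, rng):
    """One Monte Carlo run of a batch UCB rule for a Gaussian bandit with
    unknown means and variances (illustrative sketch, not the paper's exact rule)."""
    k = len(true_means)
    n_batches = horizon // batch_size
    counts = np.zeros(k)      # batches played per arm
    means = np.zeros(k)       # sample mean of per-batch cumulative reward
    m2 = np.zeros(k)          # sum of squared deviations (for variance estimate)
    regret = 0.0
    best_batch_mean = max(true_means) * batch_size

    for t in range(n_batches):
        if t < 2 * k:
            arm = t % k       # play each arm twice so variance estimates exist
        else:
            var_hat = m2 / (counts - 1)          # sample variance per arm
            bonus = np.sqrt(2.0 * var_hat * np.log(n_batches) / counts)
            arm = int(np.argmax(means + bonus))  # assumed UCB index form

        # the cumulative reward of a batch is approximately normally distributed
        reward = rng.normal(true_means[arm] * batch_size,
                            np.sqrt(true_vars[arm] * batch_size))
        regret += best_batch_mean - true_means[arm] * batch_size

        # online (Welford) update of the mean and variance estimates
        counts[arm] += 1
        delta = reward - means[arm]
        means[arm] += delta / counts[arm]
        m2[arm] += delta * (reward - means[arm])

    return regret


# Monte Carlo estimate of the expected regret for a two-armed bandit
rng = np.random.default_rng(0)
runs = [batch_ucb_regret([0.0, 0.2], [1.0, 1.0], horizon=5000, batch_size=50, rng=rng)
        for _ in range(200)]
print("mean regret:", np.mean(runs))

In this sketch the regret is measured against the expected reward of the best arm; the paper additionally studies the normalized regret and compares several ways of estimating the unknown variances.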



Author information

Corresponding author

Correspondence to Sergey Garbar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Garbar, S. (2023). Estimation of Both Unknown Parameters in Gaussian Multi-armed Bandit for Batch Processing Scenario. In: Khachay, M., Kochetov, Y., Eremeev, A., Khamisov, O., Mazalov, V., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research: Recent Trends. MOTOR 2023. Communications in Computer and Information Science, vol 1881. Springer, Cham. https://doi.org/10.1007/978-3-031-43257-6_7


  • DOI: https://doi.org/10.1007/978-3-031-43257-6_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43256-9

  • Online ISBN: 978-3-031-43257-6

  • eBook Packages: Computer Science, Computer Science (R0)
