Abstract
We consider the question introduced by [16] of identifying all the \(\varepsilon \)-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with a confidence at least \(1-\delta \). The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk \(\delta \) goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk \(\delta \), and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependence of the sample complexity on the number of arms. Finally, we report on numerical simulations demonstrating our algorithm’s advantage over state-of-the-art methods, even for moderate risks.
Notes
- 1.
For \(\sigma ^2\)-subgaussian distributions, we only need to multiply our bounds by \(\sigma ^2\). For bandits coming from another single-parameter exponential family, we lose the closed-form expression of the best response oracle that we have in the Gaussian case, but one can use binary search to solve the best response problem.
- 2.
or a subset of arms, as in our case.
- 3.
The phenomenon discussed above was essentially already observed in [16], a very rich study of the problem. However, we do not fully understand the proof of Theorem 4.1. Define a sub-instance to be a bandit \(\widetilde{\nu }\) with fewer arms \(m \le K\) such that \(\{\widetilde{\nu }_1,\ldots , \widetilde{\nu }_{m}\} \subset \{\nu _1, \ldots , \nu _K\}\). Lemma D.5 in [16] actually shows that there exists some sub-instance of \(\nu \) on which the algorithm must pay \(\varOmega (\sum _{b=2}^{m} 1/(\mu _1-\mu _b)^2)\) samples. But this does not imply that this cost must be paid on the full instance of interest \(\nu \) itself, rather than only on some sub-instance with very few arms.
- 4.
\(\overline{{\boldsymbol{\mu }}}_{\varepsilon }^{k,\ell }({\boldsymbol{\omega }})\) has a different definition depending on whether k is a good or a bad arm.
- 5.
Percent control is a metric expressing the efficiency of the compound as an inhibitor of the target kinase.
- 6.
The F1 score is the harmonic mean of precision (the proportion of arms in \(\widehat{G}\) that are actually good) and recall (the proportion of arms in \(G_{\varepsilon }({\boldsymbol{\mu }})\) that were correctly returned in \(\widehat{G}\)).
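As an illustration of the binary-search approach suggested in Note 1 for non-Gaussian exponential families, the sketch below solves a two-arm best-response problem for Bernoulli rewards: minimizing \(\omega _k\,\mathrm{kl}(\mu _k, x+\varepsilon ) + \omega _\ell \,\mathrm{kl}(\mu _\ell , x)\) over x by bisecting on the nondecreasing derivative of this convex objective. The function names, the Bernoulli family, and the direction of the \(\varepsilon \)-shift are our own illustrative choices, not the paper's implementation.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def best_response_cost(mu_k, mu_l, w_k, w_l, epsilon, tol=1e-10):
    """Minimize w_k*kl(mu_k, x+epsilon) + w_l*kl(mu_l, x) over x.

    The objective is convex in x (each kl(p, .) is convex), so its
    derivative is nondecreasing and a sign-change bisection finds the
    minimizer. The derivative of kl(p, x) in x is (x - p) / V(x), with
    V(x) = x(1-x) the Bernoulli variance at mean x.
    """
    def deriv(x):
        vk = (x + epsilon) * (1 - (x + epsilon))  # variance at x + epsilon
        vl = x * (1 - x)                          # variance at x
        return w_k * (x + epsilon - mu_k) / vk + w_l * (x - mu_l) / vl

    lo, hi = 1e-9, 1 - epsilon - 1e-9  # keep x and x+epsilon in (0, 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if deriv(mid) < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return w_k * kl_bernoulli(mu_k, x + epsilon) + w_l * kl_bernoulli(mu_l, x)
```

For \(\varepsilon = 0\) and equal weights, the minimizer is the midpoint of the two means, matching the closed-form weighted-mean solution that holds for plain (unshifted) transportation costs in one-parameter exponential families.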
References
Bocci, M., et al.: Activin receptor-like kinase 1 is associated with immune cell infiltration and regulates CLEC14A transcription in cancer. Angiogenesis 22(1), 117–131 (2018). https://doi.org/10.1007/s10456-018-9642-5
Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015)
Chernoff, H.: Sequential design of experiments. Ann. Math. Stat. 30(3), 755–770 (1959)
Danskin, J.M.: The theory of max-min, with applications. SIAM J. Appl. Math. 14, 641–664 (1966)
Degenne, R., Koolen, W.M., Ménard, P.: Non-asymptotic pure exploration by solving games. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/8d1de7457fa769ece8d93a13a59c8552-Paper.pdf
Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)
Garivier, A.: Informational confidence bounds for self-normalized averages and applications. In: 2013 IEEE Information Theory Workshop (ITW) (Sep 2013). https://doi.org/10.1109/itw.2013.6691311
Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016)
Jedra, Y., Proutiere, A.: Optimal best-arm identification in linear bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 10007–10017. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/7212a6567c8a6c513f33b858d868ff80-Paper.pdf
Jourdan, M., Mutný, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: ALT (2021)
Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015)
Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:1811.11419 (2018)
Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)
Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014)
Mason, B., Jain, L., Tripathy, A., Nowak, R.: Finding all \(\epsilon \)-good arms in stochastic bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 20707–20718. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/edf0320adc8658b25ca26be5351b6c4a-Paper.pdf
Ménard, P.: Gradient ascent for active exploration in bandit problems. arXiv e-prints p. arXiv:1905.08165 (May 2019)
Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017), http://proceedings.mlr.press/v65/simchowitz17a.html
Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via Frank-Wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Acknowledgements
The authors acknowledge the support of the Chaire SeqALO (ANR-20-CHIA-0020-01) and of Project IDEXLYON of the University of Lyon, in the framework of the Programme Investissements d’Avenir (ANR-16-IDEX-0005).
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
al Marjani, A., Kocak, T., Garivier, A. (2023). On the Complexity of All \(\varepsilon \)-Best Arms Identification. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_20
DOI: https://doi.org/10.1007/978-3-031-26412-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer Science (R0)