Abstract
We consider the question introduced by [16] of identifying all the \(\varepsilon \)-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with a confidence at least \(1-\delta \). The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk \(\delta \) goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk \(\delta \), and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependence of the sample complexity on the number of arms. Finally, we report on numerical simulations demonstrating our algorithm’s advantage over state-of-the-art methods, even for moderate risks.
Notes
- 1.
For \(\sigma ^2\)-subgaussian distributions, we only need to multiply our bounds by \(\sigma ^2\). For bandits coming from another single-parameter exponential family, we lose the closed-form expression of the best response oracle that we have in the Gaussian case, but one can use binary search to solve the best response problem.
- 2.
or a subset of arms, as in our case.
- 3.
The phenomenon discussed above was essentially already observed in [16], a very rich study of the problem. However, we do not fully understand the proof of Theorem 4.1. Define a sub-instance to be a bandit \(\widetilde{\nu }\) with fewer arms \(m \le K\) such that \(\{\widetilde{\nu }_1,\ldots , \widetilde{\nu }_{m}\} \subset \{\nu _1, \ldots , \nu _K\}\). Lemma D.5 in [16] actually shows that there exists some sub-instance of \(\nu \) on which the algorithm must pay \(\varOmega (\sum _{b=2}^{m} 1/(\mu _1-\mu _b)^2)\) samples. But this does not imply that this cost must be paid on the full instance of interest \(\nu \) itself, rather than only on some sub-instance with very few arms.
- 4.
\(\overline{{\boldsymbol{\mu }}}_{\varepsilon }^{k,\ell }({\boldsymbol{\omega }})\) has a different definition depending on whether k is a good or a bad arm.
- 5.
Percent control is a metric expressing the efficiency of the compound as an inhibitor of the target kinase.
- 6.
The F1 score is the harmonic mean of precision (the proportion of arms in \(\widehat{G}\) that are actually good) and recall (the proportion of arms in \(G_{\varepsilon }({\boldsymbol{\mu }})\) that were correctly returned in \(\widehat{G}\)).
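As an illustration of the binary-search approach suggested in Note 1 for non-Gaussian exponential families, the sketch below solves a two-arm best-response problem for Bernoulli rewards: minimizing \(\omega _k\,\mathrm{kl}(\mu _k, x+\varepsilon ) + \omega _\ell \,\mathrm{kl}(\mu _\ell , x)\) over x by bisecting on the nondecreasing derivative of this convex objective. The function names, the Bernoulli family, and the direction of the \(\varepsilon \)-shift are our own illustrative choices, not the paper's implementation.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def best_response_cost(mu_k, mu_l, w_k, w_l, epsilon, tol=1e-10):
    """Minimize w_k*kl(mu_k, x+epsilon) + w_l*kl(mu_l, x) over x.

    The objective is convex in x (each kl(p, .) is convex), so its
    derivative is nondecreasing and a sign-change bisection finds the
    minimizer. The derivative of kl(p, x) in x is (x - p) / V(x), with
    V(x) = x(1-x) the Bernoulli variance at mean x.
    """
    def deriv(x):
        vk = (x + epsilon) * (1 - (x + epsilon))  # variance at x + epsilon
        vl = x * (1 - x)                          # variance at x
        return w_k * (x + epsilon - mu_k) / vk + w_l * (x - mu_l) / vl

    lo, hi = 1e-9, 1 - epsilon - 1e-9  # keep x and x+epsilon in (0, 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if deriv(mid) < 0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return w_k * kl_bernoulli(mu_k, x + epsilon) + w_l * kl_bernoulli(mu_l, x)
```

For \(\varepsilon = 0\) and equal weights, the minimizer is the midpoint of the two means, matching the closed-form weighted-mean solution that holds for plain (unshifted) transportation costs in one-parameter exponential families.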
References
Bocci, M., et al.: Activin receptor-like kinase 1 is associated with immune cell infiltration and regulates CLEC14A transcription in cancer. Angiogenesis 22(1), 117–131 (2018). https://doi.org/10.1007/s10456-018-9642-5
Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015)
Chernoff, H.: Sequential design of experiments. Ann. Math. Stat. 30(3), 755–770 (1959)
Danskin, J.M.: The theory of max-min, with applications. SIAM J. Appl. Math. 14, 641–664 (1966)
Degenne, R., Koolen, W.M., Ménard, P.: Non-asymptotic pure exploration by solving games. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/8d1de7457fa769ece8d93a13a59c8552-Paper.pdf
Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)
Garivier, A.: Informational confidence bounds for self-normalized averages and applications. In: 2013 IEEE Information Theory Workshop (ITW) (Sep 2013). https://doi.org/10.1109/itw.2013.6691311
Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016)
Jedra, Y., Proutiere, A.: Optimal best-arm identification in linear bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 10007–10017. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/7212a6567c8a6c513f33b858d868ff80-Paper.pdf
Jourdan, M., Mutný, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: ALT (2021)
Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015)
Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:1811.11419 (2018)
Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)
Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014)
Mason, B., Jain, L., Tripathy, A., Nowak, R.: Finding all \(\epsilon \)-good arms in stochastic bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 20707–20718. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/edf0320adc8658b25ca26be5351b6c4a-Paper.pdf
Ménard, P.: Gradient ascent for active exploration in bandit problems. arXiv e-prints p. arXiv:1905.08165 (May 2019)
Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017), http://proceedings.mlr.press/v65/simchowitz17a.html
Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via Frank-Wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Acknowledgements
The authors acknowledge the support of the Chaire SeqALO (ANR-20-CHIA-0020-01) and of Project IDEXLYON of the University of Lyon, in the framework of the Programme Investissements d’Avenir (ANR-16-IDEX-0005).
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
al Marjani, A., Kocak, T., Garivier, A. (2023). On the Complexity of All \(\varepsilon \)-Best Arms Identification. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_20
DOI: https://doi.org/10.1007/978-3-031-26412-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer Science (R0)