On the Complexity of All \(\varepsilon \)-Best Arms Identification

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13716)

Abstract

We consider the question introduced by [16] of identifying all the \(\varepsilon \)-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with confidence at least \(1-\delta \). The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal as the risk \(\delta \) goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, which makes our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk \(\delta \), and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependence of the sample complexity on the number of arms. Finally, we report on numerical simulations demonstrating our algorithm's advantage over state-of-the-art methods, even for moderate risks.
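The Track-and-Stop strategy mentioned above is designed in the spirit of [8]: track the oracle sampling proportions computed on the empirical means, and stop with a generalized-likelihood-ratio test. The sketch below is a minimal illustration of that template only, assuming hypothetical `sample`, `oracle_weights` and `glr_stat` callables; the forced-exploration rule and the stylized threshold are placeholders, not the paper's exact sampling or stopping rules.

```python
import numpy as np

def track_and_stop_all_eps(sample, K, eps, delta, oracle_weights, glr_stat, max_steps=100_000):
    """Skeleton of a Track-and-Stop run for all-eps-best-arms identification.

    sample(k)                 -- draws one reward from arm k (hypothetical environment callable)
    oracle_weights(mu, eps)   -- sampling proportions from the lower-bound max-min program (stub)
    glr_stat(mu, counts, eps) -- generalized-likelihood-ratio statistic of the empirical answer (stub)
    """
    counts, sums = np.zeros(K), np.zeros(K)
    for k in range(K):                                 # initialization: pull every arm once
        sums[k] += sample(k); counts[k] += 1
    for t in range(K, max_steps):
        mu_hat = sums / counts
        threshold = np.log((1 + np.log(t)) / delta)    # stylized threshold beta(t, delta)
        if glr_stat(mu_hat, counts, eps) > threshold:  # stopping rule
            break
        if counts.min() < np.sqrt(t) - K / 2:          # forced exploration of starved arms
            k = int(np.argmin(counts))
        else:                                          # D-tracking of the oracle proportions
            k = int(np.argmax(t * oracle_weights(mu_hat, eps) - counts))
        sums[k] += sample(k); counts[k] += 1
    mu_hat = sums / counts
    return np.flatnonzero(mu_hat >= mu_hat.max() - eps), counts   # empirical eps-good set, counts
```

With the paper's oracle (the solution of the max-min program in the lower bound) plugged into `oracle_weights` and the corresponding GLR statistic into `glr_stat`, this skeleton becomes a full algorithm; with naive stubs it still runs but carries no optimality guarantee.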


Notes

  1. For \(\sigma ^2\)-subgaussian distributions, we only need to multiply our bounds by \(\sigma ^2\). For bandits from another single-parameter exponential family, we lose the closed-form expression of the best-response oracle available in the Gaussian case, but the best-response problem can still be solved by binary search (see the sketch after these notes).

  2. Or a subset of arms, as in our case.

  3. The phenomenon discussed above already appears, in essence, in [16], a very rich study of the problem. However, we do not fully understand the proof of Theorem 4.1. Define a sub-instance to be a bandit \(\widetilde{\nu }\) with fewer arms \(m \le K\) such that \(\{\widetilde{\nu }_1,\ldots , \widetilde{\nu }_{m}\} \subset \{\nu _1, \ldots , \nu _K\}\). Lemma D.5 in [16] shows that there exists some sub-instance of \(\nu \) on which the algorithm must pay \(\varOmega (\sum _{b=2}^{m} 1/(\mu _1-\mu _b)^2)\) samples. But this does not imply that this cost must be paid on the instance of interest \(\nu \) itself rather than on some sub-instance with very few arms (for instance, the one containing only the two best arms).

  4. \(\overline{{\boldsymbol{\mu }}}_{\varepsilon }^{k,\ell }({\boldsymbol{\omega }})\) has a different definition depending on whether k is a good or a bad arm.

  5. Percent control is a metric expressing the efficacy of a compound as an inhibitor of the target kinase.

  6. The F1 score is the harmonic mean of precision (the proportion of arms in \(\widehat{G}\) that are actually good) and recall (the proportion of arms in \(G_{\varepsilon }({\boldsymbol{\mu }})\) that are correctly returned in \(\widehat{G}\)); see the short computation sketched below.
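Two small illustrations of the points raised in notes 1 and 6, written as minimal Python sketches rather than as the paper's actual implementation.

For note 1: when rewards come from a one-parameter exponential family other than the Gaussian, the best-response oracle no longer has a closed form, but the underlying scalar program is convex and can be solved by binary (here, ternary) search. The pairwise cost `g` below is only an illustrative stand-in built from the Bernoulli KL divergence; the exact transportation cost solved by the paper's oracle may differ.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for numerical stability."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    q = min(max(q, 1e-12), 1 - 1e-12)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def minimize_convex_scalar(g, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a convex scalar function g on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) <= g(m2):
            hi = m2   # the minimizer lies in [lo, m2]
        else:
            lo = m1   # the minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

# Hypothetical pairwise best-response cost for a pair (k, l) with sampling weights
# (w_k, w_l), means (mu_k, mu_l) and slack eps: move both means to a common level x
# (up to the eps offset) and pay the weighted KL transportation cost.
w_k, w_l, mu_k, mu_l, eps = 0.6, 0.4, 0.8, 0.3, 0.05
g = lambda x: w_k * bernoulli_kl(mu_k, x + eps) + w_l * bernoulli_kl(mu_l, x)
x_star = minimize_convex_scalar(g, 0.0, 1.0 - eps)
print(x_star, g(x_star))
```

For note 6: the F1 score used to report the simulations is a direct function of the returned set \(\widehat{G}\) and the true set \(G_{\varepsilon }({\boldsymbol{\mu }})\). The arm sets and means below are hypothetical.

```python
import numpy as np

def f1_score(returned, true_good):
    """F1 of a returned arm set against the true eps-good set."""
    returned, true_good = set(returned), set(true_good)
    if not returned or not true_good:
        return 0.0
    tp = len(returned & true_good)   # correctly returned good arms
    precision = tp / len(returned)   # fraction of returned arms that are good
    recall = tp / len(true_good)     # fraction of good arms that were returned
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

mu, eps = np.array([0.9, 0.85, 0.7, 0.5]), 0.1
true_good = np.flatnonzero(mu >= mu.max() - eps)       # arms {0, 1}
print(f1_score(returned=[0, 2], true_good=true_good))  # precision 1/2, recall 1/2 -> 0.5
```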

References

  1. Bocci, M., et al.: Activin receptor-like kinase 1 is associated with immune cell infiltration and regulates CLEC14A transcription in cancer. Angiogenesis 22(1), 117–131 (2018). https://doi.org/10.1007/s10456-018-9642-5

  2. Bubeck, S.: Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning (2015)

  3. Chernoff, H.: Sequential design of experiments. Ann. Math. Stat. 30(3), 755–770 (1959)

  4. Danskin, J.M.: The theory of max-min, with applications. SIAM J. Appl. Math. 14, 641–664 (1966)

  5. Degenne, R., Koolen, W.M., Ménard, P.: Non-asymptotic pure exploration by solving games. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/8d1de7457fa769ece8d93a13a59c8552-Paper.pdf

  6. Garivier, A., Kaufmann, E.: Non-asymptotic sequential tests for overlapping hypotheses and application to near optimal arm identification in bandit models. Sequential Anal. 40, 61–96 (2021)

  7. Garivier, A.: Informational confidence bounds for self-normalized averages and applications. In: 2013 IEEE Information Theory Workshop (ITW) (Sep 2013). https://doi.org/10.1109/itw.2013.6691311

  8. Garivier, A., Kaufmann, E.: Optimal best arm identification with fixed confidence. In: Proceedings of the 29th Conference On Learning Theory, pp. 998–1027 (2016)

  9. Jedra, Y., Proutiere, A.: Optimal best-arm identification in linear bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 10007–10017. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/7212a6567c8a6c513f33b858d868ff80-Paper.pdf

  10. Jourdan, M., Mutný, M., Kirschner, J., Krause, A.: Efficient pure exploration for combinatorial bandits with semi-bandit feedback. In: Algorithmic Learning Theory (ALT) (2021)

  11. Kaufmann, E., Cappé, O., Garivier, A.: On the complexity of best arm identification in multi-armed bandit models. J. Mach. Learn. Res. (2015)

  12. Kaufmann, E., Koolen, W.M.: Mixture martingales revisited with applications to sequential tests and confidence intervals. arXiv preprint arXiv:1811.11419 (2018)

  13. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)

  14. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2019)

  15. Magureanu, S., Combes, R., Proutiere, A.: Lipschitz bandits: regret lower bounds and optimal algorithms. In: Conference on Learning Theory (2014)

  16. Mason, B., Jain, L., Tripathy, A., Nowak, R.: Finding all \(\epsilon \)-good arms in stochastic bandits. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 20707–20718. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper/2020/file/edf0320adc8658b25ca26be5351b6c4a-Paper.pdf

  17. Ménard, P.: Gradient ascent for active exploration in bandit problems. arXiv preprint arXiv:1905.08165 (2019)

  18. Simchowitz, M., Jamieson, K., Recht, B.: The simulator: understanding adaptive sampling in the moderate-confidence regime. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 65, pp. 1794–1834. PMLR, Amsterdam, Netherlands (07–10 Jul 2017), http://proceedings.mlr.press/v65/simchowitz17a.html

  19. Wang, P.A., Tzeng, R.C., Proutiere, A.: Fast pure exploration via Frank-Wolfe. In: Advances in Neural Information Processing Systems, vol. 34 (2021)


Acknowledgements

The authors acknowledge the support of the Chaire SeqALO (ANR-20-CHIA-0020-01) and of Project IDEXLYON of the University of Lyon, in the framework of the Programme Investissements d’Avenir (ANR-16-IDEX-0005).

Author information

Correspondence to Aymen al Marjani.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 902 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

al Marjani, A., Kocak, T., Garivier, A. (2023). On the Complexity of All \(\varepsilon \)-Best Arms Identification. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_20

  • DOI: https://doi.org/10.1007/978-3-031-26412-2_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

