Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits

  • Alexandra Carpentier
  • Alessandro Lazaric
  • Mohammad Ghavamzadeh
  • Rémi Munos
  • Peter Auer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)

Abstract

In this paper, we study the problem of estimating the mean values of all the arms uniformly well in the multi-armed bandit setting. If the variances of the arms were known, one could design an optimal sampling strategy by pulling the arms proportionally to their variances. However, since the distributions are not known in advance, we need to design adaptive sampling strategies that select an arm at each round based on the previously observed samples. We describe two strategies based on pulling the arms proportionally to an upper bound on their variances and derive regret bounds for these strategies. We show that the performance of these allocation strategies depends not only on the variances of the arms but also on the full shape of their distributions.
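The allocation idea described above can be sketched in a few lines of code. The sketch below is a minimal illustration only, not the paper's exact algorithms: the confidence width `sqrt(log(1/delta)/T_k)` and all names (`ch_as_sketch`, `arms`, `budget`) are assumptions chosen for the example. At each round it pulls the arm whose upper confidence bound on variance, divided by its pull count, is largest, so high-variance arms receive proportionally more samples.

```python
import numpy as np

def ch_as_sketch(arms, budget, delta=0.05, seed=None):
    """Illustrative UCB-on-variance allocation (hypothetical form).

    arms:   list of callables; arms[k](rng) returns one sample of arm k
    budget: total number of pulls n
    delta:  confidence parameter for the variance upper bound
    Returns (estimated means, pull counts per arm).
    """
    rng = np.random.default_rng(seed)
    K = len(arms)
    # Two initial pulls per arm so the empirical variance is defined.
    samples = [[arms[k](rng), arms[k](rng)] for k in range(K)]
    for _ in range(budget - 2 * K):
        scores = []
        for k in range(K):
            t_k = len(samples[k])
            var_hat = np.var(samples[k], ddof=1)       # empirical variance
            bonus = np.sqrt(np.log(1.0 / delta) / t_k)  # assumed confidence width
            # Upper bound on variance, normalized by the pull count.
            scores.append((var_hat + bonus) / t_k)
        k_star = int(np.argmax(scores))
        samples[k_star].append(arms[k_star](rng))
    return ([float(np.mean(s)) for s in samples],
            [len(s) for s in samples])
```

Running this on two Gaussian arms with very different variances shows the intended behavior: nearly all of the budget goes to the high-variance arm, mimicking the known-variance optimal allocation.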


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Alexandra Carpentier (1)
  • Alessandro Lazaric (1)
  • Mohammad Ghavamzadeh (1)
  • Rémi Munos (1)
  • Peter Auer (2)
  1. INRIA Lille - Nord Europe, Team SequeL, France
  2. University of Leoben, Leoben, Austria
