Active Learning in Multi-armed Bandits

Antos, András; Grover, Varun; Szepesvári, Csaba

doi:10.1007/978-3-540-87987-9_25

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5254)

Abstract

In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can choose which option to draw the next sample from, with the goal of producing estimates of equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values, the optimal solution, assuming full knowledge of the distributions, is to sample each option in proportion to its variance. We propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as $n^{-3/2}$, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
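The two claims in the abstract (variance-proportional sampling is optimal when sample means are used, and an incremental rule can approach it) can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the loss is the worst-case mean squared error of the sample means, max_k σ_k²/T_k, where T_k is the number of samples drawn from arm k. Under that loss the oracle allocation T_k = n σ_k² / Σ_j σ_j² gives every arm the same error Σ_j σ_j²/n, which is why sampling in proportion to variance yields "equally good precision". The incremental rule `greedy_allocate` (a hypothetical name) pulls the arm with the largest estimated σ̂_k²/T_k and forces a pull of any arm sampled fewer than √t times. The √t exploration threshold, the Gaussian arms, and all function names are illustrative assumptions.

```python
import math
import random


class RunningStats:
    """Welford's online mean/variance, so each new sample costs O(1)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Unbiased sample variance; requires at least two samples.
        return self.m2 / (self.n - 1)


def optimal_allocation(variances, n):
    """Oracle rule: arm k gets n * sigma_k^2 / sum_j sigma_j^2 samples (fractional)."""
    total = sum(variances)
    return [n * v / total for v in variances]


def worst_case_mse(variances, counts):
    """Assumed loss: the largest per-arm MSE of the sample mean, sigma_k^2 / T_k."""
    return max(v / t for v, t in zip(variances, counts))


def greedy_allocate(arms, n, rng):
    """Hypothetical incremental rule (illustrative, not the paper's algorithm).

    Each arm is a callable rng -> sample. Assumes n >= 2 * len(arms).
    Returns one RunningStats per arm; the sample counts sum to n.
    """
    k_arms = len(arms)
    stats = [RunningStats() for _ in range(k_arms)]
    for k in range(k_arms):  # two initial samples so every variance estimate is defined
        stats[k].add(arms[k](rng))
        stats[k].add(arms[k](rng))
    for t in range(2 * k_arms, n):
        # Forced exploration: an arm with fewer than sqrt(t) samples is pulled first.
        starved = [k for k in range(k_arms) if stats[k].n < math.sqrt(t)]
        if starved:
            k = min(starved, key=lambda j: stats[j].n)
        else:
            # Greedy step: pull the arm whose estimated MSE sigma_hat_k^2 / T_k is largest.
            k = max(range(k_arms), key=lambda j: stats[j].variance() / stats[j].n)
        stats[k].add(arms[k](rng))
    return stats


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = [0.5, 1.0, 2.0]
    variances = [s * s for s in sigmas]
    arms = [lambda r, s=s: r.gauss(0.0, s) for s in sigmas]
    n = 10_000
    counts = [st.n for st in greedy_allocate(arms, n, rng)]
    print("greedy counts:", counts)
    print("oracle counts:", [round(c) for c in optimal_allocation(variances, n)])
    print("greedy loss:", worst_case_mse(variances, counts))
    print("oracle loss:", sum(variances) / n)
```

Running the script compares the greedy sample counts with the oracle's variance-proportional counts, and the greedy worst-case error with the oracle value Σ_j σ_j²/n. Welford updates keep each step constant-time, so the simulation stays fast even for large n.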

Author information

Authors and Affiliations

Computer and Automation Research Institute of the Hungarian Academy of Sciences, Kende u. 13-17, Budapest, 1111, Hungary
András Antos & Csaba Szepesvári
Department of Computing Science, University of Alberta, Edmonton, T6G 2E8, Canada
Varun Grover & Csaba Szepesvári

Editor information

Editors and Affiliations

Computer Science and Engineering, University of California, San Diego, USA
Yoav Freund
Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Stoczek u. 2, 1521, Budapest, Hungary
László Györfi
Department of Math., Stat. and Comp. Sci., University of Illinois, 851 S. Morgan, Chicago, IL 60607-7045, USA
György Turán
Division of Computer Science, Hokkaido University, N-14, W-9, 060-0814, Sapporo, Japan
Thomas Zeugmann

About this paper

Cite this paper

Antos, A., Grover, V., Szepesvári, C. (2008). Active Learning in Multi-armed Bandits. In: Freund, Y., Györfi, L., Turán, G., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2008. Lecture Notes in Computer Science, vol 5254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87987-9_25

DOI: https://doi.org/10.1007/978-3-540-87987-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87986-2
Online ISBN: 978-3-540-87987-9
eBook Packages: Computer Science, Computer Science (R0)
