
Asymptotically efficient rules in multiarmed bandit problems

  • Conference paper
Modelling and Adaptive Control

Part of the book series: Lecture Notes in Control and Information Sciences ((LNCIS,volume 105))


Abstract

Variations of the multiarmed bandit problem are introduced, and a sequence of results leading to the work of Lai and Robbins and its extensions is summarized. The guiding concern is to determine the optimal tradeoff between taking actions that maximize immediate rewards based on current information about unknown system parameters and performing experiments that may reduce immediate rewards but improve parameter estimates.
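The tradeoff described above is typically resolved by index rules that add an exploration bonus to each arm's empirical mean reward. As a rough illustration only (this is a UCB-style sketch in the spirit of the Lai–Robbins confidence-bound approach, not the specific allocation rule analyzed in the paper; the function names and the Bernoulli arms are invented for the example):

```python
import math
import random

def ucb_allocation(arms, horizon):
    """Sample every arm once, then repeatedly pull the arm whose
    upper confidence index (empirical mean + exploration bonus)
    is largest. Returns the pull count of each arm."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            i = t - 1  # initialization: pull each arm once
        else:
            # index = empirical mean + confidence radius
            i = max(range(len(arms)),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        reward = arms[i]()          # pull arm i, observe its reward
        counts[i] += 1
        sums[i] += reward
    return counts

random.seed(0)
arms = [lambda: random.random() < 0.3,   # Bernoulli(0.3) arm
        lambda: random.random() < 0.7]   # Bernoulli(0.7) arm
counts = ucb_allocation(arms, 2000)
```

Under a rule of this form, pulls of each suboptimal arm grow only logarithmically in the horizon, which is the asymptotic efficiency notion of Lai and Robbins [5].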


References

  1. Anantharam, V., Ph.D. Dissertation, Univ. of California, Berkeley, 1986.


  2. Gittins, J.C., "Bandit processes and dynamic allocation indices," J. Roy. Statist. Soc., vol. 41, 1979, 148–177.


  3. Gittins, J.C. and D.M. Jones, "A dynamic allocation index for the sequential design of experiments," in Gani, J., K. Sarkadi and I. Vince, Eds., Progress in Statistics, Euro. Meet. Statist., vol. 1, New York, North-Holland, 1972, 241–266.


  4. Lai, T.L., "Some thoughts on stochastic adaptive control," Proc. 23rd IEEE Conf. on Decision and Control, Las Vegas, Dec. 1984, 51–56.


  5. Lai, T.L. and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, 1985, 4–22.


  6. Lai, T.L. and H. Robbins, "Asymptotically efficient allocation of treatments in sequential experiments," in Santner, T.J. and A.C. Tamhane (eds) Design of Experiments, New York, Marcel Dekker, 1985, 127–142.


  7. Varaiya, P., J.C. Walrand and C. Buyukkoc, "Extensions of the multiarmed bandit problem," IEEE Trans. Automat. Contr., vol. AC-30, May 1985, 426–439.


  8. Weitzman, M.L., "Optimal search for the best alternative," Econometrica, vol. 47, 1979, 641–654.


  9. Whittle, P., "Multi-armed bandits and the Gittins index," J. Roy. Statist. Soc., vol. 42, 1980, 143–149.




Editor information

Christopher Ian Byrnes, Alexander B. Kurzhanski


Copyright information

© 1988 Springer-Verlag

About this paper

Cite this paper

Anantharam, V., Varaiya, P. (1988). Asymptotically efficient rules in multiarmed bandit problems. In: Byrnes, C.I., Kurzhanski, A.B. (eds) Modelling and Adaptive Control. Lecture Notes in Control and Information Sciences, vol 105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0043173


  • DOI: https://doi.org/10.1007/BFb0043173


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-19019-6

  • Online ISBN: 978-3-540-38904-0

