
Asymptotically efficient rules in multiarmed bandit problems

  • Conference paper
Modelling and Adaptive Control

Part of the book series: Lecture Notes in Control and Information Sciences ((LNCIS,volume 105))


Abstract

Variations of the multiarmed bandit problem are introduced, and a sequence of results leading to the work of Lai and Robbins and its extensions is summarized. The guiding concern is to determine the optimal tradeoff between taking actions that maximize immediate rewards based on current information about unknown system parameters and performing experiments that may reduce immediate rewards but improve parameter estimates.
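The tradeoff described above is typically resolved by index rules that add an exploration bonus to each arm's empirical mean reward. As a rough illustration only (this is a UCB-style sketch in the spirit of the Lai–Robbins confidence-bound approach, not the specific allocation rule analyzed in the paper; the function names and the Bernoulli arms are invented for the example):

```python
import math
import random

def ucb_allocation(arms, horizon):
    """Sample every arm once, then repeatedly pull the arm whose
    upper confidence index (empirical mean + exploration bonus)
    is largest. Returns the pull count of each arm."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            i = t - 1  # initialization: pull each arm once
        else:
            # index = empirical mean + confidence radius
            i = max(range(len(arms)),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        reward = arms[i]()          # pull arm i, observe its reward
        counts[i] += 1
        sums[i] += reward
    return counts

random.seed(0)
arms = [lambda: random.random() < 0.3,   # Bernoulli(0.3) arm
        lambda: random.random() < 0.7]   # Bernoulli(0.7) arm
counts = ucb_allocation(arms, 2000)
```

Under a rule of this form, pulls of each suboptimal arm grow only logarithmically in the horizon, which is the asymptotic efficiency notion of Lai and Robbins [5].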


References

  1. Anantharam, V., Ph.D. Dissertation, Univ. of California, Berkeley, 1986.


  2. Gittins, J.C., "Bandit processes and dynamic allocation indices," J. Roy. Statist. Soc., vol. 41, 1979, 148–177.


  3. Gittins, J.C. and D.M. Jones, "A dynamic allocation index for the sequential design of experiments," in Gani, J., K. Sarkadi and I. Vince, Eds., Progress in Statistics, Euro. Meet. Statist., vol. 1, New York, North-Holland, 1972, 241–266.


  4. Lai, T.L., "Some thoughts on stochastic adaptive control," Proc. 23rd IEEE Conf. on Decision and Control, Las Vegas, Dec. 1984, 51–56.


  5. Lai, T.L. and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, 1985, 4–22.


  6. Lai, T.L. and H. Robbins, "Asymptotically efficient allocation of treatments in sequential experiments," in Santner, T.J. and A.C. Tamhane (eds) Design of Experiments, New York, Marcel Dekker, 1985, 127–142.


  7. Varaiya, P., J.C. Walrand and C. Buyukkoc, "Extensions of the multiarmed bandit problem," IEEE Trans. Automat. Contr., vol. AC-30, May 1985, 426–439.


  8. Weitzman, M.L., "Optimal search for the best alternative," Econometrica, vol. 47, 1979, 641–654.


  9. Whittle, P., "Multi-armed bandits and the Gittins index," J. Roy. Statist. Soc., vol. 42, 1980, 143–149.




Editor information

Christopher Ian Byrnes, Alexander B. Kurzhanski


Copyright information

© 1988 Springer-Verlag

About this paper

Cite this paper

Anantharam, V., Varaiya, P. (1988). Asymptotically efficient rules in multiarmed bandit problems. In: Byrnes, C.I., Kurzhanski, A.B. (eds) Modelling and Adaptive Control. Lecture Notes in Control and Information Sciences, vol 105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0043173


  • DOI: https://doi.org/10.1007/BFb0043173


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-19019-6

  • Online ISBN: 978-3-540-38904-0

