Bandit Problems

  • Living reference work entry

Abstract

The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff).
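The tradeoff described above can be made concrete with a small simulation. The sketch below is an illustration only: it uses an epsilon-greedy rule, a simple heuristic chosen here for brevity and not a method taken from this entry, and the function name `epsilon_greedy`, the Bernoulli payoffs, and all parameter values are assumptions introduced for the example.

```python
import random


def epsilon_greedy(true_means, horizon=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative only).

    true_means: hypothetical success probabilities, unknown to the agent.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # number of times each arm was played
    estimates = [0.0] * n_arms  # running average payoff observed per arm
    total_reward = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Exploration: play a random arm to improve information.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the arm currently believed to be best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 payoff from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Update the running average for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


pulls, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.7])
print("pulls per arm:", pulls)
print("estimated means:", [round(e, 3) for e in estimates])
```

Setting `epsilon` to 0 makes the agent purely exploitative, so it can lock onto an inferior arm after a few unlucky draws; setting it to 1 makes the agent explore forever and never use what it has learned. Intermediate values trade these two costs against each other.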

This chapter was originally published in The New Palgrave Dictionary of Economics, 2nd edition, 2008, edited by Steven N. Durlauf and Lawrence E. Blume.

Copyright information

© 2008 The Author(s)

About this entry

Cite this entry

Bergemann, D., Välimäki, J. (2008). Bandit Problems. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95121-5_2386-1

  • DOI: https://doi.org/10.1057/978-1-349-95121-5_2386-1

  • Publisher Name: Palgrave Macmillan, London

  • Online ISBN: 978-1-349-95121-5

  • eBook Packages: Springer Reference Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
