A survey on the bandit problem with switching costs

De Economist

Abstract

The paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms not only make the Gittins index policy suboptimal but also render the search for the optimal policy computationally infeasible. This survey first discusses the decomposability properties of the arms that make the Gittins index policy optimal, and shows how these properties break down when costs on switching arms are introduced. Having established the failure of the simple index policy, the survey focuses on recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in restricted environments, and, lastly, approximation of the optimal policy. The advantages and disadvantages of these approaches are discussed.

Cite this article

Jun, T. A survey on the bandit problem with switching costs. De Economist 152, 513–541 (2004). https://doi.org/10.1007/s10645-004-2477-z


  • DOI: https://doi.org/10.1007/s10645-004-2477-z
