Abstract
This paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms not only make the Gittins index policy suboptimal but also render the search for the optimal policy computationally infeasible. The survey first discusses the decomposability properties of the arms that make the Gittins index policy optimal, and shows how these properties break down once costs on switching arms are introduced. Having established the failure of the simple index policy, the survey focuses on recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in restricted environments, and approximation of the optimal policy. The advantages and disadvantages of these approaches are discussed.
Jun, T. A survey on the bandit problem with switching costs. De Economist 152, 513–541 (2004). https://doi.org/10.1007/s10645-004-2477-z
DOI: https://doi.org/10.1007/s10645-004-2477-z