A survey on the bandit problem with switching costs

Jun, Tackseung

doi:10.1007/s10645-004-2477-z

Published: December 2004

De Economist, Volume 152, pages 513–541 (2004)

Abstract

The paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms not only make the Gittins index policy suboptimal, but also render the search for the optimal policy computationally infeasible. This survey first discusses the decomposability properties of the arms that make the Gittins index policy optimal, and shows how these properties break down once switching between arms becomes costly. Having established the failure of the simple index policy, the survey focuses on recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in restricted environments, and, lastly, approximation of the optimal policy. The advantages and disadvantages of these approaches are discussed.
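
As a rough illustration of the index the abstract refers to (this sketch is not taken from the survey), the code below computes the Gittins index of each state of a single finite-state Markov arm in the classical setting without switching costs. It assumes the restart-in-state characterization associated with Katehakis and Veinott: for each state x, solve the problem in which the player may either continue from the current state or restart from x, and take (1 - beta) times the value at x. The function name `gittins_indices`, the transition matrix `P`, the reward vector `r`, and the discount factor `beta` are hypothetical names chosen for illustration.

```python
import numpy as np


def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins index of every state of one finite-state Markov arm.

    Assumed setting (illustrative only): no switching costs, rewards r[y]
    collected in state y, transitions P[y, z], discount factor 0 < beta < 1.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    indices = np.empty(n)
    for x in range(n):
        # Value iteration on the "continue or restart from x" problem.
        V = np.zeros(n)
        for _ in range(max_iter):
            cont = r + beta * (P @ V)      # value of continuing from each state
            restart = cont[x]              # value of restarting from state x
            V_new = np.maximum(cont, restart)
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        # The index is the restart value at x, rescaled to a per-period reward.
        indices[x] = (1.0 - beta) * V[x]
    return indices


if __name__ == "__main__":
    # Hypothetical arm: state 0 pays 0 and moves to state 1, which pays 1 forever.
    P = [[0.0, 1.0], [0.0, 1.0]]
    r = [0.0, 1.0]
    print(gittins_indices(P, r, beta=0.9))
```

For this two-state arm with beta = 0.9, the indices come out at approximately 0.9 for state 0 and 1.0 for state 1. Without switching costs, always playing the arm with the highest current index is optimal. With switching costs, the value of an arm also depends on which arm was played last, so a per-arm computation of this kind no longer suffices.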

Author information

Authors and Affiliations

School of Economics and International Trade, Department of Economics, Kyung Hee University, South Korea
Tackseung Jun

About this article

Cite this article

Jun, T. A survey on the bandit problem with switching costs. De Economist 152, 513–541 (2004). https://doi.org/10.1007/s10645-004-2477-z

Issue Date: December 2004
DOI: https://doi.org/10.1007/s10645-004-2477-z
