Abstract
We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that, for each arm, there exists a robust counterpart of the Gittins index that solves a robust optimal stopping-time problem and can be computed efficiently via an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that the arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal, although its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.
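To make the restart-problem computation concrete, below is a minimal sketch of how a robust Gittins index could be computed by value iteration on the restart formulation of Katehakis and Veinott (1987), extended with a worst-case inner minimization. The function name `robust_gittins_index`, the modeling of the ambiguity set as a finite list of candidate transition matrices, and the rectangularity assumption (nature re-picks the worst matrix at every state and step) are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def robust_gittins_index(start, rewards, P_set, beta=0.9, tol=1e-10):
    """Robust Gittins index of state `start` via the restart-problem
    formulation, solved by value iteration.

    The ambiguity set is modeled here (as an assumption) by a finite
    list of candidate transition matrices `P_set`; at each state the
    adversary picks the matrix minimizing the continuation value.
    """
    n = len(rewards)
    V = np.zeros(n)
    while True:
        # Worst-case continuation value from every state.
        cont = rewards + beta * np.min([P @ V for P in P_set], axis=0)
        # At each state the player may either continue or restart in
        # `start`, which is equivalent to acting as if in `start`.
        V_new = np.maximum(cont, cont[start])
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # Katehakis-Veinott: the Gittins index of `start` is (1 - beta)
    # times the value of the restart problem evaluated at `start`.
    return (1 - beta) * V[start]
```

With a single transition matrix in `P_set`, the minimization is vacuous and the routine reduces to the classical (non-robust) restart computation of the Gittins index.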
Caro, F., Das Gupta, A. Robust control of the multi-armed bandit problem. Ann Oper Res 317, 461–480 (2022). https://doi.org/10.1007/s10479-015-1965-7