Mathematical Methods of Operations Research, Volume 89, Issue 1, pp 1–42

Computation of weighted sums of rewards for concurrent MDPs

  • Peter Buchholz
  • Dimitri Scheftelowitsch
Original Article

Abstract

We consider sets of Markov decision processes (MDPs) with shared state and action spaces and assume that the individual MDPs in such a set represent different scenarios for a system’s operation. In this setting, we solve the problem of finding a single policy that performs well under each of these scenarios by considering the weighted sum of value vectors for each of the scenarios. Several solution approaches as well as the general complexity of the problem are discussed and algorithms that are based on these solution approaches are presented. Finally, we compare the derived algorithms on a set of benchmark problems.
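To make the problem setting concrete, the following is an illustrative sketch (not code from the paper): for a fixed stationary policy, each scenario MDP is evaluated separately by standard discounted policy evaluation, and the resulting value vectors are combined with scenario weights. All function and variable names here are assumptions for illustration; the paper's algorithms optimize over policies, whereas this sketch only evaluates one given policy.

```python
import numpy as np

def policy_value(P, r, policy, gamma):
    """Discounted value vector of a fixed deterministic policy in one MDP.

    P: transition probabilities, shape (A, S, S); r: rewards, shape (S, A);
    policy: chosen action index per state, shape (S,).
    """
    S = r.shape[0]
    # Transition matrix and reward vector induced by the policy.
    P_pi = np.array([P[policy[s], s, :] for s in range(S)])
    r_pi = np.array([r[s, policy[s]] for s in range(S)])
    # Solve (I - gamma * P_pi) v = r_pi (standard policy evaluation).
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def weighted_value(scenarios, weights, policy, gamma=0.9):
    """Weighted sum of per-scenario value vectors for a single policy."""
    return sum(w * policy_value(P, r, policy, gamma)
               for w, (P, r) in zip(weights, scenarios))

# Tiny 2-state, 2-action example with two scenarios sharing the
# same state/action spaces and rewards but different transitions.
P1 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.6, 0.4]]])
P2 = np.array([[[0.7, 0.3], [0.1, 0.9]],
               [[0.4, 0.6], [0.8, 0.2]]])
r = np.array([[1.0, 0.5], [0.0, 2.0]])
v = weighted_value([(P1, r), (P2, r)], weights=[0.5, 0.5],
                   policy=np.array([0, 1]))
```

A policy that performs well under all scenarios is then one with a large weighted value; searching for it over the policy space is the hard part the paper addresses.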

Keywords

Markov decision processes · Optimization · Multi-objective optimization · Non-linear programming


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Informatik IV, TU Dortmund, Dortmund, Germany