Mathematical Programming, Volume 125, Issue 2, pp 235–261

Risk-averse dynamic programming for Markov decision processes

  • Andrzej Ruszczyński
Full Length Paper, Series B

Abstract

We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models.
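The value iteration idea mentioned in the abstract can be illustrated with a minimal sketch for a finite, discounted, cost-minimizing model. Everything specific here is an assumption made for illustration and is not taken from the abstract: the one-step risk measure is a first-order mean-upper-semideviation, E[Z] + κ·E[(Z − E[Z])₊] with 0 ≤ κ ≤ 1, and the names `mean_semideviation`, `risk_averse_value_iteration`, `cost`, `trans`, `gamma`, and `kappa` are hypothetical. The sketch only replaces the expectation in the classical Bellman update with that risk measure; it is not the paper's algorithm as stated.

```python
def mean_semideviation(probs, values, kappa):
    """Assumed one-step risk measure: E[Z] + kappa * E[(Z - E[Z])_+], 0 <= kappa <= 1."""
    mean = sum(p * z for p, z in zip(probs, values))
    upper = sum(p * max(z - mean, 0.0) for p, z in zip(probs, values))
    return mean + kappa * upper


def risk_averse_value_iteration(cost, trans, gamma, kappa, tol=1e-10, max_iter=10_000):
    """Iterate v(x) <- min_u { cost[x][u] + gamma * risk(v under trans[x][u]) }.

    cost[x][u]  : immediate cost of action u in state x (hypothetical data layout)
    trans[x][u] : list of next-state probabilities for action u in state x
    gamma       : discount factor in (0, 1)
    kappa       : risk-aversion weight; kappa = 0 gives the risk-neutral update
    """
    n = len(cost)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            min(
                cost[x][u] + gamma * mean_semideviation(trans[x][u], v, kappa)
                for u in range(len(cost[x]))
            )
            for x in range(n)
        ]
        # Stop once successive iterates agree to within tol in the sup norm.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Small made-up example: two states, two actions each.
cost = [[1.0, 2.0], [0.0, 0.5]]
trans = [[[0.5, 0.5], [0.9, 0.1]], [[0.2, 0.8], [1.0, 0.0]]]
v_neutral = risk_averse_value_iteration(cost, trans, gamma=0.9, kappa=0.0)
v_averse = risk_averse_value_iteration(cost, trans, gamma=0.9, kappa=1.0)
```

With κ = 0 the update reduces to ordinary discounted value iteration. For κ > 0 the risk term penalizes next-state values above their mean, so the resulting value function is never smaller than the risk-neutral one.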

Keywords

Dynamic risk measures, Markov risk measures, Value iteration, Policy iteration, Nonsmooth Newton’s method, Min-max Markov models

Mathematics Subject Classification (2000)

Primary 49L20, 90C40, 91B30; Secondary 91A25, 93E20

Copyright information

© Springer and Mathematical Optimization Society 2010

Authors and Affiliations

  1. Department of Management Science and Information Systems, Rutgers University, Piscataway, USA
