Mathematical Programming, Volume 125, Issue 2, pp 235–261

# Risk-averse dynamic programming for Markov decision processes

• Andrzej Ruszczyński
Full Length Paper Series B

## Abstract

We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models.
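As a rough illustration of the value iteration idea for the discounted model, the sketch below replaces the conditional expectation in the usual Bellman update with a one-step risk measure. Everything specific here is an assumption made for the example and is not taken from the paper: the update form `v(x) = min_u { c(x,u) + alpha * sigma(v, P[x,u]) }`, the choice of the first-order mean upper semideviation as `sigma`, the finite state and action sets, and all names (`mean_semideviation`, `risk_averse_value_iteration`, `kappa`, the toy `cost` and `P` arrays).

```python
import numpy as np


def mean_semideviation(v, p, kappa):
    """One-step risk of next-state values v under distribution p.

    Assumed risk measure: E[v] + kappa * E[(v - E[v])_+], with kappa in [0, 1].
    With kappa = 0 this is the plain expectation (risk-neutral case).
    """
    mean = p @ v
    return mean + kappa * (p @ np.maximum(v - mean, 0.0))


def risk_averse_value_iteration(cost, P, alpha, kappa, tol=1e-10, max_iter=10_000):
    """Iterate v <- min_u { cost[x, u] + alpha * sigma(v, P[x, u]) }.

    cost: array of shape (n_states, n_actions)
    P:    array of shape (n_states, n_actions, n_states), rows sum to 1
    Returns the approximate value vector and a greedy policy.
    """
    n_states, n_actions = cost.shape

    def q_values(v):
        # Risk-adjusted cost-to-go of every (state, action) pair.
        return np.array([
            [cost[x, u] + alpha * mean_semideviation(v, P[x, u], kappa)
             for u in range(n_actions)]
            for x in range(n_states)
        ])

    v = np.zeros(n_states)
    for _ in range(max_iter):
        v_new = q_values(v).min(axis=1)
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    policy = q_values(v).argmin(axis=1)
    return v, policy


# Hypothetical two-state, two-action example (made-up numbers).
cost = np.array([[1.0, 2.0],
                 [0.5, 3.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])

v_neutral, pi_neutral = risk_averse_value_iteration(cost, P, alpha=0.9, kappa=0.0)
v_averse, pi_averse = risk_averse_value_iteration(cost, P, alpha=0.9, kappa=0.5)
print("risk-neutral values:", v_neutral, "policy:", pi_neutral)
print("risk-averse values: ", v_averse, "policy:", pi_averse)
```

Setting `kappa = 0` recovers ordinary discounted value iteration, which makes it easy to compare the risk-neutral and risk-averse solutions on the same model. Because the assumed risk measure is monotone and translation-equivariant, the update stays a contraction with modulus `alpha`, so the usual stopping rule on successive iterates still applies in this sketch.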

## Keywords

Dynamic risk measures · Markov risk measures · Value iteration · Policy iteration · Nonsmooth Newton’s method · Min-max Markov models

## Mathematics Subject Classification (2000)

Primary: 49L20, 90C40, 91B30. Secondary: 91A25, 93E20
