Risk measurement and risk-averse control of partially observable discrete-time Markov systems

Abstract

We consider risk measurement in controlled partially observable Markov processes in discrete time. We introduce a new concept of conditional stochastic time consistency and derive the structure of risk measures enjoying this property. We prove that they can be represented by a collection of static law-invariant risk measures on the space of functions of the observable part of the state. We also derive the corresponding dynamic programming equations. Finally, we illustrate the results on a machine deterioration problem.
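The dynamic programming equations mentioned in the abstract can be sketched numerically. The toy model below is a minimal illustration in the spirit of the machine deterioration example, not the paper's construction: the transition matrices, costs, the mean-semideviation risk measure, and its parameter are all illustrative assumptions, and observations are suppressed so that the belief update is deterministic.

```python
import numpy as np

# Hypothetical machine-deterioration instance (illustrative numbers only).
# Hidden state: 0 = "good", 1 = "worn". Actions: 0 = continue, 1 = repair.
P = {
    0: np.array([[0.9, 0.1],    # continue: a good machine may deteriorate
                 [0.0, 1.0]]),  # a worn machine stays worn
    1: np.array([[1.0, 0.0],    # repair: restores the machine to "good"
                 [1.0, 0.0]]),
}
cost = {
    0: np.array([0.0, 4.0]),    # running cost in each hidden state
    1: np.array([2.0, 2.0]),    # flat repair cost
}

def mean_semideviation(p, z, kappa=0.5):
    """Static law-invariant risk measure rho(Z) = E[Z] + kappa*E[(Z - E[Z])_+]."""
    m = p @ z
    return m + kappa * (p @ np.maximum(z - m, 0.0))

def risk_averse_dp(horizon, grid):
    """Backward recursion over a grid of beliefs b = P(machine is worn).

    Since observations are suppressed in this sketch, the belief update is
    deterministic; by translation invariance of the risk measure, the (known)
    future value can then be added outside the risk evaluation of the stage cost.
    """
    V = np.zeros(len(grid))
    for _ in range(horizon):
        newV = np.empty_like(V)
        for i, b in enumerate(grid):
            p = np.array([1.0 - b, b])            # belief over hidden states
            q = []
            for a in (0, 1):
                b_next = (p @ P[a])[1]            # deterministic belief update
                q.append(mean_semideviation(p, cost[a])
                         + np.interp(b_next, grid, V))
            newV[i] = min(q)                      # risk-averse Bellman step
        V = newV
    return V
```

Under these assumptions the resulting value function is nondecreasing in the belief of being worn, and the repair action caps the one-step risk-adjusted cost, which is the qualitative behavior one would expect from such a recursion.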

Acknowledgements

Funding was provided by the National Science Foundation, Division of Mathematical Sciences (Grant No. 1312016).

Author information

Corresponding author

Correspondence to Andrzej Ruszczyński.

About this article

Cite this article

Fan, J., Ruszczyński, A. Risk measurement and risk-averse control of partially observable discrete-time Markov systems. Math Meth Oper Res 88, 161–184 (2018). https://doi.org/10.1007/s00186-018-0633-5

Keywords

  • Partially observable Markov processes
  • Dynamic risk measures
  • Time consistency
  • Dynamic programming