
Annals of Operations Research, Volume 200, Issue 1, pp 247–263

Dynamic consistency for stochastic optimal control problems

  • Pierre Carpentier
  • Jean-Philippe Chancelier
  • Guy Cohen
  • Michel De Lara
  • Pierre Girardeau

Abstract

For a sequence of dynamic optimization problems, we aim at discussing a notion of consistency over time. This notion can be informally introduced as follows. At the very first time step t_0, the decision maker formulates an optimization problem that yields optimal decision rules for all the forthcoming time steps t_0, t_1, …, T; at the next time step t_1, he is able to formulate a new optimization problem starting at time t_1 that yields a new sequence of optimal decision rules. This process can be continued until the final time T is reached. A family of optimization problems formulated in this way is said to be dynamically consistent if the optimal strategies obtained when solving the original problem remain optimal for all subsequent problems. The notion of dynamic consistency, well-known in the field of economics, has been recently introduced in the context of risk measures, notably by Artzner et al. (Ann. Oper. Res. 152(1):5–22, 2007) and studied in the stochastic programming framework by Shapiro (Oper. Res. Lett. 37(3):143–147, 2009) and for Markov Decision Processes (MDP) by Ruszczynski (Math. Program. 125(2):235–261, 2010). We here link this notion with the concept of “state variable” in MDP, and show that a significant class of dynamic optimization problems are dynamically consistent, provided that an adequate state variable is chosen.
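
The informal definition above can be checked numerically on a small example. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical two-state, two-action Markov decision problem with an expected additive cost, and all names (`transition`, `stage_cost`, `final_cost`, `HORIZON`) and numerical values are illustrative assumptions. The strategy computed at t_0 by backward induction is a state feedback; the check is that, for every later starting time and every state, this same strategy attains the optimal cost of the problem re-posed at that time, which is found independently by brute-force enumeration.

```python
import itertools

# Hypothetical toy model (illustrative assumption, not from the paper).
STATES = (0, 1)
ACTIONS = (0, 1)
HORIZON = 3  # decisions at t = 0, 1, 2; final cost paid at t = 3


def transition(x, u):
    """Return {next_state: probability} for state x under action u."""
    p_stay = 0.8 if u == 0 else 0.3
    return {x: p_stay, 1 - x: 1.0 - p_stay}


def stage_cost(t, x, u):
    """Instantaneous cost (assumed): being in state 1 and acting both cost."""
    return x + 0.5 * u


def final_cost(x):
    """Terminal cost (assumed)."""
    return 2.0 * x


def backward_induction(t_start):
    """Bellman recursion from HORIZON down to t_start; returns a feedback policy."""
    value = {x: final_cost(x) for x in STATES}
    policy = {}
    for t in range(HORIZON - 1, t_start - 1, -1):
        new_value = {}
        for x in STATES:
            q = {
                u: stage_cost(t, x, u)
                + sum(p * value[y] for y, p in transition(x, u).items())
                for u in ACTIONS
            }
            best = min(q, key=q.get)
            policy[(t, x)] = best
            new_value[x] = q[best]
        value = new_value
    return policy


def evaluate(policy, t_start, x_start):
    """Expected cost of a feedback policy started at (t_start, x_start)."""
    def cost(t, x):
        if t == HORIZON:
            return final_cost(x)
        u = policy[(t, x)]
        return stage_cost(t, x, u) + sum(
            p * cost(t + 1, y) for y, p in transition(x, u).items()
        )
    return cost(t_start, x_start)


def brute_force_optimum(t_start, x_start):
    """Optimal cost of the problem re-posed at t_start, by enumerating all feedbacks."""
    keys = [(t, x) for t in range(t_start, HORIZON) for x in STATES]
    return min(
        evaluate(dict(zip(keys, choice)), t_start, x_start)
        for choice in itertools.product(ACTIONS, repeat=len(keys))
    )


# Strategy computed once, at the initial time step.
policy0 = backward_induction(0)

# Dynamic consistency check: the initial strategy stays optimal for every
# subsequent problem, whatever the state reached.
consistent = all(
    abs(evaluate(policy0, t, x) - brute_force_optimum(t, x)) < 1e-9
    for t in range(HORIZON)
    for x in STATES
)
print(consistent)
```

The check succeeds here because the strategy is indexed by the state and the criterion is an expected sum of costs, which is the standard dynamic programming setting. The sketch does not cover risk measures or constraints, where a different state variable may be needed.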

Keywords

Stochastic optimal control, Dynamic consistency, Dynamic programming, Risk measures

Notes

Acknowledgements

This study was made within the Systems and Optimization Working Group (SOWG), which is composed of Laetitia Andrieu, Kengy Barty, Pierre Carpentier, Jean-Philippe Chancelier, Guy Cohen, Anes Dallagi, Michel De Lara and Pierre Girardeau, and based at Université Paris-Est, CERMICS, Champs sur Marne, 77455 Marne la Vallée Cedex 2, France.

References

  1. Artzner, P., Delbaen, F., Eber, J.-M., Heath, D., & Ku, H. (2007). Coherent multiperiod risk-adjusted values and Bellman’s principle. Annals of Operations Research, 152(1), 5–22.
  2. Bellman, R. (1957). Dynamic programming. Princeton: Princeton University Press.
  3. Bertsekas, D. (2000). Dynamic programming and optimal control (2nd ed.). Nashua: Athena Scientific.
  4. Cheridito, P., Delbaen, F., & Kupper, M. (2006). Dynamic monetary risk measures for bounded discrete-time processes. Electronic Journal of Probability, 11(3), 57–106.
  5. Detlefsen, K., & Scandolo, G. (2005). Conditional and dynamic convex risk measures. Finance and Stochastics, 9(4), 539–561.
  6. Dreyfus, S. (2002). Richard Bellman on the birth of dynamic programming. Operations Research, 50(1), 48–51.
  7. Ekeland, I., & Lazrak, A. (2006). Being serious about non-commitment: subgame perfect equilibrium in continuous time. arXiv:math.OC/0604264.
  8. Hammond, P. J. (1976). Changing tastes and coherent dynamic choice. Review of Economic Studies, 43(1), 159–173.
  9. Henrion, R. (2002). On the connectedness of probabilistic constraint sets. Journal of Optimization Theory and Applications, 112(3), 657–663.
  10. Henrion, R., & Strugarek, C. (2008). Convexity of chance constraints with independent random variables. Computational Optimization and Applications, 41(2), 263–276.
  11. Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46(1), 185–200.
  12. Peleg, B., & Yaari, M. E. (1973). On the existence of a consistent course of action when tastes are changing. Review of Economic Studies, 40(3), 391–401.
  13. Prékopa, A. (1995). Stochastic programming. Dordrecht: Kluwer Academic.
  14. Riedel, F. (2004). Dynamic coherent risk measures. Stochastic Processes and Their Applications, 112(2), 185–200.
  15. Rockafellar, R., & Wets, R.-B. (1998). Variational analysis. Berlin: Springer.
  16. Ruszczynski, A. (2010). Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2), 235–261.
  17. Ruszczynski, A., & Shapiro, A. (Eds.) (2003). Handbooks in operations research and management science: Vol. 10. Stochastic programming. Amsterdam: Elsevier.
  18. Shapiro, A. (2009). On a time consistency concept in risk averse multistage stochastic programming. Operations Research Letters, 37(3), 143–147.
  19. Strotz, R. H. (1955–1956). Myopia and inconsistency in dynamic utility maximization. Review of Economic Studies, 23(3), 165–180.
  20. Whittle, P. (1982). Optimization over time. New York: Wiley.
  21. Witsenhausen, H. S. (1971). On information structures, feedback and causality. SIAM Journal on Control, 9(2), 149–160.
  22. Witsenhausen, H. S. (1973). A standard form for sequential stochastic control. Mathematical Systems Theory, 7(1), 5–11.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Pierre Carpentier (1)
  • Jean-Philippe Chancelier (2)
  • Guy Cohen (2)
  • Michel De Lara (2)
  • Pierre Girardeau (1, 2, 3)

  1. ENSTA ParisTech, Paris Cedex 15, France
  2. CERMICS, École des Ponts ParisTech, Université Paris-Est, Marne-la-Vallée Cedex 2, France
  3. EDF R&D, Clamart Cedex, France