Computational Optimization and Applications, Volume 56, Issue 3, pp 635–674

Particle methods for stochastic optimal control problems

Abstract

For the numerical solution of stochastic optimal control problems, stochastic dynamic programming is the natural framework. To overcome the so-called curse of dimensionality, the stochastic programming school promoted another approach based on scenario trees, which can be seen as combining Monte Carlo sampling ideas with a heuristic technique for handling causality (or nonanticipativeness) constraints.

However, if one considers that the solution of a stochastic optimal control problem is a feedback law relating control to state variables, the numerical resolution of the optimization problem over a scenario tree should be completed by a feedback synthesis stage. In this stage, at each time step of the scenario tree, control values at nodes are plotted against the corresponding state values to provide a first discrete shape of the feedback law, from which a continuous function can finally be inferred. From this point of view, the scenario tree approach faces an important difficulty. At the first time stages (close to the tree root), there are few nodes (or Monte Carlo particles), and therefore relatively little information from which to guess a feedback law, but this information is generally of good quality: viewed as a set of control value estimates for some particular state values, it has a small variance because the future of those nodes is rich enough. Conversely, at the final time stages (near the tree leaves), the number of nodes increases but the variance becomes large because the future of each node becomes poor (and sometimes even deterministic).
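The feedback synthesis stage described above can be illustrated with a minimal sketch. It assumes a scalar state and a scalar control at one time step and uses piecewise-linear interpolation with constant extrapolation; these choices, and the name `synthesize_feedback`, are illustrative assumptions and are not taken from the article.

```python
from bisect import bisect_right


def synthesize_feedback(states, controls):
    """Fit a piecewise-linear feedback law u = phi(x) to node data.

    `states` and `controls` are the state values and optimal control
    values found at the nodes of one time step (hypothetical inputs).
    """
    if len(states) != len(controls) or not states:
        raise ValueError("need matching, non-empty state and control samples")

    # Sort the (state, control) pairs by state value.
    pairs = sorted(zip(states, controls))
    xs = [x for x, _ in pairs]
    us = [u for _, u in pairs]

    def phi(x):
        # Constant extrapolation outside the sampled state range.
        if x <= xs[0]:
            return us[0]
        if x >= xs[-1]:
            return us[-1]
        # Locate the bracketing samples: xs[j - 1] <= x < xs[j].
        j = bisect_right(xs, x)
        x0, x1 = xs[j - 1], xs[j]
        w = (x - x0) / (x1 - x0)
        return (1 - w) * us[j - 1] + w * us[j]

    return phi


# Usage: three nodes at one time step, given in arbitrary order.
phi = synthesize_feedback([2.0, 0.0, 1.0], [4.0, 0.0, 1.0])
print(phi(0.5), phi(1.5))
```

With only a handful of nodes, as near the tree root, such an interpolant rests on very few samples; with many high-variance nodes, as near the leaves, a smoothing fit would probably be preferable to exact interpolation.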

After confirming this dilemma by numerical experiments, we have tried to derive new variational approaches. First, two different formulations of the essential constraint of nonanticipativeness are considered: one is called algebraic and the other functional. Next, in both settings, we obtain optimality conditions for the corresponding optimal control problem. For the numerical resolution of those optimality conditions, an adaptive mesh discretization method is used in the state space in order to provide information for feedback synthesis. This mesh is naturally derived from a set of sample noise trajectories, which need not be arranged into a tree prior to numerical resolution. In particular, an important consequence of this difference from the scenario tree approach is that the same number of nodes (or points) is available from the beginning to the end of the time horizon, and this is obtained without sacrificing the quality of the results (that is, the variance of the estimates). Results of experiments on a hydro-electric dam production management problem are presented and demonstrate the claimed improvements. A more realistic problem is also presented in order to demonstrate the effectiveness of the method for high-dimensional problems.
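The contrast in available points per time step can be made concrete with a small counting sketch. It assumes a scenario tree with a uniform branching factor, and it uses the number of scenarios passing through a node as a crude proxy for how "rich" the future of that node is; both simplifications, and all function names, are assumptions made for illustration only.

```python
def tree_nodes_per_stage(branching, horizon):
    """Number of nodes at each stage t = 0..horizon of a uniform tree."""
    return [branching ** t for t in range(horizon + 1)]


def scenarios_through_node(branching, horizon):
    """Number of leaf scenarios descending from one node at each stage."""
    return [branching ** (horizon - t) for t in range(horizon + 1)]


def particle_points_per_stage(n_particles, horizon):
    """Independent sample trajectories give the same count at every stage."""
    return [n_particles] * (horizon + 1)


# Usage: a tree with 3 branches per node over 4 stages has 81 scenarios.
print(tree_nodes_per_stage(3, 4))         # [1, 3, 9, 27, 81]
print(scenarios_through_node(3, 4))       # [81, 27, 9, 3, 1]
print(particle_points_per_stage(81, 4))   # [81, 81, 81, 81, 81]
```

The first two lists move in opposite directions, which is the dilemma described in the abstract; the third list stays flat, which is the property the abstract claims for the particle approach.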

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. ENSTA ParisTech, Unité de Mathématiques Appliquées, Paris Cedex 15, France
  2. CERMICS, École des Ponts, Champs sur Marne, Université de Paris-Est, Marne la Vallée Cedex 2, France
  3. EDF R&D, Clamart Cedex, France