Particle methods for stochastic optimal control problems
When solving stochastic optimal control problems numerically, stochastic dynamic programming is the natural framework. To try to overcome the so-called curse of dimensionality, the stochastic programming school has promoted another approach based on scenario trees. This approach can be seen as combining Monte Carlo sampling ideas, on the one hand, with a heuristic technique for handling causality (or nonanticipativeness) constraints, on the other hand.
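The sketch below shows how the two ingredients fit together. It is a minimal illustration, not the authors' construction. A uniform scenario tree is sampled by conditional Monte Carlo: each node gets `branching` children, with i.i.d. standard normal noise (an assumed law). Nonanticipativeness is then encoded structurally, because scenarios that pass through the same node share the same noise history and must therefore share the same decision there. The function names `sample_scenario_tree` and `history` are hypothetical.

```python
import numpy as np

def sample_scenario_tree(branching, horizon, rng):
    """Sample a uniform scenario tree by conditional Monte Carlo sampling.

    Assumption for illustration: each child's noise value is drawn i.i.d.
    from a standard normal law. Returns, for each stage, the parent index
    of every node and the noise value attached to it.
    """
    parents = [np.array([-1])]   # stage 0: the root has no parent
    noises = [np.array([0.0])]   # the root carries no noise realization
    for _ in range(horizon):
        n_prev = len(parents[-1])
        # every node of the previous stage spawns `branching` children
        parents.append(np.repeat(np.arange(n_prev), branching))
        noises.append(rng.standard_normal(n_prev * branching))
    return parents, noises

def history(parents, noises, stage, node):
    """Noise history leading to `node` at `stage`. Scenarios passing through
    the same node share this history, which is how the tree enforces
    nonanticipativeness of the decisions taken at that node."""
    path = []
    for t in range(stage, 0, -1):
        path.append(noises[t][node])
        node = parents[t][node]
    return path[::-1]

rng = np.random.default_rng(0)
parents, noises = sample_scenario_tree(branching=3, horizon=3, rng=rng)
print([len(p) for p in parents])  # [1, 3, 9, 27]
```

The node count grows geometrically toward the leaves. That growth is the root of the feedback-synthesis dilemma discussed next.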
However, if one considers that the solution of a stochastic optimal control problem is a feedback law relating control to state variables, then solving the optimization problem numerically over a scenario tree should be followed by a feedback synthesis stage. At each time step of the scenario tree, the control values at the nodes are plotted against the corresponding state values. This gives a first, discrete shape of the feedback law, from which a continuous function can finally be inferred.

From this point of view, the scenario tree approach faces an important difficulty. At the first time stages (close to the tree root), there are only a few nodes (or Monte Carlo particles), and therefore relatively little information from which to guess a feedback law. This information is, however, generally of good quality: viewed as a set of control value estimates for some particular state values, it has a small variance, because the future of those nodes is rich enough. Conversely, at the final time stages (near the tree leaves), the number of nodes increases, but the variance becomes large because the future of each node gets poor (and sometimes even deterministic).
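The dilemma and the synthesis step can be made concrete with a small sketch. For a uniform tree with branching factor b and horizon T, stage t holds b^t nodes, and each node has only b^(T-t) scenarios below it. The sketch then infers a continuous feedback law from the (state, control) pairs of one stage. Least-squares polynomial regression is used for this, which is an assumed function class chosen for illustration only. The names `stage_statistics` and `fit_feedback` are hypothetical.

```python
import numpy as np

def stage_statistics(branching, horizon):
    """For a uniform tree, return (nodes at stage t, scenarios below each
    node) for t = 0..horizon: many nodes come with poor futures."""
    return [(branching**t, branching**(horizon - t)) for t in range(horizon + 1)]

def fit_feedback(states, controls, degree=1):
    """Infer a continuous feedback law u = phi(x) from the node values of
    one stage by least-squares polynomial regression (assumed model)."""
    coeffs = np.polyfit(states, controls, degree)
    return np.poly1d(coeffs)

print(stage_statistics(branching=3, horizon=3))
# [(1, 27), (3, 9), (9, 3), (27, 1)]

x = np.linspace(0.0, 10.0, 9)   # state values at the nodes of one stage
u = 2.0 - 0.5 * x               # control values (noise-free for clarity)
phi = fit_feedback(x, u)        # continuous feedback law inferred from nodes
```

In practice the control values at each node are estimates. Near the leaves, where each node has only one or a few scenarios below it, the regression is fed many points but each carries a high variance.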
Since this dilemma has been confirmed by numerical experiments, we have sought to derive new variational approaches. First, two different formulations of the essential constraint of nonanticipativeness are considered: one is called algebraic and the other functional. Next, in both settings, we obtain optimality conditions for the corresponding optimal control problem. To solve these optimality conditions numerically, an adaptive mesh discretization method is used in the state space in order to provide information for feedback synthesis. This mesh is naturally derived from a set of sample noise trajectories, which need not be put into the form of a tree prior to numerical resolution. An important consequence of this departure from the scenario tree approach is that the same number of nodes (or points) is available from the beginning to the end of the time horizon. This is achieved without sacrificing the quality of the results (that is, the variance of the estimates). Results of experiments with a hydroelectric dam production management problem will be presented and will demonstrate the claimed improvements. A more realistic problem will also be presented to demonstrate the effectiveness of the method for high-dimensional problems.
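The following sketch illustrates the idea of a mesh built from a fan of noise trajectories. It is one plausible reading under stated assumptions, not the authors' algorithm. A toy dam with hypothetical dynamics x_{t+1} = clip(x_t + w_t - release, 0, capacity) is simulated along N independent inflow trajectories, with exponential inflows and a fixed release as assumptions. No tree is built, so every stage keeps all N points. At a chosen stage, mesh centers are placed at quantiles of the particle cloud, so the mesh adapts to where the particles are. A conditional expectation given the state is then estimated by averaging within nearest-center (Voronoi) cells. The names `simulate_fan` and `cell_conditional_mean` are hypothetical.

```python
import numpy as np

def simulate_fan(n_particles, horizon, x0, release, capacity, rng):
    """Simulate a fan of independent inflow trajectories for a toy dam
    (hypothetical dynamics and a fixed release policy, for illustration).
    Every stage keeps all n_particles points: no tree is built."""
    x = np.empty((horizon + 1, n_particles))
    x[0] = x0
    inflows = rng.exponential(scale=1.0, size=(horizon, n_particles))
    for t in range(horizon):
        x[t + 1] = np.clip(x[t] + inflows[t] - release, 0.0, capacity)
    return x, inflows

def cell_conditional_mean(states, values, centers):
    """Estimate E[value | state] on a mesh of nearest-center (Voronoi) cells:
    each particle is assigned to its closest center, and the estimate at a
    center is the mean value over the particles in its cell (NaN if empty)."""
    labels = np.argmin(np.abs(states[:, None] - centers[None, :]), axis=1)
    means = np.full(len(centers), np.nan)
    for k in range(len(centers)):
        mask = labels == k
        if mask.any():
            means[k] = values[mask].mean()
    return labels, means

rng = np.random.default_rng(1)
x, w = simulate_fan(1000, 12, x0=5.0, release=1.0, capacity=10.0, rng=rng)
print(x.shape)  # (13, 1000): the same number of points at every stage
t = 6
# adaptive mesh: centers follow the empirical distribution of the particles
centers = np.quantile(x[t], np.linspace(0.05, 0.95, 10))
labels, est = cell_conditional_mean(x[t], x[t + 1], centers)
```

Unlike the tree, the number of points used for the estimates is the same at the last stage as at the first. Quantile-placed centers are only one simple way to adapt the mesh.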