Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees
 Boris Defourny,
 Damien Ernst,
 Louis Wehenkel
Abstract
This paper addresses the problem of solving discrete-time optimal sequential decision making problems having a disturbance space \(W\) composed of a finite number of elements. In this context, the problem of finding from an initial state \(x_0\) an optimal decision strategy can be stated as an optimization problem which aims at finding an optimal combination of decisions attached to the nodes of a disturbance tree modeling all possible sequences of disturbances \((w_0, w_1, \ldots, w_{T-1}) \in W^T\) over the optimization horizon \(T\). A significant drawback of this approach is that the resulting optimization problem has a search space which is the Cartesian product of \(O(|W|^{T-1})\) decision spaces \(U\), which makes the approach computationally impractical as soon as the optimization horizon grows, even if \(W\) has just a handful of elements. To circumvent this difficulty, we propose to exploit an ensemble of randomly generated incomplete disturbance trees of controlled complexity, to solve their induced optimization problems in parallel, and to combine their predictions at time \(t = 0\) to obtain a (near-)optimal first-stage decision. Because this approach postpones the determination of the decisions for subsequent stages until additional information about the realization of the uncertain process becomes available, we call it lazy. Simulations carried out on a robot corridor navigation problem show that even for small incomplete trees, this approach can lead to near-optimal decisions.
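The idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy one-dimensional dynamics and reward in `step`, the disturbance probabilities in `W`, sampling disturbances with replacement to build each incomplete tree, and combining the first-stage decisions by majority vote. Each sampled tree is solved exactly by recursion over its nodes, and only the decision at the root (time \(t = 0\)) is kept.

```python
import random
from collections import Counter

U = (-1, 0, 1)                     # hypothetical finite decision space
W = {-1: 0.25, 0: 0.5, 1: 0.25}    # hypothetical disturbances with probabilities
GOAL = 3                           # hypothetical target position


def step(x, u, w):
    """Toy dynamics and reward (assumed for illustration only)."""
    x_next = x + u + w
    return x_next, -abs(x_next - GOAL)


def sample_tree(horizon, branching, rng):
    """Build an incomplete disturbance tree.

    Each node keeps only `branching` disturbances drawn from W, instead of
    all |W| of them. A tree is a list of (disturbance, subtree) pairs.
    """
    if horizon == 0:
        return []
    ws = rng.choices(list(W), weights=list(W.values()), k=branching)
    return [(w, sample_tree(horizon - 1, branching, rng)) for w in ws]


def solve(tree, x):
    """Return (value, best decision) for the node reached in state x.

    Sampled branches are weighted equally (Monte Carlo average).
    """
    if not tree:
        return 0.0, None
    best_value, best_u = float("-inf"), None
    for u in U:
        total = 0.0
        for w, child in tree:
            x_next, reward = step(x, u, w)
            total += reward + solve(child, x_next)[0]
        value = total / len(tree)
        if value > best_value:
            best_value, best_u = value, u
    return best_value, best_u


def lazy_first_decision(x0, horizon=4, branching=2, n_trees=25, seed=0):
    """Solve an ensemble of random incomplete trees and vote on the decision at t = 0."""
    rng = random.Random(seed)
    votes = Counter(
        solve(sample_tree(horizon, branching, rng), x0)[1]
        for _ in range(n_trees)
    )
    return votes.most_common(1)[0][0]
```

In a lazy scheme of this kind, only the returned first-stage decision is applied; once the next disturbance has been observed, `lazy_first_decision` would be called again from the new state with fresh trees. The trees are independent of one another, so the loop over `n_trees` could be run in parallel.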
 Book Title
 Recent Advances in Reinforcement Learning
 Book Subtitle
 8th European Workshop, EWRL 2008, Villeneuve d’Ascq, France, June 30-July 3, 2008, Revised and Selected Papers
 Pages
 pp 1-14
 Copyright
 2008
 DOI
 10.1007/978-3-540-89722-4_1
 Print ISBN
 9783540897217
 Online ISBN
 9783540897224
 Series Title
 Lecture Notes in Computer Science
 Series Volume
 5323
 Series ISSN
 0302-9743
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer Berlin Heidelberg
 Keywords

 Stochastic dynamic programming
 Ensemble methods
 Editors

 Sertan Girgin ^{(1)}
 Manuel Loth ^{(2)}
 Rémi Munos ^{(2)}
 Philippe Preux ^{(2)}
 Daniil Ryabko ^{(2)}
 Editor Affiliations

 1. INRIA Lille-Nord Europe
 2. INRIA, LIFL, CNRS, Université de Lille
 Authors

 Boris Defourny ^{(3)}
 Damien Ernst ^{(3)}
 Louis Wehenkel ^{(3)}
 Author Affiliations

 3. Department of Electrical Engineering and Computer Science, University of Liège, Grande Traverse, 10, Sart-Tilman, B-4000, Liège, Belgium