Machine Learning, Volume 22, Issue 1–3, pp 59–94

Feature-based methods for large scale dynamic programming

  • John N. Tsitsiklis
  • Benjamin van Roy


Abstract

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
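The general recipe the abstract describes (a compact, feature-based value function combined with dynamic programming updates) can be sketched as approximate value iteration with a linear architecture. The sketch below uses state-aggregation features on a made-up MDP; all sizes, names, and values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy discounted MDP (illustrative, not from the paper):
# 6 states, 2 actions, transition kernel P[a, s, s'], rewards R[a, s].
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

# Feature-based compact representation: indicator features that aggregate
# the 6 states into 2 groups, so the value estimate Phi @ w is piecewise
# constant over the groups and only 2 parameters are stored.
Phi = np.zeros((n_states, 2))
Phi[:3, 0] = 1.0   # states 0-2 share one value
Phi[3:, 1] = 1.0   # states 3-5 share another

def bellman_backup(v):
    """Exact Bellman operator T applied to a value estimate v."""
    return np.max(R + gamma * P @ v, axis=0)

# Approximate value iteration: apply T to the current compact estimate,
# then project back onto the feature subspace by least squares (which,
# for indicator features, is just averaging within each group).
w = np.zeros(Phi.shape[1])
for _ in range(200):
    target = bellman_backup(Phi @ w)
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)

v_approx = Phi @ w  # compact approximation of the optimal value function
```

With aggregation-style features the projection step is a sup-norm nonexpansion, so its composition with the Bellman operator remains a contraction and the iteration converges; with arbitrary features this need not hold, which is the kind of difficulty the paper's counter-example illustrates.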


Keywords: Compact representation · Curse of dimensionality · Dynamic programming · Features · Function approximation · Neuro-dynamic programming · Reinforcement learning


References

  1. Bakshi, B. R. & Stephanopoulos, G. (1993). "Wave-Net: A Multiresolution, Hierarchical Neural Network with Localized Learning", AIChE Journal, vol. 39, no. 1, pp. 57–81.
  2. Barto, A. G., Bradtke, S. J. & Singh, S. P. (1995). "Real-time Learning and Control Using Asynchronous Dynamic Programming", Artificial Intelligence, vol. 72, pp. 81–138.
  3. Bellman, R. E. & Dreyfus, S. E. (1959). "Functional Approximation and Dynamic Programming", Math. Tables and Other Aids Comp., vol. 13, pp. 247–251.
  4. Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA.
  5. Bertsekas, D. P. (1994). "A Counter-Example to Temporal Differences Learning", Neural Computation, vol. 7, pp. 270–279.
  6. Bertsekas, D. P. & Castañon, D. A. (1989). "Adaptive Aggregation for Infinite Horizon Dynamic Programming", IEEE Transactions on Automatic Control, vol. 34, no. 6, pp. 589–598.
  7. Bertsekas, D. P. & Tsitsiklis, J. N. (1989). Parallel and Distributed Computation: Numerical Methods, Prentice Hall, Englewood Cliffs, NJ.
  8. Dayan, P. (1992). "The Convergence of TD(λ) for General λ", Machine Learning, vol. 8, pp. 341–362.
  9. Gordon, G. J. (1995). "Stable Function Approximation in Dynamic Programming", Technical Report CMU-CS-95-103, Carnegie Mellon University.
  10. Jaakkola, T., Jordan, M. I. & Singh, S. P. (1994). "On the Convergence of Stochastic Iterative Dynamic Programming Algorithms", Neural Computation, vol. 6, no. 6.
  11. Jaakkola, T., Singh, S. P. & Jordan, M. I. (1995). "Reinforcement Learning Algorithms for Partially Observable Markovian Decision Processes", in Advances in Neural Information Processing Systems 7, J. D. Cowan, G. Tesauro, and D. Touretzky, editors, Morgan Kaufmann.
  12. Korf, R. E. (1987). "Planning as Search: A Quantitative Approach", Artificial Intelligence, vol. 33, pp. 65–88.
  13. Lippman, R. P., Kukolich, L. & Singer, E. (1993). "LNKnet: Neural Network, Machine-Learning, and Statistical Software for Pattern Classification", The Lincoln Laboratory Journal, vol. 6, no. 2, pp. 249–268.
  14. Morin, T. L. (1987). "Computational Advances in Dynamic Programming", in Dynamic Programming and Its Applications, edited by Puterman, M. L., pp. 53–90.
  15. Poggio, T. & Girosi, F. (1990). "Networks for Approximation and Learning", Proceedings of the IEEE, vol. 78, no. 9, pp. 1481–1497.
  16. Reetz, D. (1977). "Approximate Solutions of a Discounted Markovian Decision Process", Bonner Mathematische Schriften, vol. 98: Dynamische Optimierung, pp. 77–92.
  17. Schweitzer, P. J. & Seidmann, A. (1985). "Generalized Polynomial Approximations in Markovian Decision Processes", Journal of Mathematical Analysis and Applications, vol. 110, pp. 568–582.
  18. Sutton, R. S. (1988). "Learning to Predict by the Method of Temporal Differences", Machine Learning, vol. 3, pp. 9–44.
  19. Tesauro, G. (1992). "Practical Issues in Temporal Difference Learning", Machine Learning, vol. 8, pp. 257–277.
  20. Tsitsiklis, J. N. (1994). "Asynchronous Stochastic Approximation and Q-Learning", Machine Learning, vol. 16, pp. 185–202.
  21. Watkins, C. J. C. H. & Dayan, P. (1992). "Q-learning", Machine Learning, vol. 8, pp. 279–292.
  22. Whitt, W. (1978). "Approximations of Dynamic Programs, I", Mathematics of Operations Research, vol. 3, pp. 231–243.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • John N. Tsitsiklis (1)
  • Benjamin van Roy (1)

  1. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology
