
Feature-Based Methods for Large Scale Dynamic Programming

Chapter in Recent Advances in Reinforcement Learning

Abstract

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
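To make the abstract's notion of a "feature-based compact representation" concrete: one extracts a feature vector φ(x) for each state x, approximates the cost-to-go function as J̃(x) = θᵀφ(x) for a weight vector θ, and fits θ by alternating Bellman backups with a projection onto the span of the features. The sketch below is a minimal illustration under assumed toy dynamics; the random MDP, the feature matrix Phi, and every name in it are hypothetical stand-ins, not the chapter's own algorithms.

```python
import numpy as np

# Minimal sketch: approximate value iteration with a linear, feature-based
# architecture J~(x) = theta . phi(x). The toy MDP, feature matrix, and all
# names here are illustrative assumptions, not the chapter's code.

n_states, n_actions, n_features = 50, 4, 8
gamma = 0.95  # discount factor
rng = np.random.default_rng(0)

# Random toy problem: transition kernel P[a, x, :] and one-stage costs g[a, x].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
g = rng.random((n_actions, n_states))
Phi = rng.random((n_states, n_features))  # row x holds the feature vector phi(x)

theta = np.zeros(n_features)
for _ in range(200):
    J = Phi @ theta                              # current compact cost-to-go estimate
    targets = (g + gamma * (P @ J)).min(axis=0)  # Bellman backup at every state
    # Project the backed-up values onto the span of the features (least squares).
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

greedy_policy = (g + gamma * (P @ (Phi @ theta))).argmin(axis=0)
```

A least-squares projection after each backup is about the simplest "relatively simple approximation architecture" one could choose. As the abstract's counter-example warns, such naive combinations of compact representations with dynamic programming can fail to converge, which is part of what the chapter's convergence proofs and error bounds address.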






Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Tsitsiklis, J.N., Van Roy, B. (1996). Feature-Based Methods for Large Scale Dynamic Programming. In: Kaelbling, L.P. (ed.) Recent Advances in Reinforcement Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-33656-5_5

  • DOI: https://doi.org/10.1007/978-0-585-33656-5_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-9705-2

  • Online ISBN: 978-0-585-33656-5

  • eBook Packages: Springer Book Archive
