
Feature-Based Methods for Large Scale Dynamic Programming

Chapter in Recent Advances in Reinforcement Learning

Abstract

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
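To make the abstract's notion of a "feature-based compact representation" concrete: one extracts a feature vector φ(x) for each state x, approximates the cost-to-go function as J̃(x) = θᵀφ(x) for a weight vector θ, and fits θ by alternating Bellman backups with a projection onto the span of the features. The sketch below is a minimal illustration under assumed toy dynamics; the random MDP, the feature matrix Phi, and every name in it are hypothetical stand-ins, not the chapter's own algorithms.

```python
import numpy as np

# Minimal sketch: approximate value iteration with a linear, feature-based
# architecture J~(x) = theta . phi(x). The toy MDP, feature matrix, and all
# names here are illustrative assumptions, not the chapter's code.

n_states, n_actions, n_features = 50, 4, 8
gamma = 0.95  # discount factor
rng = np.random.default_rng(0)

# Random toy problem: transition kernel P[a, x, :] and one-stage costs g[a, x].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
g = rng.random((n_actions, n_states))
Phi = rng.random((n_states, n_features))  # row x holds the feature vector phi(x)

theta = np.zeros(n_features)
for _ in range(200):
    J = Phi @ theta                              # current compact cost-to-go estimate
    targets = (g + gamma * (P @ J)).min(axis=0)  # Bellman backup at every state
    # Project the backed-up values onto the span of the features (least squares).
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

greedy_policy = (g + gamma * (P @ (Phi @ theta))).argmin(axis=0)
```

A least-squares projection after each backup is about the simplest "relatively simple approximation architecture" one could choose. As the abstract's counter-example warns, such naive combinations of compact representations with dynamic programming can fail to converge, which is part of what the chapter's convergence proofs and error bounds address.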






Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Tsitsiklis, J.N., Van Roy, B. (1996). Feature-Based Methods for Large Scale Dynamic Programming. In: Kaelbling, L.P. (ed.) Recent Advances in Reinforcement Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-33656-5_5

  • DOI: https://doi.org/10.1007/978-0-585-33656-5_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-9705-2

  • Online ISBN: 978-0-585-33656-5

  • eBook Packages: Springer Book Archive
