Least-Squares Reinforcement Learning Methods

  • Reference work entry
Encyclopedia of Machine Learning

Definition

Most algorithms for sequential decision making rely on computing or learning a value function that captures the expected long-term return of a decision at any given state. Value functions are, in general, complex, nonlinear functions that cannot be represented compactly, because they are defined over the entire state or state-action space. Therefore, most practical algorithms rely on value-function approximation, and the most common choice of approximation architecture is a linear one. Exploiting the properties of linear architectures, a number of efficient learning algorithms based on least-squares techniques have been developed. These algorithms focus on different aspects of the approximation problem and deliver diverse solutions; nevertheless, they share a common trait: they process data collectively (in batch mode) and, in general, achieve better results than their counterparts based on stochastic approximation.
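To make this flavor concrete, the following is a minimal sketch (under assumed function and parameter names, not code from this entry) of least-squares temporal difference learning (LSTD; Bradtke & Barto, 1996; Boyan, 1999), which evaluates a fixed policy with a linear architecture by solving a single linear system built from a batch of transitions:

    # Batch LSTD sketch: estimate w such that V(s) ~ phi(s)^T w from a batch
    # of transitions (s, r, s', done) gathered under the policy being
    # evaluated, by solving A w = b, where
    #   A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T
    #   b = sum_t r_t phi(s_t)
    import numpy as np

    def lstd(transitions, phi, k, gamma=0.99, ridge=1e-6):
        """transitions: iterable of (s, r, s_next, done); phi: state -> R^k."""
        A = np.zeros((k, k))
        b = np.zeros(k)
        for s, r, s_next, done in transitions:
            f = phi(s)
            f_next = np.zeros(k) if done else phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        # Small ridge term (an assumption here) guards against a singular A
        # when the batch is small or the features are redundant.
        return np.linalg.solve(A + ridge * np.eye(k), b)

    # Toy usage: 5-state chain, policy "always move right", reward 1 on
    # entering the terminal state, one-hot (tabular) features.
    rng = np.random.default_rng(0)
    k = 5
    phi = lambda s: np.eye(k)[s]
    batch = []
    for _ in range(200):
        s = int(rng.integers(0, k - 1))
        done = (s + 1 == k - 1)
        batch.append((s, 1.0 if done else 0.0, s + 1, done))
    print(lstd(batch, phi, k))  # roughly gamma**(steps to goal) per state

With one-hot features this recovers the exact tabular values; with far fewer features than states it returns the least-squares fixed-point approximation, computed from the whole batch at once rather than by stochastic, sample-by-sample updates.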

Motivation and Background

Consider a Markov...
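The section is truncated in this preview; as a sketch in standard MDP notation (not the entry's exact text), the setup such a discussion builds on is a Markov decision process \((\mathcal{S}, \mathcal{A}, P, R, \gamma)\), with transition model \(P\), reward function \(R\), and discount factor \(\gamma \in [0, 1)\). The quantities at stake are the value function of a policy \(\pi\) and its linear approximation:

\[
V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0 = s\right],
\qquad
\hat{V}(s; w) = \sum_{j=1}^{k} w_j\, \phi_j(s) = \phi(s)^{\top} w,
\]

where \(\phi_1, \dots, \phi_k\) are fixed basis functions and \(w \in \mathbb{R}^k\) is the weight vector learned by the least-squares methods described above.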

Recommended Reading

  • Boyan, J. A. (1999). Least-squares temporal difference learning. Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, pp. 49–56.

  • Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33–57.

  • Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503–556.

  • Johns, J., Petrik, M., & Mahadevan, S. (2009). Hybrid least-squares algorithms for approximate policy evaluation. Machine Learning, 76(2–3), 243–256.

  • Koller, D., & Parr, R. (2000). Policy iteration for factored MDPs. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Stanford, CA, USA, pp. 326–334.

  • Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107–1149.

  • Nedić, A., & Bertsekas, D. P. (2003). Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications, 13(1–2), 79–110.

  • Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., & Littman, M. L. (2008). An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. Proceedings of the Twenty-Fifth International Conference on Machine Learning, Helsinki, Finland, pp. 752–759.

  • Schweitzer, P. J., & Seidmann, A. (1985). Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, 110(2), 568–582.

  • Xu, X., He, H. G., & Hu, D. (2002). Efficient reinforcement learning using recursive least-squares methods. Journal of Artificial Intelligence Research, 16, 259–292.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Lagoudakis, M.G. (2011). Least-Squares Reinforcement Learning Methods. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_468
