Encyclopedia of Machine Learning and Data Mining

2017 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Least-Squares Reinforcement Learning Methods

  • Michail G. Lagoudakis
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_473


Most algorithms for sequential decision making rely on computing or learning a value function that captures the expected long-term return of a decision at any given state. Value functions are, in general, complex nonlinear functions that cannot be represented compactly, as they are defined over the entire state or state-action space. Therefore, most practical algorithms rely on value function approximation, and the most common choice of approximation architecture is a linear one. Exploiting the properties of linear architectures, a number of efficient learning algorithms based on least-squares techniques have been developed. These algorithms focus on different aspects of the approximation problem and deliver diverse solutions; nevertheless, they share the tendency to process data collectively (batch mode) and, in general, achieve better results than their counterparts based on stochastic approximation.
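
As a rough illustration of how a least-squares method can fit a linear value function from a batch of samples, the sketch below implements a basic form of least-squares temporal difference learning (LSTD) with NumPy. The two-state chain, the one-hot features, the discount factor, the small ridge term, and all function names are assumptions made for this example only; it is not a definitive implementation of any particular algorithm covered by the entry.

```python
import numpy as np


def lstd(samples, phi, n_features, gamma=0.9, ridge=1e-6):
    """Estimate weights w such that V(s) is approximated by phi(s) @ w.

    samples: iterable of (state, reward, next_state) transitions gathered
             while following a fixed policy (processed as one batch).
    phi:     feature map from a state to a vector of length n_features.
    ridge:   small diagonal term (an assumption here) that keeps A invertible.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        # Accumulate the LSTD statistics over the whole batch.
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    # Solve the linear system A w = b in one shot.
    return np.linalg.solve(A, b)


def one_hot(s, n=2):
    """Hypothetical tabular features: one indicator per state."""
    v = np.zeros(n)
    v[s] = 1.0
    return v


# Hypothetical deterministic chain: 0 -> 1 with reward 0, 1 -> 0 with reward 1.
samples = [(0, 0.0, 1), (1, 1.0, 0)] * 50
w = lstd(samples, one_hot, n_features=2, gamma=0.9)
print(w)  # estimated values of states 0 and 1
```

With one-hot features the linear architecture can represent the value function exactly, so the weights should be close to the solution of the Bellman equations for this chain, V(0) = 0.9 V(1) and V(1) = 1 + 0.9 V(0). With coarser features the same code would return a least-squares fixed-point approximation instead.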


Recommended Reading

  1. Boyan JA (1999) Least-squares temporal difference learning. In: Proceedings of the sixteenth international conference on machine learning, Bled, pp 49–56
  2. Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22:33–57
  3. Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556
  4. Johns J, Petrik M, Mahadevan S (2009) Hybrid least-squares algorithms for approximate policy evaluation. Mach Learn 76(2–3):243–256
  5. Koller D, Parr R (2000) Policy iteration for factored MDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, Stanford, pp 326–334
  6. Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149
  7. Nedić A, Bertsekas DP (2003) Least-squares policy evaluation algorithms with linear function approximation. Discret Event Dyn Syst Theory Appl 13(1–2):79–110
  8. Parr R, Li L, Taylor G, Painter-Wakefield C, Littman ML (2008) An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: Proceedings of the twenty-fifth international conference on machine learning, Helsinki, pp 752–759
  9. Schweitzer PJ, Seidmann A (1985) Generalized polynomial approximations in Markovian decision processes. J Math Anal Appl 110(6):568–582
  10. Xu X, He H-G, Hu D (2002) Efficient reinforcement learning using recursive least-squares methods. J Artif Intell Res 16:259–292

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Technical University of Crete, Chania, Greece