# Least-Squares Reinforcement Learning Methods

**DOI:** https://doi.org/10.1007/978-0-387-30164-8_468

## Definition

Most algorithms for sequential decision making rely on computing or learning a value function that captures the expected long-term return of a decision at any given state. Value functions are, in general, complex nonlinear functions defined over the entire state or state-action space, so they usually cannot be represented compactly. Therefore, most practical algorithms rely on value function approximation, and the most common choice of approximation architecture is a linear one: a weighted combination of a fixed set of basis functions (features). Exploiting the properties of linear architectures, researchers have developed a number of efficient learning algorithms based on least-squares techniques. These algorithms focus on different aspects of the approximation problem and deliver different solutions; nevertheless, they share a tendency to process data collectively (batch mode) and, in general, achieve better results than their counterparts based on stochastic approximation, such as incremental temporal-difference learning.
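To make the batch least-squares idea concrete, here is a minimal sketch of least-squares temporal-difference learning (LSTD(0)) for policy evaluation, in the spirit of Bradtke and Barto (1996) and Boyan (1999), written in Python with NumPy. With a linear architecture $V(s) \approx \phi(s)^\top w$, LSTD collects all transitions $(s, r, s')$ and solves $A w = b$, where $A = \sum \phi(s)\,(\phi(s) - \gamma\,\phi(s'))^\top$ and $b = \sum \phi(s)\, r$, in a single step instead of making many small stochastic TD updates. The function name `lstd`, the transition-tuple format, the ridge parameter `reg`, and the one-hot features in the usage example are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np


def lstd(samples, phi, gamma, reg=1e-6):
    """LSTD(0) policy evaluation with a linear value function V(s) ~= phi(s) @ w.

    samples: list of (s, r, s_next, done) transitions gathered under a fixed policy.
    phi:     maps a state to a 1-D NumPy feature vector of length k.
    reg:     small ridge term (an assumption, not part of the basic method) that
             keeps A invertible when there are few samples.
    """
    samples = list(samples)
    k = len(phi(samples[0][0]))
    A = reg * np.eye(k)
    b = np.zeros(k)
    # Batch mode: accumulate statistics over all transitions, then solve once.
    for s, r, s_next, done in samples:
        f = phi(s)
        # The value after a terminal transition is taken to be zero.
        f_next = np.zeros(k) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # The weights solve the k x k linear system A w = b (no step size needed).
    return np.linalg.solve(A, b)


def one_hot(s, n=2):
    """Hypothetical tabular features: a one-hot vector for state s."""
    v = np.zeros(n)
    v[s] = 1.0
    return v


# Two-state chain 0 -> 1 -> terminal with reward 1 per step and gamma = 0.9,
# so the exact values are V(1) = 1 and V(0) = 1 + 0.9 * 1 = 1.9.
w = lstd([(0, 1.0, 1, False), (1, 1.0, None, True)], one_hot, gamma=0.9)
print(w)  # approximately 1.9 and 1.0 (slightly lower because of reg)
```

Because the linear system is solved directly, there is no learning-rate parameter to tune; the price is forming and solving a $k \times k$ system, which stays cheap only while the number of features $k$ is modest.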

## Motivation and Background

## Recommended Reading

- Boyan, J. A. (1999). Least-squares temporal difference learning. *Proceedings of the Sixteenth International Conference on Machine Learning*, Bled, Slovenia, pp. 49–56.
- Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. *Machine Learning, 22*, 33–57.
- Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. *Journal of Machine Learning Research, 6*, 503–556.
- Johns, J., Petrik, M., & Mahadevan, S. (2009). Hybrid least-squares algorithms for approximate policy evaluation. *Machine Learning, 76*(2–3), 243–256.
- Koller, D., & Parr, R. (2000). Policy iteration for factored MDPs. *Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence*, Stanford, CA, USA, pp. 326–334.
- Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. *Journal of Machine Learning Research, 4*, 1107–1149.
- Nedić, A., & Bertsekas, D. P. (2003). Least-squares policy evaluation algorithms with linear function approximation. *Discrete Event Dynamic Systems: Theory and Applications, 13*(1–2), 79–110.
- Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., & Littman, M. L. (2008). An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. *Proceedings of the Twenty-Fifth International Conference on Machine Learning*, Helsinki, Finland, pp. 752–759.
- Schweitzer, P. J., & Seidmann, A. (1985). Generalized polynomial approximations in Markovian decision processes. *Journal of Mathematical Analysis and Applications, 110*(6), 568–582.
- Xu, X., He, H. G., & Hu, D. (2002). Efficient reinforcement learning using recursive least-squares methods. *Journal of Artificial Intelligence Research, 16*, 259–292.