# Least-Squares Reinforcement Learning Methods

**DOI:** https://doi.org/10.1007/978-1-4899-7502-7_473-1

## Abstract

Most algorithms for sequential decision making rely on computing or learning a value function that captures the expected long-term return of a decision at any given state. Value functions are, in general, complex nonlinear functions that cannot be represented compactly, since they are defined over the entire state or state-action space. Practical algorithms therefore rely on value function approximation, most commonly with a linear architecture. Exploiting the properties of linear architectures, a number of efficient learning algorithms based on least-squares techniques have been developed. These algorithms focus on different aspects of the approximation problem and deliver diverse solutions; nevertheless, they all tend to process data collectively (in batch mode) and, in general, achieve better results than their counterparts based on stochastic approximation.
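As a rough illustration of the batch, least-squares style of learning described above, the following is a minimal sketch of least-squares temporal difference learning (LSTD) for evaluating a fixed policy with a linear architecture. The function name `lstd`, the small `ridge` term, the one-hot feature map, and the three-state cycle are all assumptions made for this example and are not taken from the entry itself.

```python
import numpy as np

def lstd(samples, phi, n_features, gamma=0.9, ridge=1e-6):
    """Sketch of batch LSTD: accumulate A and b over all samples, then solve A w = b.

    samples: iterable of (state, reward, next_state) collected under a fixed policy.
    phi:     feature map from a state to a vector of length n_features (assumed).
    ridge:   small regularizer (an assumption) that keeps A invertible.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for s, r, s_next in samples:
        f = phi(s)
        f_next = phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # phi(s) (phi(s) - gamma phi(s'))^T
        b += r * f                            # phi(s) r
    return np.linalg.solve(A, b)

def one_hot(s, n=3):
    """Hypothetical tabular features: one indicator per state."""
    v = np.zeros(n)
    v[s] = 1.0
    return v

# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0, with reward 1 on leaving state 2.
samples = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)] * 50
w = lstd(samples, one_hot, n_features=3, gamma=0.9)
print(w)  # approximate state values under the linear architecture
```

With one-hot features the linear architecture can represent the value function exactly, so the weights should closely match the true discounted values of the cycle. With fewer or coarser features, the same computation yields an approximation instead.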

## Definition

Least-Squares Reinforcement...

## Recommended Reading

- Boyan JA (1999) Least-squares temporal difference learning. In: Proceedings of the sixteenth international conference on machine learning, Bled, pp 49–56
- Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22:33–57
- Ernst D, Geurts P, Wehenkel L (2005) Tree-based batch mode reinforcement learning. J Mach Learn Res 6:503–556
- Johns J, Petrik M, Mahadevan S (2009) Hybrid least-squares algorithms for approximate policy evaluation. Mach Learn 76(2–3):243–256
- Koller D, Parr R (2000) Policy iteration for factored MDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence, Stanford, pp 326–334
- Lagoudakis MG, Parr R (2003) Least-squares policy iteration. J Mach Learn Res 4:1107–1149
- Nedić A, Bertsekas DP (2003) Least-squares policy evaluation algorithms with linear function approximation. Discret Event Dyn Syst Theory Appl 13(1–2):79–110
- Parr R, Li L, Taylor G, Painter-Wakefield C, Littman ML (2008) An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: Proceedings of the twenty-fifth international conference on machine learning, Helsinki, pp 752–759
- Schweitzer PJ, Seidmann A (1985) Generalized polynomial approximations in Markovian decision processes. J Math Anal Appl 110(6):568–582
- Xu X, He H-G, Hu D (2002) Efficient reinforcement learning using recursive least-squares methods. J Artif Intell Res 16:259–292