Discrete Event Dynamic Systems, Volume 13, Issue 1, pp 79–110

Least Squares Policy Evaluation Algorithms with Linear Function Approximation

  • A. Nedić
  • D. P. Bertsekas

DOI: 10.1023/A:1022192903948

Cite this article as:
Nedić, A. & Bertsekas, D.P. Discrete Event Dynamic Systems (2003) 13: 79. doi:10.1023/A:1022192903948

Abstract

We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ=0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
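
As a point of reference for the second method discussed in the abstract, the following is a minimal sketch (not taken from the paper) of the standard simulation-based form of LSTD(λ) with linear cost function approximation, written in Python with NumPy. The feature map phi, the discount factor gamma, the trace-decay parameter lam, and the single-trajectory input format are illustrative assumptions, not the authors' notation.

    import numpy as np

    def lstd_lambda(trajectory, phi, gamma, lam):
        """Sketch of LSTD(lambda): estimate a weight vector r such that
        phi(s)^T r approximates the discounted cost-to-go of the fixed
        policy that generated the simulated trajectory.

        trajectory : list of (state, cost, next_state) transitions
        phi        : feature map, state -> 1-D numpy array of length K
        gamma      : discount factor in (0, 1)
        lam        : trace-decay parameter lambda in [0, 1]
        """
        K = len(phi(trajectory[0][0]))
        A = np.zeros((K, K))
        b = np.zeros(K)
        z = np.zeros(K)                      # eligibility trace
        for state, cost, next_state in trajectory:
            f, f_next = phi(state), phi(next_state)
            z = gamma * lam * z + f          # update eligibility trace
            A += np.outer(z, f - gamma * f_next)
            b += z * cost
        # Solve A r = b; use least squares in case A is singular
        # early in the simulation.
        r, *_ = np.linalg.lstsq(A, b, rcond=None)
        return r

With lam = 0 this reduces to the linear least-squares temporal-difference algorithm of Bradtke and Barto mentioned in the abstract; the paper's contribution includes the convergence analysis of this recursion with probability 1 for every λ in [0, 1].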

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • A. Nedić
    • 1
  • D. P. Bertsekas
    • 1
  1. Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, USA