Machine Learning, Volume 8, Issue 3, pp 341–362

The convergence of TD(λ) for general λ

  • Peter Dayan
DOI: 10.1007/BF00992701

Cite this article as:
Dayan, P. Mach Learn (1992) 8: 341. doi:10.1007/BF00992701

Abstract

The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses some analysis of Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case which only uses information from adjacent time steps to that involving information from arbitrary ones.

It also considers how this version of TD behaves in the face of linearly dependent representations for states—demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
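As a rough illustration of the distinction the abstract draws between using only adjacent time steps (λ = 0) and using information from arbitrary earlier steps (general λ), the following is a minimal sketch of a linear TD(λ) update over one observed sequence, in the style usually attributed to Sutton (1988). It is not taken from the paper itself. The function name `td_lambda_episode`, its parameters, and the choice to accumulate weight changes and apply them at the end of the sequence are assumptions made for this example.

```python
from typing import List, Sequence


def td_lambda_episode(
    w: Sequence[float],
    xs: Sequence[Sequence[float]],
    z: float,
    alpha: float,
    lam: float,
) -> List[float]:
    """Illustrative linear TD(lambda) update for one sequence (hypothetical helper).

    w     : current weight vector; the prediction for features x is dot(w, x)
    xs    : feature vectors of the states visited, in order
    z     : final outcome observed at the end of the sequence
    alpha : learning rate
    lam   : the lambda parameter in [0, 1]
    """
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * bi for ai, bi in zip(a, b))

    n = len(w)
    trace = [0.0] * n    # sum over k <= t of lam**(t - k) * x_k
    delta_w = [0.0] * n  # accumulated weight change, applied after the sequence

    for t, x in enumerate(xs):
        pred = dot(w, x)
        # The "next prediction" at the last step is the actual outcome z.
        next_pred = z if t == len(xs) - 1 else dot(w, xs[t + 1])
        # Decay the trace and add the gradient of the current prediction,
        # which for a linear predictor is just the feature vector x.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        td_error = next_pred - pred
        delta_w = [d + alpha * td_error * e for d, e in zip(delta_w, trace)]

    return [wi + d for wi, d in zip(w, delta_w)]
```

With λ = 0 the trace holds only the current state's features, so each temporal-difference error adjusts only the most recent prediction. With λ = 1 earlier states keep full credit, and the accumulated change matches the supervised least-mean-squares update toward z for every visited state.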

Keywords

Reinforcement learning, temporal differences, asynchronous dynamic programming

Copyright information

© Kluwer Academic Publishers 1992

Authors and Affiliations

  • Peter Dayan (1)
  1. Centre for Cognitive Science & Department of Physics, University of Edinburgh, Scotland