Recursive Least-Squares Learning with Eligibility Traces

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)


Abstract

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from a single trajectory, possibly generated by some other policy. We describe a systematic approach for adapting the on-policy least-squares learning algorithms of the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD(λ) and LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
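
To make the recursive, trace-based updates described in the abstract concrete, here is a minimal sketch of recursive on-policy LSTD(λ) with linear features, in which the inverse of the system matrix is maintained through a Sherman-Morrison rank-one update. The off-policy variants studied in the paper additionally reweight the trace with importance-sampling ratios between the target and behavior policies; that correction is omitted here, and the class name, variable names and regularization constant are illustrative assumptions rather than the paper's notation.

import numpy as np

class RecursiveLSTDLambda:
    """Minimal sketch: recursive LSTD(lambda) for a fixed policy with a
    linear value function V(s) ~= theta^T phi(s).  Illustrative only."""

    def __init__(self, dim, gamma, lam, epsilon=1e-2):
        self.gamma = gamma                # discount factor
        self.lam = lam                    # eligibility-trace parameter lambda
        self.z = np.zeros(dim)            # eligibility trace z_t
        self.b = np.zeros(dim)            # accumulated reward statistics b_t
        self.C = np.eye(dim) / epsilon    # running estimate of A_t^{-1}
        self.theta = np.zeros(dim)        # value-function weights

    def update(self, phi, reward, phi_next, terminal=False):
        # Accumulate the eligibility trace: z_t = gamma * lambda * z_{t-1} + phi_t.
        self.z = self.gamma * self.lam * self.z + phi
        # Temporal-difference feature direction d_t = phi_t - gamma * phi_{t+1}
        # (gamma is treated as 0 at episode termination).
        d = phi - (0.0 if terminal else self.gamma) * phi_next
        # A_t = A_{t-1} + z_t d_t^T, so C ~= A_t^{-1} follows from Sherman-Morrison.
        Cz = self.C @ self.z
        self.C -= np.outer(Cz, d @ self.C) / (1.0 + d @ Cz)
        # b_t = b_{t-1} + r_t z_t, and the LSTD(lambda) weights are theta = A^{-1} b.
        self.b += reward * self.z
        self.theta = self.C @ self.b
        if terminal:
            self.z = np.zeros_like(self.z)   # reset the trace between episodes

    def value(self, phi):
        return float(self.theta @ phi)

Maintaining C as a running estimate of the inverse means each sample costs O(d^2) rather than the O(d^3) of a fresh matrix inversion, which is what makes the recursive formulation practical on long trajectories.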


References

  1. Antos, A., Szepesvári, C., Munos, R.: Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS (LNAI), vol. 4005, pp. 574–588. Springer, Heidelberg (2006)
  2. Baird, L.C.: Residual Algorithms: Reinforcement Learning with Function Approximation. In: ICML (1995)
  3. Bertsekas, D.P., Yu, H.: Projected Equation Methods for Approximate Solution of Large Linear Systems. Journal of Computational and Applied Mathematics 227(1), 27–50 (2009)
  4. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)
  5. Boyan, J.A.: Technical Update: Least-Squares Temporal Difference Learning. Machine Learning 49(2-3), 233–246 (2002)
  6. Bradtke, S.J., Barto, A.G.: Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning 22(1-3), 33–57 (1996)
  7. Choi, D., Van Roy, B.: A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning. Discrete Event Dynamic Systems 16, 207–239 (2006)
  8. Engel, Y.: Algorithms and Representations for Reinforcement Learning. Ph.D. thesis, Hebrew University (2005)
  9. Geist, M., Pietquin, O.: Eligibility Traces through Colored Noises. In: ICUMT (2010)
  10. Geist, M., Pietquin, O.: Kalman Temporal Differences. Journal of Artificial Intelligence Research 39, 483–532 (2010)
  11. Geist, M., Pietquin, O.: Parametric Value Function Approximation: a Unified View. In: ADPRL (2011)
  12. Kearns, M., Singh, S.: Bias-Variance Error Bounds for Temporal Difference Updates. In: COLT (2000)
  13. Maei, H.R., Sutton, R.S.: GQ(λ): A General Gradient Algorithm for Temporal-Difference Prediction Learning with Eligibility Traces. In: Conference on Artificial General Intelligence (2010)
  14. Munos, R.: Error Bounds for Approximate Policy Iteration. In: ICML (2003)
  15. Nedić, A., Bertsekas, D.P.: Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems 13, 79–110 (2003)
  16. Precup, D., Sutton, R.S., Singh, S.P.: Eligibility Traces for Off-Policy Policy Evaluation. In: ICML (2000)
  17. Ripley, B.D.: Stochastic Simulation. Wiley & Sons (1987)
  18. Scherrer, B.: Should One Compute the Temporal Difference Fix Point or Minimize the Bellman Residual? The Unified Oblique Projection View. In: ICML (2010)
  19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)
  20. Tsitsiklis, J.N., Van Roy, B.: An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control 42(5), 674–690 (1997)
  21. Yu, H.: Convergence of Least-Squares Temporal Difference Methods under General Conditions. In: ICML (2010)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scherrer, B., Geist, M. (2012). Recursive Least-Squares Learning with Eligibility Traces. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol. 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_14

  • DOI: https://doi.org/10.1007/978-3-642-29946-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science, Computer Science (R0)