Actor-Critic Algorithm Based on Incremental Least-Squares Temporal Difference with Eligibility Trace

Cheng, Yuhu; Feng, Huanting; Wang, Xuesong

doi:10.1007/978-3-642-25944-9_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6839)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Compared with value-function-based reinforcement learning (RL) methods, policy gradient RL methods have better convergence properties, but the large variance of their policy-gradient estimates degrades learning performance. To speed up the convergence of policy gradient RL and improve the precision of gradient estimation, an Actor-Critic (AC) learning algorithm based on incremental least-squares temporal difference with eligibility trace (iLSTD(λ)) is proposed, exploiting the characteristics of the AC framework, function approximation, and the iLSTD(λ) algorithm. The Critic estimates the value function with the iLSTD(λ) algorithm, and the Actor updates the policy parameters along a regular (rather than natural) gradient. Simulation results on a 10×10 grid world show that the iLSTD(λ)-based AC algorithm both converges quickly and produces good gradient estimates.
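
The abstract only summarizes the method, so what follows is a minimal Python sketch under our own assumptions, not the authors' implementation.

Environment and representation (all assumptions of this sketch):
- A 10×10 grid with the start at (0,0) and the goal at (9,9).
- A reward of -1 per step.
- One-hot (tabular) critic features and a tabular softmax actor.
- Illustrative values for the hyperparameters gamma, lam, alpha, m and beta.

The critic follows the iLSTD(λ) pattern:
- Trace z ← γλz + φ(s).
- Rank-one update A ← A + z(φ(s) − γφ(s'))ᵀ.
- Residual μ = b − Aw, maintained incrementally.
- Greedy updates of the m coordinates with the largest |μ_j|.

Two further choices are ours, not details from the chapter. Each coordinate step is scaled by 1/A_jj for stability. The actor uses the critic's TD error δ as its advantage estimate in a regular policy-gradient step. The helper names `step`, `softmax` and `train` are hypothetical.

```python
import math
import random

N = 10                                        # grid is N x N; state index = row * N + col
START, GOAL = 0, N * N - 1                    # top-left corner -> bottom-right corner (assumed)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right


def step(s, a):
    """Deterministic move; bumping into a wall leaves the agent where it is."""
    r, c = divmod(s, N)
    dr, dc = MOVES[a]
    r = min(max(r + dr, 0), N - 1)
    c = min(max(c + dc, 0), N - 1)
    s2 = r * N + c
    return s2, -1.0, s2 == GOAL               # reward -1 per step; episode ends at the goal


def softmax(prefs):
    top = max(prefs)
    e = [math.exp(p - top) for p in prefs]
    total = sum(e)
    return [x / total for x in e]


def train(episodes=200, gamma=0.95, lam=0.5, alpha=0.5, m=5, beta=0.2,
          max_steps=500, seed=0):
    rng = random.Random(seed)
    n = N * N
    w = [0.0] * n                             # critic weights; one-hot features, so V(s) = w[s]
    A = [[0.0] * n for _ in range(n)]         # iLSTD matrix stored by column: A[j][i] = A_ij
    mu = [0.0] * n                            # residual mu = b - A w, kept up to date incrementally
    prefs = [[0.0] * 4 for _ in range(n)]     # actor: softmax (Gibbs) action preferences per state
    lengths = []
    for _ in range(episodes):
        s, t, done = START, 0, False
        z = [0.0] * n                         # eligibility trace, reset every episode
        while not done and t < max_steps:
            pi = softmax(prefs[s])
            a = rng.choices(range(4), weights=pi)[0]
            s2, r, done = step(s, a)
            t += 1
            v_next = 0.0 if done else w[s2]   # terminal state has value 0
            delta = r + gamma * v_next - w[s]  # TD error under the current critic

            # Critic, iLSTD(lambda): z <- gamma*lam*z + phi(s); A <- A + z (phi(s) - gamma*phi(s'))^T
            z = [gamma * lam * zi for zi in z]
            z[s] += 1.0
            A[s] = [x + zi for x, zi in zip(A[s], z)]
            if not done:
                A[s2] = [x - gamma * zi for x, zi in zip(A[s2], z)]
            # mu <- mu + z*r - dA w, which equals mu + z*delta for one-hot features
            mu = [u + zi * delta for u, zi in zip(mu, z)]
            # m greedy coordinate steps on the largest |mu_j| (only where A_jj > 0)
            for _ in range(m):
                j = max(range(n), key=lambda i: abs(mu[i]) if A[i][i] > 1e-8 else -1.0)
                if A[j][j] <= 1e-8:
                    break
                dw = alpha * mu[j] / A[j][j]  # ASSUMPTION: step scaled by 1/A_jj for stability
                w[j] += dw
                mu = [u - dw * x for u, x in zip(mu, A[j])]

            # Actor: regular policy gradient with delta as the advantage; grad log pi = e_a - pi
            for b in range(4):
                prefs[s][b] += beta * delta * ((1.0 if b == a else 0.0) - pi[b])
            s = s2
        lengths.append(t)
    return lengths, w
```

Calling `lengths, w = train()` returns two things:
- the number of steps in each episode;
- the critic weights, where, with one-hot features, `w[s]` is the value estimate of state `s`.

Only two columns of A and m coordinates of w change per step, so the per-step cost here is O(mn). Recursive LSTD would instead cost O(n²) per step. This sketch keeps all transitions in A and b even as the policy changes. Whether the chapter resets or discounts these statistics is not stated in the abstract.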

Similar content being viewed by others

Reinforcement Learning Algorithms with Selector, Tuner, or Estimator (Article, 19 September 2023)

Advanced value iteration for discrete-time intelligent critic control: A survey (Article, 21 May 2023)

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces (Article, 29 September 2023)

Author information

Authors and Affiliations

School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, P.R. China
Yuhu Cheng, Huanting Feng & Xuesong Wang

Editor information

Editors and Affiliations

School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 5, Dongfeng Road, Jinshui District, 450002, Zhengzhou, Henan, China
Yong Gan
Indian Institute of Technology Kanpur, 208016, Kanpur, India
Phalguni Gupta
Department of Biotechnology, Indian Institute of Technology Madras, 600 036, Chennai, Tamilnadu, India
M. Michael Gromiha

About this paper

Cite this paper

Cheng, Y., Feng, H., Wang, X. (2012). Actor-Critic Algorithm Based on Incremental Least-Squares Temporal Difference with Eligibility Trace. In: Huang, DS., Gan, Y., Gupta, P., Gromiha, M.M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2011. Lecture Notes in Computer Science, vol 6839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25944-9_24

DOI: https://doi.org/10.1007/978-3-642-25944-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25943-2
Online ISBN: 978-3-642-25944-9
eBook Packages: Computer Science, Computer Science (R0)
