Abstract
Reinforcement learning solves stochastic sequential decision-making problems through direct trial-and-error interaction with the learning environment. In this paper, we develop generalized compatible features for approximating value functions in reliable reinforcement learning. Guided by the actor-critic reinforcement learning paradigm, we further develop a generalized updating rule for policy gradient search in order to consistently improve learning performance. The new updating rule has been examined on several benchmark learning problems, and the experimental results on two of these problems are reported in this paper. Our results show that, under a suitable generalization of the updating rule, both learning performance and reliability can be noticeably improved.
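The compatible function approximation that this paper generalizes goes back to Sutton et al. [6]: for a differentiable policy π_θ, the features ψ(s,a) = ∇_θ log π_θ(a|s) are "compatible" in the sense that a linear critic w·ψ yields an unbiased estimate of the policy gradient. The Python sketch below illustrates that baseline construction in a minimal actor-critic loop. The two-state MDP, the one-hot feature map, and all step sizes are illustrative assumptions, and the sketch implements the classical rule, not the generalized updating rule proposed in the paper.

```python
# Minimal actor-critic sketch with compatible function approximation,
# in the spirit of Sutton et al. [6]. Toy MDP and hyper-parameters are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA_W, ALPHA_THETA = 0.95, 0.1, 0.01

def step(s, a):
    """Hypothetical dynamics: action 0 stays, action 1 flips the state.
    Reward +1 for landing in state 1, else 0."""
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def phi(s, a):
    """One-hot state-action feature vector."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def policy(theta, s):
    """Gibbs (softmax) policy over actions in state s."""
    prefs = np.array([theta @ phi(s, a) for a in range(N_ACTIONS)])
    prefs -= prefs.max()                      # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def compatible_features(theta, s, a):
    """psi(s,a) = grad_theta log pi(a|s): the compatible features under
    which a linear critic w @ psi gives an unbiased policy gradient."""
    p = policy(theta, s)
    return phi(s, a) - sum(p[b] * phi(s, b) for b in range(N_ACTIONS))

theta = np.zeros(N_STATES * N_ACTIONS)   # actor parameters
w = np.zeros(N_STATES * N_ACTIONS)       # compatible critic parameters
v = np.zeros(N_STATES)                   # state-value baseline

s = 0
for t in range(5000):
    a = rng.choice(N_ACTIONS, p=policy(theta, s))
    s_next, r = step(s, a)
    psi = compatible_features(theta, s, a)

    # TD error serves as a sampled advantage estimate.
    delta = r + GAMMA * v[s_next] - v[s]
    v[s] += ALPHA_W * delta
    # Critic: regress the advantage onto the compatible features.
    w += ALPHA_W * (delta - w @ psi) * psi
    # Actor: policy-gradient step using the compatible critic.
    theta += ALPHA_THETA * (w @ psi) * psi
    s = s_next

print("learned policy:", [policy(theta, s).round(3) for s in range(N_STATES)])
```

Running the sketch, the softmax policy concentrates on the actions that reach and remain in the rewarding state, which is the expected behavior of a compatible-feature actor-critic on this toy problem.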
References
Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Automatica 45(11), 2471–2482 (2009)
Cartwright, J.: Roll over, Boltzmann. Phys. World 27(5), 31–35 (2014)
Chen, G., Douch, C.I.J., Zhang, M.: Reinforcement learning in continuous spaces by using learning fuzzy classifier systems. IEEE Trans. Evol. Comput. PP(99), 1 (2016)
NeSI: New Zealand eScience Infrastructure (2016). https://www.nesi.org.nz/
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 1057–1063 (1999)
Sutton, R.S.: Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems, pp. 1038–1044 (1996)
White, D.J.: A survey of applications of Markov decision processes. J. Oper. Res. Soc. 44, 1073–1096 (1993)
Acknowledgments
The authors appreciate the support from NeSI [4], which provided the high-performance computing facilities that made our computationally intensive experiments possible.