Abstract
Reinforcement learning solves stochastic sequential decision-making problems through direct trial-and-error interaction with the learning environment. In this paper, we develop generalized compatible features for approximating value functions in reliable reinforcement learning. Guided by the actor-critic reinforcement learning paradigm, we further develop a generalized updating rule for policy gradient search in order to consistently improve learning performance. The new updating rule has been examined on several benchmark learning problems, and the experimental results on two of these problems are reported in this paper. Our results show that, under a suitable generalization of the updating rule, both learning performance and reliability can be noticeably improved.
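The compatible function approximation that this paper generalizes goes back to Sutton et al. [6]: for a differentiable policy π_θ, the features ψ(s,a) = ∇_θ log π_θ(a|s) are "compatible" in the sense that a linear critic w·ψ yields an unbiased estimate of the policy gradient. The Python sketch below illustrates that baseline construction in a minimal actor-critic loop. The two-state MDP, the one-hot feature map, and all step sizes are illustrative assumptions, and the sketch implements the classical rule, not the generalized updating rule proposed in the paper.

```python
# Minimal actor-critic sketch with compatible function approximation,
# in the spirit of Sutton et al. [6]. Toy MDP and hyper-parameters are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA_W, ALPHA_THETA = 0.95, 0.1, 0.01

def step(s, a):
    """Hypothetical dynamics: action 0 stays, action 1 flips the state.
    Reward +1 for landing in state 1, else 0."""
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def phi(s, a):
    """One-hot state-action feature vector."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def policy(theta, s):
    """Gibbs (softmax) policy over actions in state s."""
    prefs = np.array([theta @ phi(s, a) for a in range(N_ACTIONS)])
    prefs -= prefs.max()                      # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def compatible_features(theta, s, a):
    """psi(s,a) = grad_theta log pi(a|s): the compatible features under
    which a linear critic w @ psi gives an unbiased policy gradient."""
    p = policy(theta, s)
    return phi(s, a) - sum(p[b] * phi(s, b) for b in range(N_ACTIONS))

theta = np.zeros(N_STATES * N_ACTIONS)   # actor parameters
w = np.zeros(N_STATES * N_ACTIONS)       # compatible critic parameters
v = np.zeros(N_STATES)                   # state-value baseline

s = 0
for t in range(5000):
    a = rng.choice(N_ACTIONS, p=policy(theta, s))
    s_next, r = step(s, a)
    psi = compatible_features(theta, s, a)

    # TD error serves as a sampled advantage estimate.
    delta = r + GAMMA * v[s_next] - v[s]
    v[s] += ALPHA_W * delta
    # Critic: regress the advantage onto the compatible features.
    w += ALPHA_W * (delta - w @ psi) * psi
    # Actor: policy-gradient step using the compatible critic.
    theta += ALPHA_THETA * (w @ psi) * psi
    s = s_next

print("learned policy:", [policy(theta, s).round(3) for s in range(N_STATES)])
```

Running the sketch, the softmax policy concentrates on the actions that reach and remain in the rewarding state, which is the expected behavior of a compatible-feature actor-critic on this toy problem.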
References
Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Automatica 45(11), 2471–2482 (2009)
Cartwright, J.: Roll over, Boltzmann. Phys. World 27(5), 31–35 (2014)
Chen, G., Douch, C.I.J., Zhang, M.: Reinforcement learning in continuous spaces by using learning fuzzy classifier systems. IEEE Trans. Evol. Comput. PP(99), 1 (2016)
NeSI: New Zealand eScience Infrastructure (2016). https://www.nesi.org.nz/
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 1057–1063 (1999)
Sutton, R.S.: Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems, pp. 1038–1044 (1996)
White, D.J.: A survey of applications of Markov decision processes. J. Oper. Res. Soc. 44, 1073–1096 (1993)
Acknowledgments
The authors appreciate the support from NeSI [4], which provided the high-performance computing facilities that made our computationally intensive experiments possible.