Abstract
Deep reinforcement learning tends to suffer from low sampling efficiency, and prioritized sampling can improve it to a certain extent. This paper applies prioritized sampling to the deep deterministic policy gradient (DDPG) algorithm and proposes a small-sample sorting method to address the high computational complexity of the common prioritized sampling algorithm. Simulation experiments show that the improved DDPG algorithm achieves higher sampling efficiency and better training performance.
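The abstract does not detail the small-sample sorting method, but one plausible reading is to avoid maintaining priorities over the whole replay buffer and instead sort only a small, uniformly drawn candidate set by absolute TD error. The sketch below illustrates that idea; the class name, the `candidate_factor` parameter, and the overall scheme are assumptions for illustration, not the paper's exact algorithm.

```python
import random

class SmallSampleReplayBuffer:
    """Approximates prioritized sampling by sorting only a small,
    uniformly drawn candidate set (a hypothetical sketch; the paper's
    exact method may differ)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []  # list of (transition, abs_td_error) pairs

    def add(self, transition, td_error):
        # Drop the oldest transition once capacity is reached.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append((transition, abs(td_error)))

    def sample(self, batch_size, candidate_factor=4):
        # Uniformly draw a small candidate set, then keep the
        # highest-|TD error| transitions. Sorting k candidates costs
        # O(k log k), independent of the full buffer size, which is
        # cheaper than maintaining priorities over every transition.
        k = min(len(self.storage), batch_size * candidate_factor)
        candidates = random.sample(self.storage, k)
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        return [t for t, _ in candidates[:batch_size]]
```

In a DDPG training loop, the critic's TD error for each stored transition would supply the priority, so high-error transitions are replayed more often while the sort stays cheap.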
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Zhang, H., Xiong, K., Bai, J. (2019). Improved Deep Deterministic Policy Gradient Algorithm Based on Prioritized Sampling. In: Jia, Y., Du, J., Zhang, W. (eds) Proceedings of 2018 Chinese Intelligent Systems Conference. Lecture Notes in Electrical Engineering, vol 528. Springer, Singapore. https://doi.org/10.1007/978-981-13-2288-4_21
DOI: https://doi.org/10.1007/978-981-13-2288-4_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2287-7
Online ISBN: 978-981-13-2288-4
eBook Packages: Intelligent Technologies and Robotics (R0)