
Improved Deep Deterministic Policy Gradient Algorithm Based on Prioritized Sampling

  • HaoYu Zhang
  • Kai Xiong
  • Jie Bai
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 528)

Abstract

Deep reinforcement learning tends to suffer from low sampling efficiency, and prioritized sampling can improve that efficiency to a certain extent. This paper applies prioritized sampling to the deep deterministic policy gradient (DDPG) algorithm and proposes a small-sample sorting method to address the high computational complexity of common prioritized sampling algorithms. Simulation experiments show that the improved DDPG algorithm achieves higher sampling efficiency and better training performance.
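The abstract does not spell out the small-sample sorting procedure, so the Python sketch below shows one plausible reading of the idea as an alternative to full sum-tree prioritized replay: draw a small uniform candidate set from the replay buffer, sort only that set by absolute TD error, and train on its top transitions. The class name, `candidate_size`, and the epsilon term are illustrative assumptions, not the paper's notation.

```python
import random


class SmallSampleSortedReplay:
    """Hypothetical sketch of prioritized replay via small-sample sorting:
    rather than keeping the whole buffer ordered (or maintaining a sum-tree),
    draw a small random candidate set, sort only that set by |TD error|,
    and train on its highest-priority transitions."""

    def __init__(self, capacity=100_000, candidate_size=256):
        self.capacity = capacity
        self.candidate_size = candidate_size
        self.storage = []      # each entry: [priority, transition]
        self.write = 0         # ring-buffer write index

    def add(self, transition, td_error):
        # Small epsilon keeps every priority strictly positive (assumption).
        entry = [abs(td_error) + 1e-6, transition]
        if len(self.storage) < self.capacity:
            self.storage.append(entry)
        else:
            self.storage[self.write] = entry
        self.write = (self.write + 1) % self.capacity

    def sample(self, batch_size):
        # Uniformly draw a small candidate set, then sort only that set:
        # O(k log k) with k = candidate_size, instead of sorting or
        # tree-structuring the entire buffer.
        k = min(self.candidate_size, len(self.storage))
        idx = random.sample(range(len(self.storage)), k)
        idx.sort(key=lambda i: self.storage[i][0], reverse=True)
        chosen = idx[:batch_size]
        return chosen, [self.storage[i][1] for i in chosen]

    def update_priorities(self, indices, td_errors):
        # Refresh priorities with the TD errors from the latest critic update.
        for i, err in zip(indices, td_errors):
            self.storage[i][0] = abs(err) + 1e-6
```

For context on the cost: a sum-tree implementation pays O(log N) per insert and per draw over a buffer of size N, whereas this sketch pays O(k log k) per batch for a small candidate set of size k, which is one way to keep the per-step cost low in the spirit of the complexity reduction the abstract describes.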

Keywords

Deep reinforcement learning · Deep deterministic policy gradient · Prioritized sampling


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing, China
  2. Beijing Key Laboratory of Intelligent Space Robotic Systems Technology and Applications, Beijing Institute of Spacecraft System Engineering, Beijing, China
