Applied Intelligence, Volume 49, Issue 2, pp 581–591

Applications of asynchronous deep reinforcement learning based on dynamic updating weights

  • Xingyu Zhao
  • Shifei Ding (corresponding author)
  • Yuexuan An
  • Weikuan Jia


Asynchronous deep reinforcement learning is a recent family of reinforcement learning methods. It uses multithreading to let multiple agents update shared parameters asynchronously from different exploration spaces. As a result, agents no longer need experience replay and can update parameters online. At the same time, the asynchronous method greatly improves both the convergence speed and the convergence performance of the algorithms. Asynchronous deep reinforcement learning algorithms, especially the asynchronous advantage actor-critic (A3C) algorithm, are effective on practical problems and have been widely used. However, existing asynchronous deep reinforcement learning algorithms apply a uniform learning rate whenever a thread pushes an update to the global thread, failing to account for the different information each thread transmits at each update. When an agent's push to the global thread mostly carries failure information, it offers little help in updating the parameters of the learning system. We therefore introduce dynamic weights into asynchronous deep reinforcement learning and propose a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). When the information pushed by an agent clearly helps improve system performance, we enlarge the update; otherwise, we shrink it. In this way, we can significantly improve the convergence efficiency and convergence performance of asynchronous deep reinforcement learning algorithms. We also test the effectiveness of the algorithm through experiments. The experimental results show that, within the same running time, the proposed algorithm converges faster and reaches better performance than existing algorithms.


Deep reinforcement learning · Asynchronous · Dynamic updating weights · Multithreading · Parallel reinforcement learning



This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Xingyu Zhao (1)
  • Shifei Ding (1), corresponding author
  • Yuexuan An (1)
  • Weikuan Jia (2)

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. School of Information Science and Engineering, Shandong Normal University, Jinan, China
