Applied Intelligence, Volume 49, Issue 12, pp 4303–4318

An effective asynchronous framework for small scale reinforcement learning problems

  • Shifei Ding
  • Xingyu Zhao
  • Xinzheng Xu
  • Tongfeng Sun
  • Weikuan Jia


Reinforcement learning has become one of the research hotspots in artificial intelligence in recent years, and deep reinforcement learning has been widely used to solve various decision-making problems. However, owing to the characteristics of neural networks, deep methods easily fall into local minima when facing small-scale discrete-space path planning problems. Traditional reinforcement learning, in turn, relies on a single agent that updates continuously while the algorithm executes, which leads to slow convergence. Although some scholars have made improvements to address these problems, many shortcomings remain. To address them, we propose a new asynchronous framework for tabular reinforcement learning algorithms and present four new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to standard reinforcement learning environments: the frozen lake, cliff walking, and windy gridworld problems. The simulation results show that these methods solve discrete-space path planning problems efficiently and balance exploration and exploitation well.


Keywords: Reinforcement learning, Path planning, Asynchronous framework, Machine learning, Parallel framework



This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Shifei Ding (1, 2), email author
  • Xingyu Zhao (1, 2)
  • Xinzheng Xu (1, 2)
  • Tongfeng Sun (1, 2)
  • Weikuan Jia (3)

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China, Xuzhou, China
  3. School of Information Science and Engineering, Shandong Normal University, Jinan, China
