Navigation for autonomous vehicles via fast-stable and smooth reinforcement learning

  • Article
  • Science China Technological Sciences

Abstract

This paper investigates the navigation problem of autonomous vehicles using reinforcement learning (RL) with both stability and smoothness guarantees. By introducing a data-based Lyapunov function with a fast-descent property, a stability criterion in mean cost is obtained. An off-policy RL algorithm is then proposed to train safe policies; compared with existing RL methods that provide only a stability guarantee, it imposes a stricter constraint within the model-free RL framework to ensure fast convergence of policy generation. In addition, constraints are imposed simultaneously on action increments and on action distribution variations, which effectively reduces the difference between adjacent actions and ensures the smoothness of the obtained policy, rather than only seeking similarity between the distributions of adjacent actions, as is common in the literature. A simulated navigation task for a ground differential-drive mobile vehicle demonstrates the superiority of the proposed algorithm in terms of fast stability and smoothness.
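The two kinds of constraint named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation, which is not reproduced on this page. It assumes a sample-based Lyapunov decrease condition of the form mean[L(s') - L(s) + alpha * c(s)] <= 0, where a larger alpha demands a faster decrease, and a smoothness penalty that combines the squared increment between adjacent mean actions with the KL divergence between adjacent diagonal-Gaussian action distributions. All function names, `alpha`, `w_inc` and `w_kl` are hypothetical.

```python
import numpy as np


def lyapunov_decrease_violation(l_now, l_next, cost_now, alpha):
    """Batch mean of L(s') - L(s) + alpha * c(s).

    A value <= 0 means the assumed decrease condition holds on this batch.
    alpha is a hypothetical rate parameter: larger values ask for faster descent.
    """
    l_now, l_next, cost_now = map(np.asarray, (l_now, l_next, cost_now))
    return float(np.mean(l_next - l_now + alpha * cost_now))


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for diagonal Gaussians, summed over the action dimension."""
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return np.sum(per_dim, axis=-1)


def smoothness_penalty(mu_t, std_t, mu_next, std_next, w_inc=1.0, w_kl=1.0):
    """Assumed penalty: weighted action increment plus distribution variation."""
    increment = np.sum((mu_next - mu_t) ** 2, axis=-1)   # change in mean action
    variation = gaussian_kl(mu_t, std_t, mu_next, std_next)  # change in distribution
    return float(np.mean(w_inc * increment + w_kl * variation))


if __name__ == "__main__":
    # Toy batch: Lyapunov values halve at each step, so the condition holds.
    print(lyapunov_decrease_violation([2.0, 4.0], [1.0, 2.0], [1.0, 2.0], alpha=0.5))

    # Adjacent actions for a two-dimensional command (e.g. linear and angular velocity).
    mu_t, mu_next = np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]])
    std = np.ones((1, 2))
    print(smoothness_penalty(mu_t, std, mu_next, std))
```

In an actor-critic setting, terms of this kind would typically be added to the policy loss through Lagrange multipliers or fixed weights; which of these the authors use is not stated in the abstract.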

Author information

Corresponding author

Correspondence to LiXian Zhang.

Additional information

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088), the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.OCEF.2022047, HIT.BRET.2022004 and HIT.DZIJ.2023049), the Grant JCKY2022603C016, State Key Laboratory of Robotics and System (HIT), and the Heilongjiang Touyan Team.

About this article

Cite this article

Zhang, R., Yang, J., Liang, Y. et al. Navigation for autonomous vehicles via fast-stable and smooth reinforcement learning. Sci. China Technol. Sci. 67, 423–434 (2024). https://doi.org/10.1007/s11431-023-2483-x

  • DOI: https://doi.org/10.1007/s11431-023-2483-x
