Abstract
This paper investigates reinforcement learning (RL)-based navigation of autonomous vehicles with both stability and smoothness guarantees. A data-based Lyapunov function with a fast-descent property is introduced, from which a criterion for stability in mean cost is obtained. An off-policy RL algorithm is then proposed to train safe policies. In contrast with existing RL methods that guarantee stability only, it imposes a stricter constraint within the model-free RL framework to ensure fast convergence of policy generation. In addition, constraints are imposed simultaneously on action increments and on action distribution variations, which reduces the difference between adjacent actions and ensures the smoothness of the obtained policy, rather than only seeking similarity between the distributions of adjacent actions, as is common in the literature. A simulated navigation task for a ground differential-drive mobile vehicle demonstrates the superiority of the proposed algorithm in fast stability and smoothness.
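The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not give. It assumes a diagonal Gaussian policy, uses the policy means as representative actions, and takes a decrease condition of the form E[L(s')] <= (1 - alpha) E[L(s)] as a stand-in for the fast-descent property. The function names, the weights `w_inc` and `w_kl`, and the rate `alpha` are all hypothetical.

```python
import math


def action_increment_penalty(a_prev, a_next):
    """Squared Euclidean distance between two consecutive actions."""
    return sum((y - x) ** 2 for x, y in zip(a_prev, a_next))


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussian action distributions."""
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return kl


def smoothness_penalty(mu_prev, std_prev, mu_next, std_next, w_inc=1.0, w_kl=1.0):
    """Weighted sum of an action-increment term and a distribution-variation term.

    Assumption: the policy means stand in for the actions themselves, and the
    weights w_inc and w_kl are illustrative, not values from the paper.
    """
    inc = action_increment_penalty(mu_prev, mu_next)
    kl = gaussian_kl(mu_prev, std_prev, mu_next, std_next)
    return w_inc * inc + w_kl * kl


def mean_decrease_violation(l_now, l_next, alpha=0.1):
    """Sample-mean violation of the assumed condition E[L(s')] <= (1 - alpha) E[L(s)].

    l_now and l_next hold Lyapunov-candidate values at sampled states and their
    successors. Returns 0.0 when the condition holds on the batch.
    """
    gap = sum(ln - (1.0 - alpha) * lc for lc, ln in zip(l_now, l_next)) / len(l_now)
    return max(0.0, gap)
```

In a sketch like this, the increment term penalizes jumps in the commanded action directly, while the KL term only penalizes changes in the distribution. A policy could satisfy the latter while still producing noticeably different sampled actions, which is one way to read the abstract's motivation for constraining both.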
Additional information
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088), the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.OCEF.2022047, HIT.BRET.2022004 and HIT.DZIJ.2023049), the Grant JCKY2022603C016, State Key Laboratory of Robotics and System (HIT), and the Heilongjiang Touyan Team.
Cite this article
Zhang, R., Yang, J., Liang, Y. et al. Navigation for autonomous vehicles via fast-stable and smooth reinforcement learning. Sci. China Technol. Sci. 67, 423–434 (2024). https://doi.org/10.1007/s11431-023-2483-x
DOI: https://doi.org/10.1007/s11431-023-2483-x