Abstract
This paper addresses safety in reinforcement learning (RL) under disturbances and its application to the safety-constrained motion control of autonomous robots. To tackle this problem, a robust Lyapunov value function (rLVF) is proposed, obtained by evaluating a data-based LVF under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established; this criterion ensures that the cost function, which serves as the safety criterion, ultimately converges to a bounded neighborhood under the designed policy. Moreover, to mitigate drastic variations of the rLVF across nearby states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of model parameters, respectively. The experimental results demonstrate the validity of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
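To make the core idea concrete, the following is a minimal toy sketch, not the paper's algorithm: for a scalar linear system with a bounded disturbance, it estimates a robust Lyapunov-style value as the discounted cumulative cost accumulated under an adversarially chosen disturbance at each step, and checks a UUB-style property (the worst-case value shrinks along closed-loop trajectories until the state enters a small bound). The system, cost, policy, and disturbance set are all illustrative assumptions.

```python
# Illustrative sketch only: toy 1-D system x' = a*x + u + d with |d| <= D_MAX.
# The "rLVF"-style value below is the worst-case discounted cumulative cost;
# the dynamics, cost, and policy are assumptions, not the paper's formulation.

GAMMA = 0.9   # discount factor
D_MAX = 0.1   # disturbance bound
A = 0.8       # open-loop gain of the toy system


def step(x, u, d):
    """One step of the toy linear system."""
    return A * x + u + d


def cost(x):
    """Safety-related cost: squared distance from the safe origin."""
    return x ** 2


def worst_disturbance(x, u):
    """For this scalar system, the worst disturbance pushes |x'| up."""
    nxt = A * x + u
    return D_MAX if nxt >= 0 else -D_MAX


def robust_value(x, policy, horizon=20):
    """Finite-horizon estimate of the robust Lyapunov value:
    discounted cost under the worst disturbance at every step."""
    v, g = 0.0, 1.0
    for _ in range(horizon):
        v += g * cost(x)
        u = policy(x)
        x = step(x, u, worst_disturbance(x, u))
        g *= GAMMA
    return v


# A simple stabilizing policy that cancels most of the drift.
policy = lambda x: -0.7 * x

# UUB-style check: roll the closed loop forward under adversarial
# disturbances; the robust value should decrease and the state should
# end up inside a small bound despite the disturbance.
x = 2.0
v_start = robust_value(x, policy)
for _ in range(30):
    u = policy(x)
    x = step(x, u, worst_disturbance(x, u))
v_end = robust_value(x, policy)
print(v_end < v_start, abs(x) < 1.0)
```

With the closed loop x' = 0.1x + d, the adversarial trajectory settles near x ≈ 0.11 rather than at the origin, which is exactly the ultimate-boundedness picture: the cost converges to a neighborhood of zero, not to zero itself.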
Additional information
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088), the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.BRET.2022004, HIT.OCEF.2022047, and HIT.DZIJ.2023049), Grant No. JCKY2022603C016, the State Key Laboratory of Robotics and System (HIT), and the Heilongjiang Touyan Team.
Cite this article
Zhang, R., Han, Y., Su, M. et al. Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots. Sci. China Technol. Sci. 67, 172–182 (2024). https://doi.org/10.1007/s11431-023-2435-3