Abstract
This paper addresses safety in reinforcement learning (RL) under disturbances and its application to the safety-constrained motion control of autonomous robots. To tackle this problem, a robust Lyapunov value function (rLVF) is proposed, obtained by evaluating a data-based LVF under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established; this criterion ensures that the cost function, which serves as the safety criterion, ultimately converges to a bounded neighborhood under the designed policy. Moreover, to mitigate drastic variations of the rLVF across nearby states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of model parameters, respectively. The experimental results demonstrate the validity of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
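To make the core idea concrete, the following is a minimal toy sketch, not the paper's algorithm: for a scalar linear system with a bounded disturbance, it estimates a robust Lyapunov-style value as the discounted cumulative cost accumulated under an adversarially chosen disturbance at each step, and checks a UUB-style property (the worst-case value shrinks along closed-loop trajectories until the state enters a small bound). The system, cost, policy, and disturbance set are all illustrative assumptions.

```python
# Illustrative sketch only: toy 1-D system x' = a*x + u + d with |d| <= D_MAX.
# The "rLVF"-style value below is the worst-case discounted cumulative cost;
# the dynamics, cost, and policy are assumptions, not the paper's formulation.

GAMMA = 0.9   # discount factor
D_MAX = 0.1   # disturbance bound
A = 0.8       # open-loop gain of the toy system


def step(x, u, d):
    """One step of the toy linear system."""
    return A * x + u + d


def cost(x):
    """Safety-related cost: squared distance from the safe origin."""
    return x ** 2


def worst_disturbance(x, u):
    """For this scalar system, the worst disturbance pushes |x'| up."""
    nxt = A * x + u
    return D_MAX if nxt >= 0 else -D_MAX


def robust_value(x, policy, horizon=20):
    """Finite-horizon estimate of the robust Lyapunov value:
    discounted cost under the worst disturbance at every step."""
    v, g = 0.0, 1.0
    for _ in range(horizon):
        v += g * cost(x)
        u = policy(x)
        x = step(x, u, worst_disturbance(x, u))
        g *= GAMMA
    return v


# A simple stabilizing policy that cancels most of the drift.
policy = lambda x: -0.7 * x

# UUB-style check: roll the closed loop forward under adversarial
# disturbances; the robust value should decrease and the state should
# end up inside a small bound despite the disturbance.
x = 2.0
v_start = robust_value(x, policy)
for _ in range(30):
    u = policy(x)
    x = step(x, u, worst_disturbance(x, u))
v_end = robust_value(x, policy)
print(v_end < v_start, abs(x) < 1.0)
```

With the closed loop x' = 0.1x + d, the adversarial trajectory settles near x ≈ 0.11 rather than at the origin, which is exactly the ultimate-boundedness picture: the cost converges to a neighborhood of zero, not to zero itself.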
Additional information
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088), the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.BRET.2022004, HIT.OCEF.2022047, and HIT.DZIJ.2023049), Grant No. JCKY2022603C016, the State Key Laboratory of Robotics and System (HIT), and the Heilongjiang Touyan Team.
Cite this article
Zhang, R., Han, Y., Su, M. et al. Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots. Sci. China Technol. Sci. 67, 172–182 (2024). https://doi.org/10.1007/s11431-023-2435-3