
Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots

Article · Science China Technological Sciences

Abstract

This paper addresses safety in reinforcement learning (RL) under disturbances and its application to the safety-constrained motion control of autonomous robots. To tackle this problem, a robust Lyapunov value function (rLVF) is proposed, obtained by evaluating a data-based LVF under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established; this criterion ensures that the cost function, which serves as the safety metric, ultimately converges to a bounded range under the policy to be designed. Moreover, to mitigate abrupt variations of the rLVF across different states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of model parameters, respectively. The experimental results demonstrate the validity of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
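To make the abstract's mechanics concrete, the following is a minimal, hypothetical sketch (not the authors' published code) of the two ingredients it names: a Lyapunov value function evaluated under an approximate worst-case disturbance of the observed state, and a smoothing regularizer that damps variation of the LVF between nearby states. All names, network shapes, and coefficients (`eps`, `alpha`, `lam`) are illustrative assumptions, and the worst-case disturbance is approximated here by projected gradient ascent within a small ball around the observation.

```python
# Hedged sketch of an rLVF-style critic loss, assuming a PyTorch setting.
# This is an illustration of the ideas in the abstract, not the paper's method.
import torch
import torch.nn as nn


class LVF(nn.Module):
    """Data-based Lyapunov value function L(s); Softplus keeps the output non-negative,
    as expected of a Lyapunov candidate."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def worst_case_state(lvf: LVF, s: torch.Tensor,
                     eps: float = 0.05, steps: int = 5) -> torch.Tensor:
    """Approximate the worst-case disturbance of the observed state by projected
    gradient ascent on L inside an l_inf ball of radius eps around s."""
    delta = torch.zeros_like(s, requires_grad=True)
    for _ in range(steps):
        loss = lvf(s + delta).sum()
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += (eps / steps) * grad.sign()  # ascent step on L
            delta.clamp_(-eps, eps)               # project back into the ball
    return (s + delta).detach()


def rlvf_loss(lvf: LVF, s: torch.Tensor, s_next: torch.Tensor,
              cost: torch.Tensor, alpha: float = 0.9, lam: float = 0.1) -> torch.Tensor:
    """UUB-style one-step decrease condition on the robust LVF plus a smoothing term.

    Decrease term: penalize violations of L(s'_adv) <= alpha * L(s_adv) + cost,
    a contraction toward an ultimate bound (a stand-in for the paper's criterion).
    Smoothing term: penalize |L(s_adv) - L(s)| so the rLVF does not vary
    drastically between nearby states.
    """
    s_adv = worst_case_state(lvf, s)
    s_next_adv = worst_case_state(lvf, s_next)
    decrease = torch.relu(lvf(s_next_adv) - alpha * lvf(s_adv) - cost).mean()
    smooth = (lvf(s_adv) - lvf(s)).abs().mean()
    return decrease + lam * smooth


# Example usage on a random batch (purely illustrative):
lvf = LVF(state_dim=4)
opt = torch.optim.Adam(lvf.parameters(), lr=3e-4)
s, s_next = torch.randn(32, 4), torch.randn(32, 4)
cost = torch.rand(32, 1)  # per-step safety cost, e.g., from a replay buffer
loss = rlvf_loss(lvf, s, s_next, cost)
opt.zero_grad(); loss.backward(); opt.step()
```

In an off-policy scheme of the kind the abstract describes, a loss of this shape would be minimized over replayed transitions alongside the usual actor update; the adversarial evaluation and the smoothing weight `lam` together trade off conservatism against sensitivity to state perturbations.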



Author information


Corresponding author

Correspondence to LiXian Zhang.

Additional information

This work was supported by the National Natural Science Foundation of China (Grant Nos. 62225305 and 12072088), the Fundamental Research Funds for the Central Universities, China (Grant Nos. HIT.BRET.2022004, HIT.OCEF.2022047, and HIT.DZIJ.2023049), Grant No. JCKY2022603C016, the State Key Laboratory of Robotics and System (HIT), and the Heilongjiang Touyan Team.


About this article


Cite this article

Zhang, R., Han, Y., Su, M. et al. Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots. Sci. China Technol. Sci. 67, 172–182 (2024). https://doi.org/10.1007/s11431-023-2435-3

