Robust control design for zero-sum differential games problem based on off-policy reinforcement learning technique

Zhuang, Hongji; Zhu, Hongxu; Wu, Shufan; Wang, Xiaoliang; Mu, Zhongcheng; Shen, Qiang

doi:10.1007/s42401-023-00263-0

Original Paper
Published: 12 December 2023


Aerospace Systems

Hongji Zhuang¹, Hongxu Zhu¹, Shufan Wu¹, Xiaoliang Wang¹, Zhongcheng Mu¹ & Qiang Shen¹ (ORCID: orcid.org/0000-0001-8428-2356)


Abstract

This paper addresses the robust zero-sum differential game problem using an off-policy reinforcement learning (RL) technique. A robust system model is first established from the nominal one. A control strategy is then proposed, and its asymptotic stability and optimality are rigorously proved. The off-policy RL algorithm is derived from the Bellman equation to generate the control policy. Because the policy is learned from collected system data, the influence of a potentially inaccurate system dynamics model is avoided. This is the first application of an off-policy RL algorithm to this robust two-player zero-sum differential game problem. In addition, the convergence of the final algorithm is demonstrated, and a simulation example confirms its efficacy.
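
For orientation, the setting summarized in the abstract can be sketched with the standard nominal two-player zero-sum formulation from the off-policy RL literature. This is not taken from the paper itself: the symbols f, g, k, Q, R, \gamma, the quadratic cost, and the interval length T are assumptions chosen for illustration, and the paper's robust uncertainty terms are not modeled here.

```latex
% Assumed nominal dynamics: control u minimizes, disturbance w maximizes.
\[
\dot{x} = f(x) + g(x)\,u + k(x)\,w ,
\qquad
V(x_0) = \int_0^{\infty} \left( x^{\top} Q x + u^{\top} R u - \gamma^{2} w^{\top} w \right) dt .
\]

% Saddle-point policies obtained from the stationarity conditions of the Hamiltonian.
\[
u^{*} = -\tfrac{1}{2} R^{-1} g^{\top} \nabla V^{*} ,
\qquad
w^{*} = \tfrac{1}{2\gamma^{2}}\, k^{\top} \nabla V^{*} .
\]

% Off-policy Bellman equation: behavior inputs (u, w) generate the data,
% while (u_i, w_i) are the policies being evaluated at iteration i.
\[
V_i\bigl(x(t+T)\bigr) - V_i\bigl(x(t)\bigr)
= \int_{t}^{t+T} \Bigl(
  - x^{\top} Q x - u_i^{\top} R u_i + \gamma^{2} w_i^{\top} w_i
  - 2\, u_{i+1}^{\top} R \,(u - u_i)
  + 2 \gamma^{2}\, w_{i+1}^{\top} (w - w_i)
\Bigr)\, d\tau .
\]
```

In this sketch the last equation contains neither f, g, nor k, which is the usual sense in which an off-policy scheme learns V_i, u_{i+1}, and w_{i+1} from measured trajectories rather than from a dynamics model.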

Funding

This work was supported by the National Natural Science Foundation of China under Grants 62103275, U20B2054, and U20B2056, the Natural Science Foundation of Shanghai under Grant 20ZR1427000, and the Shanghai Sailing Program under Grant 20YF1421600.

Author information

Authors and Affiliations

Shanghai Jiaotong University, Shanghai, 200245, China
Hongji Zhuang, Hongxu Zhu, Shufan Wu, Xiaoliang Wang, Zhongcheng Mu & Qiang Shen

Corresponding author

Correspondence to Qiang Shen.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhuang, H., Zhu, H., Wu, S. et al. Robust control design for zero-sum differential games problem based on off-policy reinforcement learning technique. AS (2023). https://doi.org/10.1007/s42401-023-00263-0

Received: 04 September 2023
Revised: 06 November 2023
Accepted: 10 November 2023
Published: 12 December 2023
DOI: https://doi.org/10.1007/s42401-023-00263-0
