Abstract
This paper develops a concurrent learning-based approximate dynamic programming (ADP) algorithm for solving the two-player zero-sum (ZS) game arising in H ∞ control of continuous-time (CT) systems with unknown nonlinear dynamics. First, the H ∞ control problem is formulated as a ZS game, and then an online algorithm is developed that learns the solution to the Hamilton-Jacobi-Isaacs (HJI) equation without using any knowledge of the system dynamics. This is achieved by using a neural network (NN) identifier to approximate the uncertain system dynamics. The algorithm is implemented on an actor-critic-disturbance NN structure, along with the NN identifier, to approximate the optimal value function and the corresponding Nash solution of the game. All NNs are tuned simultaneously. By using the idea of concurrent learning, the need to check for the persistence of excitation condition is relaxed to a simplified condition. The stability of the overall system is guaranteed, and convergence to the Nash solution of the game is shown. Simulation results demonstrate the effectiveness of the algorithm.
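To make the ZS-game structure concrete, consider the linear special case, where the HJI equation reduces to the game algebraic Riccati equation A'P + PA + Q + P(γ⁻²B₁B₁' − B₂R⁻¹B₂')P = 0, with u = −R⁻¹B₂'Px the minimizing (control) player and w = γ⁻²B₁'Px the maximizing (disturbance) player. The sketch below is illustrative only: the system matrices and the forward integration of the Riccati differential equation are assumptions for this toy example, not the paper's NN-based algorithm, which handles unknown nonlinear dynamics.

```python
import numpy as np

# Illustrative linear plant: dx/dt = A x + B1 w + B2 u (matrices assumed, not from the paper)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B2 = np.array([[0.0], [1.0]])   # control input channel
B1 = np.array([[0.0], [0.5]])   # disturbance input channel
Q = np.eye(2)                   # state weight
R = np.eye(1)                   # control weight
gamma = 5.0                     # H-infinity attenuation level (assumed above the optimal level)

# Solve the game ARE by integrating the Riccati ODE dP/dt = residual from P = 0
P = np.zeros((2, 2))
dt = 0.001
for _ in range(200000):
    res = (A.T @ P + P @ A + Q
           + P @ (B1 @ B1.T / gamma**2 - B2 @ np.linalg.inv(R) @ B2.T) @ P)
    P = P + dt * res
    if np.max(np.abs(res)) < 1e-9:
        break

u_gain = -np.linalg.inv(R) @ B2.T @ P   # minimizing player: u = u_gain @ x
w_gain = (B1.T @ P) / gamma**2          # maximizing player: w = w_gain @ x
```

The nonlinear setting in the paper replaces this closed-form Riccati structure with NN approximations of the value function and both players' policies, tuned online.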
Author information
Sholeh Yasini received her B.Sc. degree from the University of Sistan and Balouchestan in 2003 and her M.Sc. degree from Ferdowsi University of Mashhad in 2008. She is currently working toward the Ph.D. degree at Ferdowsi University of Mashhad. Her research interests include optimal control, robust control, reinforcement learning, adaptive dynamic programming, and system identification.
Mohammad Bagher Naghibi Sistani received his B.Sc. and M.Sc. degrees in control engineering with honors from the University of Tehran, Tehran, Iran, in 1991 and 1995, respectively, and the Ph.D. degree from the Department of Electrical Engineering at Ferdowsi University of Mashhad in 2005. His research interests are reinforcement learning, adaptive dynamic programming, optimization, and soft computing. He is now an assistant professor at Ferdowsi University of Mashhad, Mashhad, Iran.
Ali Karimpour received his M.Sc. and Ph.D. degrees in electrical engineering from the Department of Electrical Engineering at Ferdowsi University of Mashhad, Mashhad, Iran, in 1990 and 2003, respectively. His research interests are multivariable and robust control, power system dynamics, electricity market issues, and system identification. He is now an associate professor at Ferdowsi University of Mashhad, Mashhad, Iran.
Cite this article
Yasini, S., Sistani, M.B.N. & Karimpour, A. Approximate dynamic programming for two-player zero-sum game related to H ∞ control of unknown nonlinear continuous-time systems. Int. J. Control Autom. Syst. 13, 99–109 (2015). https://doi.org/10.1007/s12555-014-0085-5