Abstract
This paper develops a concurrent learning-based approximate dynamic programming (ADP) algorithm for solving the two-player zero-sum (ZS) game arising in H ∞ control of continuous-time (CT) systems with unknown nonlinear dynamics. First, the H ∞ control problem is formulated as a ZS game, and then an online algorithm is developed that learns the solution to the Hamilton-Jacobi-Isaacs (HJI) equation without using any knowledge of the system dynamics. This is achieved by using a neural network (NN) identifier to approximate the uncertain system dynamics. The algorithm is implemented on an actor-critic-disturbance NN structure, along with the NN identifier, to approximate the optimal value function and the corresponding Nash solution of the game. All NNs are tuned simultaneously. By using the idea of concurrent learning, the need to check for the persistence of excitation condition is relaxed to a simplified condition. The stability of the overall system is guaranteed, and convergence to the Nash solution of the game is shown. Simulation results demonstrate the effectiveness of the algorithm.
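To make the ZS-game structure concrete, consider the linear special case, where the HJI equation reduces to the game algebraic Riccati equation A'P + PA + Q + P(γ⁻²B₁B₁' − B₂R⁻¹B₂')P = 0, with u = −R⁻¹B₂'Px the minimizing (control) player and w = γ⁻²B₁'Px the maximizing (disturbance) player. The sketch below is illustrative only: the system matrices and the forward integration of the Riccati differential equation are assumptions for this toy example, not the paper's NN-based algorithm, which handles unknown nonlinear dynamics.

```python
import numpy as np

# Illustrative linear plant: dx/dt = A x + B1 w + B2 u (matrices assumed, not from the paper)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B2 = np.array([[0.0], [1.0]])   # control input channel
B1 = np.array([[0.0], [0.5]])   # disturbance input channel
Q = np.eye(2)                   # state weight
R = np.eye(1)                   # control weight
gamma = 5.0                     # H-infinity attenuation level (assumed above the optimal level)

# Solve the game ARE by integrating the Riccati ODE dP/dt = residual from P = 0
P = np.zeros((2, 2))
dt = 0.001
for _ in range(200000):
    res = (A.T @ P + P @ A + Q
           + P @ (B1 @ B1.T / gamma**2 - B2 @ np.linalg.inv(R) @ B2.T) @ P)
    P = P + dt * res
    if np.max(np.abs(res)) < 1e-9:
        break

u_gain = -np.linalg.inv(R) @ B2.T @ P   # minimizing player: u = u_gain @ x
w_gain = (B1.T @ P) / gamma**2          # maximizing player: w = w_gain @ x
```

The nonlinear setting in the paper replaces this closed-form Riccati structure with NN approximations of the value function and both players' policies, tuned online.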
Author information
Sholeh Yasini received her B.Sc. degree from the University of Sistan and Balouchestan in 2003 and her M.Sc. degree from Ferdowsi University of Mashhad in 2008. She is currently working toward the Ph.D. degree at Ferdowsi University of Mashhad. Her research interests include optimal control, robust control, reinforcement learning, adaptive dynamic programming, and system identification.
Mohammad Bagher Naghibi Sistani received his B.Sc. and M.Sc. degrees in control engineering with honors from the University of Tehran, Tehran, Iran, in 1991 and 1995, respectively, and the Ph.D. degree from the Department of Electrical Engineering at Ferdowsi University of Mashhad in 2005. His research interests are reinforcement learning, adaptive dynamic programming, optimization, and soft computing. He is now an assistant professor at Ferdowsi University of Mashhad, Mashhad, Iran.
Ali Karimpour received his M.Sc. and Ph.D. degrees in electrical engineering from the Department of Electrical Engineering at Ferdowsi University of Mashhad, Mashhad, Iran, in 1990 and 2003, respectively. His research interests are multivariable and robust control, power system dynamics, electricity market issues, and system identification. He is now an associate professor at Ferdowsi University of Mashhad, Mashhad, Iran.
Cite this article
Yasini, S., Sistani, M.B.N. & Karimpour, A. Approximate dynamic programming for two-player zero-sum game related to H ∞ control of unknown nonlinear continuous-time systems. Int. J. Control Autom. Syst. 13, 99–109 (2015). https://doi.org/10.1007/s12555-014-0085-5