
Approximate dynamic programming for two-player zero-sum game related to H∞ control of unknown nonlinear continuous-time systems

  • Regular Paper
  • Control Theory
International Journal of Control, Automation and Systems

Abstract

This paper develops a concurrent learning-based approximate dynamic programming (ADP) algorithm for solving the two-player zero-sum (ZS) game arising in H∞ control of continuous-time (CT) systems with unknown nonlinear dynamics. The H∞ control problem is first formulated as a ZS game, and an online algorithm is then developed that learns the solution to the Hamilton-Jacobi-Isaacs (HJI) equation without using any knowledge of the system dynamics. This is achieved by using a neural network (NN) identifier to approximate the uncertain system dynamics. The algorithm is implemented on an actor-critic-disturbance NN structure, together with the NN identifier, to approximate the optimal value function and the corresponding Nash solution of the game. All NNs are tuned simultaneously. By using the idea of concurrent learning, the need for the restrictive persistence of excitation condition is relaxed to a simpler condition on recorded data. The stability of the overall system is guaranteed and convergence to the Nash solution of the game is shown. Simulation results demonstrate the effectiveness of the algorithm.
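For context, the nonlinear H∞ problem described in the abstract is conventionally stated as the following ZS game; the notation (f, g, k, Q, R, γ) is the standard one for this formulation and is not taken verbatim from the paper:

    \dot{x} = f(x) + g(x)u + k(x)d,
    V^*(x(t)) = \min_{u}\max_{d} \int_t^{\infty} \big( Q(x) + u^\top R u - \gamma^2 d^\top d \big)\, d\tau,
    0 = Q(x) + \nabla V^{*\top} f(x)
        - \tfrac{1}{4}\, \nabla V^{*\top} g(x) R^{-1} g(x)^\top \nabla V^*
        + \tfrac{1}{4\gamma^2}\, \nabla V^{*\top} k(x) k(x)^\top \nabla V^*,
    u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V^*, \qquad
    d^*(x) = \tfrac{1}{2\gamma^2} k(x)^\top \nabla V^*.

The third equation is the HJI equation whose solution the actor-critic-disturbance structure approximates online. As a rough illustration of the concurrent-learning idea, the sketch below augments a gradient-based critic update with terms evaluated on a recorded history stack, so that a condition on the stored data can replace persistence of excitation. All names, the normalization, and the specific update law are illustrative assumptions, not the paper's actual tuning laws.

    import numpy as np

    def critic_update(W_c, sigma_now, target_now, stack, lr=0.5):
        """One normalized-gradient step on the squared Bellman error.

        W_c        : current critic NN weights, shape (n,)
        sigma_now  : current critic regressor vector, shape (n,)
        target_now : current measured target term (scalar)
        stack      : recorded (sigma_j, target_j) pairs (history stack)
        lr         : learning rate
        Illustrative sketch only; the paper's update laws may differ.
        """
        def bellman_grad(sigma, target):
            e = W_c @ sigma + target                   # Bellman residual at this data point
            return sigma * e / (1.0 + sigma @ sigma) ** 2
        grad = bellman_grad(sigma_now, target_now)     # instantaneous term
        for sigma_j, target_j in stack:                # concurrent-learning terms on stored data
            grad += bellman_grad(sigma_j, target_j)
        return W_c - lr * grad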



Author information


Corresponding author

Correspondence to Mohammad Bagher Naghibi Sistani.

Additional information

Sholeh Yasini received her B.Sc. degree from University of Sistan and Balouchestan in 2003 and her M.Sc. degree from Ferdowsi University of Mashhad in 2008. She is currently working towards the Ph.D. degree at Ferdowsi University of Mashhad. Her research interests include optimal control, robust control, reinforcement learning, adaptive dynamic programming and system identification.

Mohammad Bagher Naghibi Sistani received his B.Sc. and M.Sc. degrees in Control Engineering with honors from the University of Tehran, Tehran, Iran, in 1991 and 1995, respectively, and his Ph.D. degree from the Department of Electrical Engineering at Ferdowsi University of Mashhad in 2005. His research interests are reinforcement learning, adaptive dynamic programming, optimization, and soft computing. He is now an assistant professor at Ferdowsi University of Mashhad, Mashhad, Iran.

Ali Karimpour received his M.Sc. and Ph.D. degrees in Electrical Engineering from the Department of Electrical Engineering at Ferdowsi University of Mashhad, Mashhad, Iran, in 1990 and 2003, respectively. His research interests are multivariable and robust control, power system dynamics, electricity market issues and system identification. He is now an associate professor at Ferdowsi University of Mashhad, Mashhad, Iran.


About this article


Cite this article

Yasini, S., Sistani, M.B.N. & Karimpour, A. Approximate dynamic programming for two-player zero-sum game related to H∞ control of unknown nonlinear continuous-time systems. Int. J. Control Autom. Syst. 13, 99–109 (2015). https://doi.org/10.1007/s12555-014-0085-5

