Abstract
In this paper, we develop a model-free integral policy iteration algorithm that learns online the Nash equilibrium solution of two-player zero-sum differential games with completely unknown nonlinear continuous-time dynamics. The algorithm updates the value function, the control policy and the disturbance policy simultaneously. To implement it, three neural networks are used to approximate the game value function, the control policy and the disturbance policy, and the least-squares method is used to estimate the unknown parameters of these networks. A simulation example demonstrates the effectiveness of the developed scheme.
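The abstract does not state the update equations. The following sketch is only a hedged illustration of the general idea: one least-squares solve per iteration yields the value, control and disturbance weights together, and it uses only recorded state and input data. It is not the authors' algorithm. It deliberately shrinks the problem to a scalar linear game ẋ = a x + b u + c w with cost ∫(q x² + r u² − γ² w²) dt, so each "network" collapses to one weight: V(x) = p x², u = k x, w = l x. Assuming input-affine dynamics and the usual greedy updates u_{i+1} = −½ R⁻¹ gᵀ∇V_i and w_{i+1} = (1/(2γ²)) kᵀ∇V_i, differentiating V_i along trajectories driven by arbitrary exploratory inputs u, w and integrating over [t, t+T] gives

p_i [x(t+T)² − x(t)²] + 2 r k_{i+1} ∫ x (u − k_i x) dτ − 2γ² l_{i+1} ∫ x (w − l_i x) dτ = −(q + r k_i² − γ² l_i²) ∫ x² dτ,

which is linear in (p_i, k_{i+1}, l_{i+1}) and contains no a, b or c. Several choices belong to this sketch and are not taken from the paper: the plant constants, the exploration signals, all function names, and the reuse of one recorded data set across iterations.

```python
import math

# Hypothetical plant dx/dt = A x + B u + C w, used ONLY to generate data.
# The learner below never reads A, B or C.
A, B, C = -1.0, 1.0, 1.0
# Illustrative cost weights: integrand Q x^2 + R u^2 - GAMMA^2 w^2.
Q, R, GAMMA = 1.0, 1.0, 2.0


def explore_u(t):
    """Exploratory control signal applied while recording data (assumed)."""
    return 0.8 * math.sin(1.3 * t) + 0.5 * math.sin(4.7 * t)


def explore_w(t):
    """Exploratory disturbance signal applied while recording data (assumed)."""
    return 0.6 * math.cos(2.1 * t) + 0.4 * math.sin(7.3 * t)


def collect_data(x0=1.0, dt=0.01, steps_per_interval=10, n_intervals=60):
    """Return one tuple (x(t+T)^2 - x(t)^2, int x^2, int x u, int x w) per interval."""

    def rhs(t, s):
        # Augmented state s = [x, int x^2, int x u, int x w].
        x, u, w = s[0], explore_u(t), explore_w(t)
        return [A * x + B * u + C * w, x * x, x * u, x * w]

    t, s = 0.0, [x0, 0.0, 0.0, 0.0]
    rows = []
    for _ in range(n_intervals):
        start = list(s)
        for _ in range(steps_per_interval):
            # One classical fourth-order Runge-Kutta step.
            k1 = rhs(t, s)
            k2 = rhs(t + dt / 2, [si + dt / 2 * ki for si, ki in zip(s, k1)])
            k3 = rhs(t + dt / 2, [si + dt / 2 * ki for si, ki in zip(s, k2)])
            k4 = rhs(t + dt, [si + dt * ki for si, ki in zip(s, k3)])
            s = [si + dt / 6 * (g1 + 2 * g2 + 2 * g3 + g4)
                 for si, g1, g2, g3, g4 in zip(s, k1, k2, k3, k4)]
            t += dt
        rows.append((s[0] ** 2 - start[0] ** 2,
                     s[1] - start[1], s[2] - start[2], s[3] - start[3]))
    return rows


def solve_least_squares(m, y):
    """Minimize ||m z - y|| via the normal equations and Gauss-Jordan elimination."""
    n = len(m[0])
    aug = [[sum(r[i] * r[j] for r in m) for j in range(n)]
           + [sum(r[i] * yk for r, yk in zip(m, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(n):
            if k != col:
                f = aug[k][col] / aug[col][col]
                aug[k] = [vk - f * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def integral_policy_iteration(rows, n_iter=10):
    """Policy iteration on recorded data; zero initial gains are assumed stabilizing."""
    k, l, history = 0.0, 0.0, []
    for _ in range(n_iter):
        m, y = [], []
        for dxx, ixx, ixu, ixw in rows:
            # Integral equation, linear in the unknowns (p_i, k_{i+1}, l_{i+1}).
            m.append([dxx, 2 * R * (ixu - k * ixx), -2 * GAMMA ** 2 * (ixw - l * ixx)])
            y.append(-(Q + R * k * k - GAMMA ** 2 * l * l) * ixx)
        # Value weight and both policy gains come out of the same solve.
        p, k, l = solve_least_squares(m, y)
        history.append(p)
    return p, k, l, history


if __name__ == "__main__":
    p, k, l, _ = integral_policy_iteration(collect_data())
    print(f"V(x) = {p:.6f} x^2, u = {k:.6f} x, w = {l:.6f} x")
```

As a check, p should settle near the positive root of the scalar game Riccati equation s p² − 2ap − q = 0 with s = b²/r − c²/γ², that is p = (a + √(a² + sq))/s ≈ 0.4305 for the values above. The gains should approach k = −bp/r and l = cp/γ². In this scalar case, the exact iteration coincides with Newton's method on that equation.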
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Liu, D., Wang, D. (2013). Integral Policy Iteration for Zero-Sum Games with Completely Unknown Nonlinear Dynamics. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42054-2_29
DOI: https://doi.org/10.1007/978-3-642-42054-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42053-5
Online ISBN: 978-3-642-42054-2
eBook Packages: Computer Science, Computer Science (R0)