Abstract
This paper introduces a model-free reinforcement learning technique for a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games, and Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution of the dynamic graphical game, and in particular its Nash equilibrium, is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game; it does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption on the interconnectivity properties of the graph. A gradient-descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
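The model-free policy-iteration idea summarized above can be illustrated in a toy setting. Everything in the sketch below is an assumption made for demonstration, not the paper's formulation: three scalar follower agents with dynamics x_{k+1} = a*x_k + b*u_k (the learner never uses a or b), a chain graph with only agent 0 pinned to a leader at the origin, and per-agent quadratic Q-function critics fitted by temporal-difference learning from measured data.

```python
import numpy as np

# Hypothetical model-free policy-iteration sketch for a graphical game.
# Each agent i keeps a quadratic critic Q_i(d, u) = th0*d^2 + th1*d*u + th2*u^2
# over its local neighborhood error d and its own control u.
rng = np.random.default_rng(0)
a, b = 0.9, 0.5                                   # plant parameters, unknown to the learner
A = np.array([[0, 0, 0],                          # adjacency: agent 1 hears 0,
              [1, 0, 0],                          # agent 2 hears 1
              [0, 1, 0]], dtype=float)
g = np.array([1.0, 0.0, 0.0])                     # pinning gains: only agent 0 sees the leader
x_leader, N = 0.0, 3

def local_errors(x):
    """Neighborhood tracking errors driving each agent's distributed policy."""
    return np.array([A[i] @ (x[i] - x) + g[i] * (x[i] - x_leader)
                     for i in range(N)])

def feats(d, u):
    """Quadratic features of the per-agent critics."""
    return np.stack([d * d, d * u, u * u], axis=-1)

theta = np.tile([1.0, 0.0, 1.0], (N, 1))          # critic weights, one row per agent
K = np.zeros(N)                                   # distributed policy gains, u_i = -K_i * d_i
gamma, lr = 0.8, 0.02

for _ in range(10):                               # policy-iteration sweeps
    # Policy evaluation: SARSA-style TD updates on the critics, data only.
    for _ in range(20):
        x = rng.uniform(-1.0, 1.0, N)
        for _ in range(25):
            d = local_errors(x)
            u = -K * d + 0.1 * rng.standard_normal(N)   # exploration noise
            x = a * x + b * u                           # environment step (black box)
            dn = local_errors(x)
            un = -K * dn
            td = (d * d + u * u                         # local stage costs
                  + gamma * np.einsum('ij,ij->i', theta, feats(dn, un))
                  - np.einsum('ij,ij->i', theta, feats(d, u)))
            theta += lr * td[:, None] * feats(d, u)     # critic gradient step
    # Policy improvement: minimize each quadratic critic over u, clipped
    # to guard against a transiently ill-conditioned th2 estimate.
    K = np.clip(theta[:, 1] / (2 * np.maximum(theta[:, 2], 1e-3)), 0.0, 1.5)
```

The improvement step uses only the learned critic, which is what makes the loop model-free: the greedy gain comes from arg-min of Q_i(d, u) over u rather than from the (unknown) plant matrices. In this toy cascade, the learned gains drive all agents' states toward the leader.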
Author information
Additional information
This work was supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals Project (No. JF141002), the National Science Foundation (No. ECCS-1405173), the Office of Naval Research (Nos. N000141310562, N000141410718), the U.S. Army Research Office (No. W911NF-11-D-0001), the National Natural Science Foundation of China (No. 61120106011), and the Project 111 from the Ministry of Education of China (No. B08015).
Mohammed I. ABOUHEAF was born in Smanoud, Egypt. He received his B.Sc. and M.Sc. degrees in Electronics and Communication Engineering from the Mansoura College of Engineering, Mansoura, Egypt, in 2000 and 2006, respectively, and his Ph.D. degree in Electrical Engineering from the University of Texas at Arlington (UTA), Arlington, Texas, U.S.A. in 2012. He worked as an assistant lecturer with the Air Defense College, Alexandria, Egypt (2001–2002), as a planning engineer for the Maintenance Department, Suez Oil Company (SUCO), South Sinai, Egypt (2002–2004), and as an assistant lecturer with the Electrical Engineering Department, Aswan College of Energy Engineering, Aswan, Egypt (2004–2008). He was a member of the Advanced Controls and Sensor Group (ACS) and the Energy Systems Research Center (ESRC) at UTA (2008–2012), a postdoctoral fellow with the University of Texas at Arlington Research Institute (UTARI), Fort Worth, Texas, U.S.A. (2012–2013), and adjunct faculty with the Electrical Engineering Department at UTA (2012–2013). Currently, he is an assistant professor with the Systems Engineering Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia. His research interests include optimal control, adaptive control, reinforcement learning, fuzzy systems, game theory, microgrids, and economic dispatch.
Frank L. LEWIS is a member of the National Academy of Inventors, a Fellow of the IEEE, a Fellow of IFAC, a Fellow of the U.K. Institute of Measurement & Control, a Professional Engineer in Texas, a U.K. Chartered Engineer, UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and holder of the Moncrief-O'Donnell Chair at the University of Texas at Arlington Research Institute. He is a Qian Ren Thousand Talents Consulting Professor at Northeastern University, Shenyang, China. He obtained the B.S. degree in Physics/EE and the MSEE at Rice University, the M.S. in Aeronautical Engineering from the University of West Florida, and the Ph.D. degree at the Georgia Institute of Technology. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is the author of six U.S. patents, numerous journal special issues and journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control, which are used as university textbooks worldwide. He received the Fulbright Research Award, the NSF Research Initiation Grant, the ASEE Terman Award, the International Neural Network Society Gabor Award, the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal, and the IEEE Computational Intelligence Society Neural Networks Pioneer Award. He received the Outstanding Service Award from the Dallas IEEE Section, was selected as Engineer of the Year by the Fort Worth IEEE Section, and was listed in the Fort Worth Business Press Top 200 Leaders in Manufacturing. He received the Texas Regents Outstanding Teaching Award in 2013. He is a Distinguished Visiting Professor at Nanjing University of Science & Technology and a Project 111 Professor at Northeastern University in Shenyang, China. He is a founding member of the Board of Governors of the Mediterranean Control Association.
Magdi S. MAHMOUD obtained the B.Sc. degree (Honors) in Communication Engineering, the M.Sc. degree in Electronic Engineering, and the Ph.D. degree in Systems Engineering, all from Cairo University, in 1968, 1972 and 1974, respectively. He has been a professor of engineering since 1984 and is now a distinguished professor at KFUPM, Saudi Arabia. He was on the faculty at different universities worldwide, including Egypt (CU, AUC), Kuwait (KU), U.A.E. (UAEU), U.K. (UMIST), U.S.A. (Pitt, Case Western), Singapore (Nanyang) and Australia (Adelaide), and lectured in Venezuela (Caracas), Germany (Hanover), U.K. (Kent), U.S.A. (UoSA), Canada (Montreal) and China (BIT, Yanshan). He is the principal author of thirty-four (34) books, inclusive of book chapters, and the author/co-author of more than 510 peer-reviewed papers. He is the recipient of two national, one regional, and four university prizes for outstanding research in engineering and applied mathematics. He is a fellow of the IEE, a senior member of the IEEE, a member of the CEI (U.K.), and a registered consultant engineer of information engineering and systems (Egypt). He is currently actively engaged in teaching and research on the development of modern methodologies for distributed control and filtering, networked control systems, triggering mechanisms in dynamical systems, fault-tolerant systems, and information technology.
Dariusz MIKULSKI is a research computer scientist in Ground Vehicle Robotics at the U.S. Army Tank-Automotive Research, Development and Engineering Center in Warren, MI. He currently works on research to improve cooperative teaming and cyber security in military unmanned convoy operations. Dr. Mikulski earned his Ph.D. degree in Electrical and Computer Engineering at Oakland University in Rochester Hills, Michigan, in 2013. He also earned his B.Sc. in Computer Science from the University of Michigan in Ann Arbor and his master's degree in Computer Science and Engineering from Oakland University.
Cite this article
Abouheaf, M.I., Lewis, F.L., Mahmoud, M.S. et al. Discrete-time dynamic graphical games: model-free reinforcement learning solution. Control Theory Technol. 13, 55–69 (2015). https://doi.org/10.1007/s11768-015-3203-x