Abstract
This paper introduces a model-free reinforcement learning technique for a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games, and Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution of the dynamic graphical game, and in particular its Nash equilibrium, is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game; it does not require any knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption on the interconnectivity properties of the graph. A gradient-descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
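The model-free policy-iteration idea summarized above can be illustrated in a toy setting. Everything in the sketch below is an assumption made for demonstration, not the paper's formulation: three scalar follower agents with dynamics x_{k+1} = a*x_k + b*u_k (the learner never uses a or b), a chain graph with only agent 0 pinned to a leader at the origin, and per-agent quadratic Q-function critics fitted by temporal-difference learning from measured data.

```python
import numpy as np

# Hypothetical model-free policy-iteration sketch for a graphical game.
# Each agent i keeps a quadratic critic Q_i(d, u) = th0*d^2 + th1*d*u + th2*u^2
# over its local neighborhood error d and its own control u.
rng = np.random.default_rng(0)
a, b = 0.9, 0.5                                   # plant parameters, unknown to the learner
A = np.array([[0, 0, 0],                          # adjacency: agent 1 hears 0,
              [1, 0, 0],                          # agent 2 hears 1
              [0, 1, 0]], dtype=float)
g = np.array([1.0, 0.0, 0.0])                     # pinning gains: only agent 0 sees the leader
x_leader, N = 0.0, 3

def local_errors(x):
    """Neighborhood tracking errors driving each agent's distributed policy."""
    return np.array([A[i] @ (x[i] - x) + g[i] * (x[i] - x_leader)
                     for i in range(N)])

def feats(d, u):
    """Quadratic features of the per-agent critics."""
    return np.stack([d * d, d * u, u * u], axis=-1)

theta = np.tile([1.0, 0.0, 1.0], (N, 1))          # critic weights, one row per agent
K = np.zeros(N)                                   # distributed policy gains, u_i = -K_i * d_i
gamma, lr = 0.8, 0.02

for _ in range(10):                               # policy-iteration sweeps
    # Policy evaluation: SARSA-style TD updates on the critics, data only.
    for _ in range(20):
        x = rng.uniform(-1.0, 1.0, N)
        for _ in range(25):
            d = local_errors(x)
            u = -K * d + 0.1 * rng.standard_normal(N)   # exploration noise
            x = a * x + b * u                           # environment step (black box)
            dn = local_errors(x)
            un = -K * dn
            td = (d * d + u * u                         # local stage costs
                  + gamma * np.einsum('ij,ij->i', theta, feats(dn, un))
                  - np.einsum('ij,ij->i', theta, feats(d, u)))
            theta += lr * td[:, None] * feats(d, u)     # critic gradient step
    # Policy improvement: minimize each quadratic critic over u, clipped
    # to guard against a transiently ill-conditioned th2 estimate.
    K = np.clip(theta[:, 1] / (2 * np.maximum(theta[:, 2], 1e-3)), 0.0, 1.5)
```

The improvement step uses only the learned critic, which is what makes the loop model-free: the greedy gain comes from arg-min of Q_i(d, u) over u rather than from the (unknown) plant matrices. In this toy cascade, the learned gains drive all agents' states toward the leader.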
Author information
Additional information
This work was supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals Project (No. JF141002), the National Science Foundation (No. ECCS-1405173), the Office of Naval Research (Nos. N000141310562, N000141410718), the U.S. Army Research Office (No. W911NF-11-D-0001), the National Natural Science Foundation of China (No. 61120106011), and the Project 111 from the Ministry of Education of China (No. B08015).
Mohammed I. ABOUHEAF was born in Smanoud, Egypt. He received his B.Sc. and M.Sc. degrees in Electronics and Communication Engineering from the Mansoura College of Engineering, Mansoura, Egypt, in 2000 and 2006, respectively, and his Ph.D. degree in Electrical Engineering from the University of Texas at Arlington (UTA), Arlington, Texas, U.S.A. in 2012. He worked as an assistant lecturer with the Air Defense College, Alexandria, Egypt (2001–2002), as a planning engineer for the Maintenance Department, Suez Oil Company (SUCO), South Sinai, Egypt (2002–2004), and as an assistant lecturer with the Electrical Engineering Department, Aswan College of Energy Engineering, Aswan, Egypt (2004–2008). He was a member of the Advanced Controls and Sensor Group (ACS) and the Energy Systems Research Center (ESRC) at UTA (2008–2012), a postdoctoral fellow with the University of Texas at Arlington Research Institute (UTARI), Fort Worth, Texas, U.S.A. (2012–2013), and adjunct faculty with the Electrical Engineering Department at UTA (2012–2013). Currently, he is an assistant professor with the Systems Engineering Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia. His research interests include optimal control, adaptive control, reinforcement learning, fuzzy systems, game theory, microgrids, and economic dispatch.
Frank L. LEWIS is a member of the National Academy of Inventors, a Fellow of the IEEE, a Fellow of IFAC, a Fellow of the U.K. Institute of Measurement & Control, a Professional Engineer in Texas, a U.K. Chartered Engineer, UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and holder of the Moncrief-O'Donnell Chair at the University of Texas at Arlington Research Institute. He is a Qian Ren Thousand Talents Consulting Professor at Northeastern University, Shenyang, China. He obtained the B.S. degree in Physics/EE and the MSEE at Rice University, the M.S. in Aeronautical Engineering from the University of West Florida, and the Ph.D. degree at the Georgia Institute of Technology. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is the author of six U.S. patents, numerous journal special issues and journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control, which are used as university textbooks worldwide. He received the Fulbright Research Award, the NSF Research Initiation Grant, the ASEE Terman Award, the International Neural Network Society Gabor Award, the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal, and the IEEE Computational Intelligence Society Neural Networks Pioneer Award. He received the Outstanding Service Award from the Dallas IEEE Section, was selected as Engineer of the Year by the Fort Worth IEEE Section, and was listed in the Fort Worth Business Press Top 200 Leaders in Manufacturing. He received the Texas Regents Outstanding Teaching Award in 2013. He is a Distinguished Visiting Professor at Nanjing University of Science & Technology and a Project 111 Professor at Northeastern University in Shenyang, China. He is a founding member of the Board of Governors of the Mediterranean Control Association.
Magdi S. MAHMOUD obtained the B.Sc. degree (Honors) in Communication Engineering, the M.Sc. degree in Electronic Engineering, and the Ph.D. degree in Systems Engineering, all from Cairo University, in 1968, 1972 and 1974, respectively. He has been a professor of engineering since 1984 and is now a distinguished professor at KFUPM, Saudi Arabia. He was on the faculty at different universities worldwide, including Egypt (CU, AUC), Kuwait (KU), U.A.E. (UAEU), U.K. (UMIST), U.S.A. (Pitt, Case Western), Singapore (Nanyang) and Australia (Adelaide), and lectured in Venezuela (Caracas), Germany (Hanover), U.K. (Kent), U.S.A. (UoSA), Canada (Montreal) and China (BIT, Yanshan). He is the principal author of thirty-four (34) books, inclusive of book chapters, and the author/co-author of more than 510 peer-reviewed papers. He is the recipient of two national, one regional, and four university prizes for outstanding research in engineering and applied mathematics. He is a fellow of the IEE, a senior member of the IEEE, a member of the CEI (U.K.), and a registered consultant engineer of information engineering and systems (Egypt). He is currently actively engaged in teaching and research on the development of modern methodologies for distributed control and filtering, networked control systems, triggering mechanisms in dynamical systems, fault-tolerant systems, and information technology.
Dariusz MIKULSKI is a research computer scientist in Ground Vehicle Robotics at the U.S. Army Tank-Automotive Research, Development and Engineering Center in Warren, MI. He currently works on research to improve cooperative teaming and cyber security in military unmanned convoy operations. Dr. Mikulski earned his Ph.D. degree in Electrical and Computer Engineering at Oakland University in Rochester Hills, Michigan, in 2013. He also earned his B.Sc. in Computer Science from the University of Michigan in Ann Arbor and his master's degree in Computer Science and Engineering from Oakland University.
Cite this article
Abouheaf, M.I., Lewis, F.L., Mahmoud, M.S. et al. Discrete-time dynamic graphical games: model-free reinforcement learning solution. Control Theory Technol. 13, 55–69 (2015). https://doi.org/10.1007/s11768-015-3203-x