Discrete-time dynamic graphical games: model-free reinforcement learning solution

Abstract

This paper introduces a model-free reinforcement learning technique for solving a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to synchronize all agents to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games, and Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution of the dynamic graphical game, and in particular its Nash equilibrium, is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game; it requires no knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under a mild assumption about the interconnectivity properties of the graph. A gradient descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
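As a concrete illustration of the model-free learning scheme, the following is a minimal single-agent sketch, not the paper's algorithm: it runs policy iteration on scalar local neighborhood tracking-error dynamics, fitting a quadratic Q-function critic by gradient descent from sampled transitions only. The constants A and B, the stage-cost weights, the learning rates, and all names are illustrative assumptions, and the coupling to neighbors' policies that defines the full graphical game is omitted.

```python
# Minimal sketch (not the paper's exact algorithm): model-free policy
# iteration for one agent, with a quadratic Q-function critic fitted by
# gradient descent. The dynamics (A, B) appear only in the simulator
# below; the learner itself never reads them.
import numpy as np

rng = np.random.default_rng(0)

# ----- Simulator (dynamics hidden from the learner) ---------------------
A, B = 0.95, 0.5                    # assumed scalar error dynamics
def step(delta, u):
    """Local neighborhood tracking-error update (toy model)."""
    return A * delta + B * u

# ----- Critic: Q(delta, u) = w . phi(delta, u), quadratic basis ---------
def phi(delta, u):
    return np.array([delta * delta, delta * u, u * u])

Q_w, R_w = 1.0, 1.0                 # assumed local stage-cost weights
def cost(delta, u):
    return Q_w * delta**2 + R_w * u**2

w = np.array([1.0, 0.0, 1.0])       # critic weights
K = 0.0                             # policy gain: u = -K * delta (stabilizing start)

for _ in range(20):                 # policy-iteration loop
    # Policy evaluation: stochastic gradient descent on the Bellman
    # residual, using only sampled transitions (model-free).
    for _ in range(2000):
        delta = rng.uniform(-2.0, 2.0)                  # exploring starts
        u = -K * delta + 0.1 * rng.standard_normal()    # probing noise
        delta_next = step(delta, u)
        u_next = -K * delta_next                        # on-policy successor action
        td = w @ phi(delta, u) - (cost(delta, u) + w @ phi(delta_next, u_next))
        w -= 0.01 * td * phi(delta, u)                  # semi-gradient step
    # Policy improvement: minimize the quadratic Q over u,
    # dQ/du = w[1]*delta + 2*w[2]*u = 0  =>  u = -(w[1] / (2*w[2])) * delta
    K = w[1] / (2.0 * w[2])

print(f"learned feedback gain K = {K:.4f}")
```

In this scalar linear-quadratic special case the improved policy can be read directly off the critic weights, since minimizing the quadratic Q over u gives u = -(w1/(2*w2))*delta; the paper's setting replaces this with coupled, vector-valued critic structures, one per agent, evaluated over the communication graph.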



Author information


Corresponding author

Correspondence to Mohammed I. Abouheaf.

Additional information

This work was supported by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals Project (No. JF141002), the National Science Foundation (No. ECCS-1405173), the Office of Naval Research (Nos. N000141310562, N000141410718), the U.S. Army Research Office (No. W911NF-11-D-0001), the National Natural Science Foundation of China (No. 61120106011), and the Project 111 from the Ministry of Education of China (No. B08015).

Mohammed I. ABOUHEAF was born in Smanoud, Egypt. He received his B.Sc. and M.Sc. degrees in Electronics and Communication Engineering from the Mansoura College of Engineering, Mansoura, Egypt, in 2000 and 2006, respectively, and his Ph.D. degree in Electrical Engineering from the University of Texas at Arlington (UTA), Arlington, Texas, U.S.A., in 2012. He was an assistant lecturer at the Air Defense College, Alexandria, Egypt (2001–2002), a planning engineer in the Maintenance Department of the Suez Oil Company (SUCO), South Sinai, Egypt (2002–2004), and an assistant lecturer in the Electrical Engineering Department, Aswan College of Energy Engineering, Aswan, Egypt (2004–2008). He was a member of the Advanced Controls and Sensor Group (ACS) and the Energy Systems Research Center (ESRC) at UTA (2008–2012), a postdoctoral fellow at the University of Texas at Arlington Research Institute (UTARI), Fort Worth, Texas, U.S.A. (2012–2013), and adjunct faculty in the Electrical Engineering Department at UTA (2012–2013). He is currently an Assistant Professor in the Systems Engineering Department, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Saudi Arabia. His research interests include optimal control, adaptive control, reinforcement learning, fuzzy systems, game theory, microgrids, and economic dispatch.

Frank L. LEWIS is a member of the National Academy of Inventors, a Fellow of the IEEE, IFAC, and the U.K. Institute of Measurement & Control, a PE in Texas, and a U.K. Chartered Engineer. He is UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O'Donnell Chair at the University of Texas at Arlington Research Institute, and Qian Ren Thousand Talents Consulting Professor at Northeastern University, Shenyang, China. He obtained the B.S. degree in Physics/EE and the MSEE at Rice University, the M.S. in Aeronautical Engineering from the University of West Florida, and the Ph.D. degree at the Georgia Institute of Technology. He works in feedback control, intelligent systems, cooperative control systems, and nonlinear systems. He is the author of 6 U.S. patents, numerous journal special issues and journal papers, and 20 books, including Optimal Control, Aircraft Control, Optimal Estimation, and Robot Manipulator Control, which are used as university textbooks worldwide. He received the Fulbright Research Award, the NSF Research Initiation Grant, the ASEE Terman Award, the International Neural Network Society Gabor Award, the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal, and the IEEE Computational Intelligence Society Neural Networks Pioneer Award. He received the Outstanding Service Award from the Dallas IEEE Section, was selected as Engineer of the Year by the Ft. Worth IEEE Section, and was listed in the Ft. Worth Business Press Top 200 Leaders in Manufacturing. He received the Texas Regents Outstanding Teaching Award in 2013. He is Distinguished Visiting Professor at Nanjing University of Science & Technology, Project 111 Professor at Northeastern University in Shenyang, China, and a founding member of the Board of Governors of the Mediterranean Control Association.

Magdi S. MAHMOUD obtained the B.Sc. degree (Honors) in Communication Engineering, the M.Sc. degree in Electronic Engineering, and the Ph.D. degree in Systems Engineering, all from Cairo University, in 1968, 1972 and 1974, respectively. He has been a professor of engineering since 1984 and is now a Distinguished Professor at KFUPM, Saudi Arabia. He has served on the faculty of universities worldwide, including in Egypt (CU, AUC), Kuwait (KU), U.A.E. (UAEU), U.K. (UMIST), U.S.A. (Pitt, Case Western), Singapore (Nanyang) and Australia (Adelaide), and has lectured in Venezuela (Caracas), Germany (Hanover), U.K. (Kent), U.S.A. (UoSA), Canada (Montreal) and China (BIT, Yanshan). He is the principal author of thirty-four (34) books, inclusive of book chapters, and the author/co-author of more than 510 peer-reviewed papers. He is the recipient of two national, one regional and four university prizes for outstanding research in engineering and applied mathematics. He is a fellow of the IEE, a senior member of the IEEE and the CEI (U.K.), and a registered consultant engineer of information engineering and systems (Egypt). He is currently actively engaged in teaching and research on modern methodologies for distributed control and filtering, networked control systems, triggering mechanisms in dynamical systems, fault-tolerant systems and information technology.

Dariusz MIKULSKI is a research computer scientist in Ground Vehicle Robotics at the U.S. Army Tank-Automotive Research, Development and Engineering Center in Warren, Michigan. He currently works on research to improve cooperative teaming and cyber security in military unmanned convoy operations. He earned his Ph.D. degree in Electrical and Computer Engineering at Oakland University in Rochester Hills, Michigan, in 2013, his B.Sc. in Computer Science from the University of Michigan in Ann Arbor, and his Master's in Computer Science and Engineering from Oakland University.

About this article

Cite this article

Abouheaf, M.I., Lewis, F.L., Mahmoud, M.S. et al. Discrete-time dynamic graphical games: model-free reinforcement learning solution. Control Theory Technol. 13, 55–69 (2015). https://doi.org/10.1007/s11768-015-3203-x
