
Lyapunov-Based Reinforcement Learning for Decentralized Multi-agent Control

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12547)


Decentralized multi-agent control has broad applications, ranging from multi-robot cooperation to distributed sensor networks. The systems involved are complex, with unknown or highly uncertain dynamics, so traditional model-based control methods are difficult to apply. Compared with model-based control in control theory, deep reinforcement learning (DRL) is a promising way to learn the controller/policy from data without knowledge of the system dynamics. However, directly applying DRL to decentralized multi-agent control is challenging, because interactions among agents make the learning environment non-stationary. More importantly, existing multi-agent reinforcement learning (MARL) algorithms cannot ensure the closed-loop stability of a multi-agent system from a control-theoretic perspective, so the learned control policies are likely to produce abnormal or dangerous behaviors in real applications. Without such a stability guarantee, applying existing MARL algorithms to real multi-agent systems, e.g., UAVs, robots, and power systems, raises serious safety concerns. In this paper, we propose a new MARL algorithm for decentralized multi-agent control with a stability guarantee. The new algorithm, termed multi-agent soft actor-critic (MASAC), is developed under the well-known framework of "centralized training with decentralized execution". Closed-loop stability is guaranteed by introducing a stability constraint during the policy-improvement step of MASAC. The stability constraint is designed based on Lyapunov's method from control theory. Finally, a multi-agent navigation example demonstrates the effectiveness of the proposed MASAC algorithm.
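The core idea of the stability constraint can be illustrated with a minimal sketch. The function names, the quadratic Lyapunov candidate, and the margin parameter `beta` below are illustrative assumptions, not the paper's actual design: the paper learns its Lyapunov function, whereas this toy uses a fixed quadratic form to show the decrease condition that the policy update must respect.

```python
import numpy as np

def lyapunov_value(state, P):
    """Quadratic Lyapunov candidate L(s) = s^T P s, with P positive definite.
    (Illustrative choice; the paper's Lyapunov function is learned.)"""
    return float(state @ P @ state)

def stability_constraint_satisfied(transitions, P, beta=0.1):
    """Check a sampled Lyapunov decrease condition over (state, next_state)
    pairs: E[L(s') - L(s)] <= -beta * E[L(s)]. `beta` is a hypothetical
    decrease margin."""
    decrease = np.mean([lyapunov_value(s2, P) - lyapunov_value(s1, P)
                        for s1, s2 in transitions])
    level = np.mean([lyapunov_value(s1, P) for s1, _ in transitions])
    return bool(decrease <= -beta * level)

# Toy check: a contracting linear system s' = 0.5 s satisfies the condition,
# while an expanding one s' = 1.5 s violates it.
P = np.eye(2)
rng = np.random.default_rng(0)
states = [rng.standard_normal(2) for _ in range(32)]
stable = [(s, 0.5 * s) for s in states]
unstable = [(s, 1.5 * s) for s in states]
print(stability_constraint_satisfied(stable, P))    # → True
print(stability_constraint_satisfied(unstable, P))  # → False
```

In MASAC-style training, a check of this form would be imposed as a constraint on the actor update (e.g., via a Lagrangian term), so that only policies whose sampled transitions keep the Lyapunov candidate decreasing are accepted.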


  • Multi-agent reinforcement learning
  • Lyapunov stability
  • Decentralized control
  • Collective robotic systems

  • DOI: 10.1007/978-3-030-64096-5_5
  • Chapter length: 14 pages



Author information



Corresponding author

Correspondence to Wei Pan.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, Q., Dong, H., Pan, W. (2020). Lyapunov-Based Reinforcement Learning for Decentralized Multi-agent Control. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64095-8

  • Online ISBN: 978-3-030-64096-5

  • eBook Packages: Computer Science (R0)