
Lyapunov-Based Reinforcement Learning for Decentralized Multi-agent Control

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12547)


Decentralized multi-agent control has broad applications, ranging from multi-robot cooperation to distributed sensor networks. The systems involved are complex, with unknown or highly uncertain dynamics, so traditional model-based control methods are difficult to apply. Compared with model-based control in control theory, deep reinforcement learning (DRL) is a promising way to learn the controller/policy from data without knowledge of the system dynamics. However, directly applying DRL to decentralized multi-agent control is challenging, because interactions among agents make the learning environment non-stationary. More importantly, existing multi-agent reinforcement learning (MARL) algorithms cannot ensure the closed-loop stability of a multi-agent system from a control-theoretic perspective, so the learned control policies are likely to produce abnormal or dangerous behaviors in real applications. Without such a stability guarantee, applying existing MARL algorithms to real multi-agent systems, e.g., UAVs, robots, and power systems, raises serious safety concerns. In this paper, we propose a new MARL algorithm for decentralized multi-agent control with a stability guarantee. The new algorithm, termed multi-agent soft actor-critic (MASAC), is developed under the well-known framework of "centralized training with decentralized execution". Closed-loop stability is guaranteed by introducing a stability constraint during the policy-improvement step of MASAC. The stability constraint is designed based on Lyapunov's method from control theory. Finally, a multi-agent navigation example demonstrates the effectiveness of the proposed MASAC algorithm.
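The core idea of the stability constraint can be illustrated with a minimal sketch. The function names, the quadratic Lyapunov candidate, and the margin parameter `beta` below are illustrative assumptions, not the paper's actual design: the paper learns its Lyapunov function, whereas this toy uses a fixed quadratic form to show the decrease condition that the policy update must respect.

```python
import numpy as np

def lyapunov_value(state, P):
    """Quadratic Lyapunov candidate L(s) = s^T P s, with P positive definite.
    (Illustrative choice; the paper's Lyapunov function is learned.)"""
    return float(state @ P @ state)

def stability_constraint_satisfied(transitions, P, beta=0.1):
    """Check a sampled Lyapunov decrease condition over (state, next_state)
    pairs: E[L(s') - L(s)] <= -beta * E[L(s)]. `beta` is a hypothetical
    decrease margin."""
    decrease = np.mean([lyapunov_value(s2, P) - lyapunov_value(s1, P)
                        for s1, s2 in transitions])
    level = np.mean([lyapunov_value(s1, P) for s1, _ in transitions])
    return bool(decrease <= -beta * level)

# Toy check: a contracting linear system s' = 0.5 s satisfies the condition,
# while an expanding one s' = 1.5 s violates it.
P = np.eye(2)
rng = np.random.default_rng(0)
states = [rng.standard_normal(2) for _ in range(32)]
stable = [(s, 0.5 * s) for s in states]
unstable = [(s, 1.5 * s) for s in states]
print(stability_constraint_satisfied(stable, P))    # → True
print(stability_constraint_satisfied(unstable, P))  # → False
```

In MASAC-style training, a check of this form would be imposed as a constraint on the actor update (e.g., via a Lagrangian term), so that only policies whose sampled transitions keep the Lyapunov candidate decreasing are accepted.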


  • Multi-agent reinforcement learning
  • Lyapunov stability
  • Decentralized control
  • Collective robotic systems

  • DOI: 10.1007/978-3-030-64096-5_5
  • Chapter length: 14 pages



Author information



Corresponding author

Correspondence to Wei Pan.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, Q., Dong, H., Pan, W. (2020). Lyapunov-Based Reinforcement Learning for Decentralized Multi-agent Control. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64095-8

  • Online ISBN: 978-3-030-64096-5

  • eBook Packages: Computer Science (R0)