Abstract
Modeling a system in engineering applications is time-consuming and labor-intensive, because system parameters may drift with temperature, component aging, and other factors. This paper proposes a novel data-driven, model-free optimal controller based on the deep deterministic policy gradient (DDPG) algorithm for the continuous-time leader-following multi-agent consensus problem. To avoid the dimensional explosion of the state and action spaces, two different neural networks (an actor and a critic) approximate the policy and the value function, replacing the time-consuming state-iteration process. The proposed controller achieves consensus with minimal energy consumption using only the consensus error, and it requires no initial admissible policy. Moreover, the controller is self-learning: it adapts online to recover optimal control as the system parameters change. Finally, proofs of convergence and stability, together with simulation experiments, verify the algorithm's effectiveness.
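The abstract states that the controller acts only on the consensus error. In the leader-following setting, each follower's local neighborhood consensus error is commonly defined as e_i = Σ_j a_ij (x_i − x_j) + g_i (x_i − x_0), where a_ij are adjacency weights and g_i pins follower i to the leader; this scalar (or vector) is what a DDPG agent would observe as its state. The sketch below illustrates that quantity for a hypothetical three-follower network — the matrix `A`, the pinning gains `g`, and the scalar states are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

# Hypothetical 3-follower topology (assumed for illustration):
# A[i, j] = 1 if follower i receives follower j's state.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 0]])
# g[i] = 1 if follower i is pinned directly to the leader.
g = np.array([1, 0, 1])

def consensus_error(x, x0):
    """Local neighborhood consensus error for each follower:
    e_i = sum_j A[i, j] * (x_i - x_j) + g_i * (x_i - x0)."""
    n = len(x)
    e = np.zeros(n)
    for i in range(n):
        e[i] = np.sum(A[i] * (x[i] - x)) + g[i] * (x[i] - x0)
    return e

x = np.array([2.0, 1.0, 4.0])   # follower states (scalar, for simplicity)
x0 = 0.0                        # leader state
print(consensus_error(x, x0))   # per-follower errors; all zero at consensus
```

In the paper's scheme, this error signal would be fed to the DDPG actor network to produce the control input, with the critic network estimating the associated cost, so no model of the agent dynamics is needed.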
This work was supported by the Tianjin Natural Science Foundation of China (Grant No. 20JCYBJC01060), the National Natural Science Foundation of China (Grant Nos. 62103203 and 61973175), and the Fundamental Research Funds for the Central Universities, Nankai University (Grant No. 63221218).
Li, Y., Liu, Z., Lan, G. et al. A DDPG-based solution for optimal consensus of continuous-time linear multi-agent systems. Sci. China Technol. Sci. 66, 2441–2453 (2023). https://doi.org/10.1007/s11431-022-2216-9