Abstract
Modeling a system in engineering applications is time-consuming and labor-intensive, because system parameters may drift with temperature, component aging, and other factors. This paper proposes a novel data-driven, model-free optimal controller based on the deep deterministic policy gradient (DDPG) algorithm for the continuous-time leader-following multi-agent consensus problem. To avoid the dimensional explosion of the state and action spaces, two different neural networks (an actor and a critic) approximate the policy and the value function, replacing the time-consuming state-iteration process. The proposed controller achieves consensus with minimal energy consumption using only the consensus error, and it requires no initial admissible policy. Moreover, the controller is self-learning: it adapts online to recover optimal control as the system parameters change. Finally, proofs of convergence and stability, together with simulation experiments, verify the algorithm's effectiveness.
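The abstract states that the controller acts only on the consensus error. In the leader-following setting, each follower's local neighborhood consensus error is commonly defined as e_i = Σ_j a_ij (x_i − x_j) + g_i (x_i − x_0), where a_ij are adjacency weights and g_i pins follower i to the leader; this scalar (or vector) is what a DDPG agent would observe as its state. The sketch below illustrates that quantity for a hypothetical three-follower network — the matrix `A`, the pinning gains `g`, and the scalar states are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

# Hypothetical 3-follower topology (assumed for illustration):
# A[i, j] = 1 if follower i receives follower j's state.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 0]])
# g[i] = 1 if follower i is pinned directly to the leader.
g = np.array([1, 0, 1])

def consensus_error(x, x0):
    """Local neighborhood consensus error for each follower:
    e_i = sum_j A[i, j] * (x_i - x_j) + g_i * (x_i - x0)."""
    n = len(x)
    e = np.zeros(n)
    for i in range(n):
        e[i] = np.sum(A[i] * (x[i] - x)) + g[i] * (x[i] - x0)
    return e

x = np.array([2.0, 1.0, 4.0])   # follower states (scalar, for simplicity)
x0 = 0.0                        # leader state
print(consensus_error(x, x0))   # per-follower errors; all zero at consensus
```

In the paper's scheme, this error signal would be fed to the DDPG actor network to produce the control input, with the critic network estimating the associated cost, so no model of the agent dynamics is needed.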
This work was supported by the Tianjin Natural Science Foundation of China (Grant No. 20JCYBJC01060), the National Natural Science Foundation of China (Grant Nos. 62103203 and 61973175), and the Fundamental Research Funds for the Central Universities, Nankai University (Grant No. 63221218).
Li, Y., Liu, Z., Lan, G. et al. A DDPG-based solution for optimal consensus of continuous-time linear multi-agent systems. Sci. China Technol. Sci. 66, 2441–2453 (2023). https://doi.org/10.1007/s11431-022-2216-9