Abstract
This paper addresses the optimal output synchronization problem of heterogeneous multiagent systems (HMASs) subject to nonidentical communication delays using reinforcement learning (RL). In contrast to existing studies, which assume that the leader's precise model is globally or distributively accessible to all or some of the followers, the leader's dynamical model here is entirely inaccessible to all followers. A data-based learning algorithm is first proposed to reconstruct the leader's unknown system matrix online. A distributed predictor is then devised to estimate the leader's state under communication delays, which are allowed to be nonidentical. Next, a learning-based local controller, together with a discounted performance function, is designed to achieve optimal output synchronization. Bellman equations and game algebraic Riccati equations are constructed, and a model-based RL algorithm is developed to learn the optimal solution online without solving the regulator equations. A model-free off-policy RL algorithm follows, which removes the model-based algorithm's requirement that all agents' dynamics be known. The optimal tracking control problem of HMASs with unknown leader dynamics and communication delays is shown to be solvable under the proposed RL algorithms. Finally, the theoretical results are verified by numerical simulations.
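The idea of reconstructing a leader's system matrix from data can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details are not given in the abstract. It assumes a hypothetical two-dimensional leader dv/dt = S v with S = [[0, 1], [-1, 0]], noise-free state samples taken every h seconds, and an ordinary batch least-squares fit of the discrete transition matrix Phi = exp(S h). The function names `simulate_leader` and `estimate_transition` are illustrative only.

```python
import math


def simulate_leader(h, steps, v0):
    """Sample v(kh) for the assumed leader dv/dt = S v, S = [[0, 1], [-1, 0]]."""
    # For this S, exp(S h) is a rotation: [[cos h, sin h], [-sin h, cos h]].
    c, s = math.cos(h), math.sin(h)
    samples = [v0]
    for _ in range(steps):
        x, y = samples[-1]
        samples.append((c * x + s * y, -s * x + c * y))
    return samples


def estimate_transition(samples):
    """Least-squares estimate of Phi in v[k+1] = Phi v[k] from 2-D samples."""
    # Accumulate G = sum v[k] v[k]^T and H = sum v[k+1] v[k]^T.
    g11 = g12 = g22 = 0.0
    h11 = h12 = h21 = h22 = 0.0
    for (x, y), (xn, yn) in zip(samples, samples[1:]):
        g11 += x * x
        g12 += x * y
        g22 += y * y
        h11 += xn * x
        h12 += xn * y
        h21 += yn * x
        h22 += yn * y
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("data are not sufficiently exciting")
    # Phi = H G^{-1}, using the closed-form inverse of the symmetric 2x2 G.
    return [
        [(h11 * g22 - h12 * g12) / det, (h12 * g11 - h11 * g12) / det],
        [(h21 * g22 - h22 * g12) / det, (h22 * g11 - h21 * g12) / det],
    ]


if __name__ == "__main__":
    data = simulate_leader(h=0.1, steps=50, v0=(1.0, 0.0))
    for row in estimate_transition(data):
        print(row)
```

The determinant check stands in for the excitation condition that such data-based schemes typically need: if the collected samples do not span the state space, the matrix cannot be recovered. An online variant would update the estimate recursively rather than in one batch.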
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62103047, U19662-02), the Beijing Institute of Technology Research Fund Program for Young Scholars, and the Young Elite Scientists Sponsorship Program by BAST (Grant No. BYESS2023365).
About this article
Cite this article
Xu, Y., Wu, ZG., Che, WW. et al. Reinforcement learning-based unknown reference tracking control of HMASs with nonidentical communication delays. Sci. China Inf. Sci. 66, 170203 (2023). https://doi.org/10.1007/s11432-022-3729-7
DOI: https://doi.org/10.1007/s11432-022-3729-7