
A DDPG-based solution for optimal consensus of continuous-time linear multi-agent systems

  • Article
  • Published in Science China Technological Sciences

Abstract

Modeling a system in engineering applications is a time-consuming and labor-intensive task, as system parameters may drift with temperature, component aging, and other factors. In this paper, a novel data-driven, model-free optimal controller based on the deep deterministic policy gradient (DDPG) is proposed for the continuous-time leader-following multi-agent consensus problem. To cope with the dimensional explosion of the state and action spaces, two types of neural networks are used to approximate functions over these spaces, replacing the time-consuming state-iteration process. The proposed controller achieves consensus with minimal energy consumption based only on the consensus error, and it requires no initial admissible policy. Moreover, the controller is self-learning: it attains optimal control by learning in real time as the system parameters change. Finally, proofs of convergence and stability, together with simulation experiments, are provided to verify the algorithm's effectiveness.
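As a concrete illustration of the approach the abstract describes, the sketch below shows the core DDPG machinery: an actor network maps the observed consensus error to a control input, a critic network estimates the action value, and both are trained from replayed transitions with soft target updates, so the learner never uses a model of the system. All specifics here are assumptions for the sketch rather than the paper's design: a single follower, Euler-discretized error dynamics, a quadratic reward, and illustrative system matrices, network sizes, and hyperparameters.

```python
# Minimal DDPG sketch for leader-following consensus (illustrative only).
# Assumptions (not from the paper): one follower, Euler-discretized dynamics,
# quadratic reward r = -(e'Qe + u'Ru)dt, where e = x - x_leader is the
# consensus error; leader and follower share A, so e_dot = A e + B u.
import copy, random
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):                      # policy: consensus error -> control
    def __init__(self, n_e, n_u, u_max=2.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_e, 64), nn.ReLU(),
                                 nn.Linear(64, n_u), nn.Tanh())
        self.u_max = u_max
    def forward(self, e):
        return self.u_max * self.net(e)

class Critic(nn.Module):                     # action value Q(e, u)
    def __init__(self, n_e, n_u):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_e + n_u, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, e, u):
        return self.net(torch.cat([e, u], dim=-1))

# Hypothetical system matrices and cost weights for the sketch.
A = np.array([[0., 1.], [-1., 0.]]); B = np.array([[0.], [1.]])
Q, R, dt, gamma, tau = np.eye(2), 0.1 * np.eye(1), 0.02, 0.99, 0.005

actor, critic = Actor(2, 1), Critic(2, 1)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = []

def step(e, u):                              # Euler step of the error dynamics
    e_next = e + dt * (A @ e + B @ u)
    r = -(e @ Q @ e + u @ R @ u) * dt        # energy-aware quadratic reward
    return e_next, r

for episode in range(50):
    e = np.random.uniform(-1, 1, size=2)
    for t in range(200):
        with torch.no_grad():
            u = actor(torch.as_tensor(e, dtype=torch.float32)).numpy()
        u = u + 0.1 * np.random.randn(1)     # exploration noise
        e_next, r = step(e, u)
        buffer.append((e, u, r, e_next)); e = e_next
        if len(buffer) < 256:
            continue
        batch = random.sample(buffer, 64)
        eb, ub, rb, enb = (torch.as_tensor(np.array(x), dtype=torch.float32)
                           for x in zip(*batch))
        with torch.no_grad():                # TD target from target networks
            y = rb.unsqueeze(1) + gamma * critic_t(enb, actor_t(enb))
        loss_c = ((critic(eb, ub) - y) ** 2).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        loss_a = -critic(eb, actor(eb)).mean()   # deterministic policy gradient
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()
        for p, pt in zip(actor.parameters(), actor_t.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)   # soft target update
        for p, pt in zip(critic.parameters(), critic_t.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```

In the paper's full multi-agent setting, the input to the actor would instead be each agent's local neighborhood consensus error built from the communication graph; the training loop above would otherwise carry over unchanged.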



Author information


Correspondence to ZhongXin Liu.

Additional information

This work was supported by the Tianjin Natural Science Foundation of China (Grant No. 20JCYBJC01060), the National Natural Science Foundation of China (Grant Nos. 62103203 and 61973175), and the Fundamental Research Funds for the Central Universities, Nankai University (Grant No. 63221218).


About this article


Cite this article

Li, Y., Liu, Z., Lan, G. et al. A DDPG-based solution for optimal consensus of continuous-time linear multi-agent systems. Sci. China Technol. Sci. 66, 2441–2453 (2023). https://doi.org/10.1007/s11431-022-2216-9

