
Enhancing Multi-agent Coordination via Dual-channel Consensus

  • Research Article
  • Published in Machine Intelligence Research

Abstract

Successful coordination in multi-agent systems requires agents to reach consensus. Previous works pursue consensus through information sharing, either explicitly via communication protocols or implicitly via behavior prediction. However, these methods may fail in the absence of communication channels or suffer from biased modeling. In this work, we develop dual-channel consensus (DuCC) via contrastive representation learning for fully cooperative multi-agent systems, which requires no explicit communication and avoids biased modeling. DuCC comprises two types of consensus: temporally extended consensus within each agent (inner-agent consensus) and mutual consensus across agents (inter-agent consensus). To achieve DuCC, we design two objectives: one learns representations of slow environmental features for inner-agent consensus, and the other enforces cognitive consistency as inter-agent consensus. DuCC is highly general and can be flexibly combined with various MARL algorithms. Extensive experiments on the StarCraft multi-agent challenge and Google Research Football demonstrate that our method efficiently reaches consensus and outperforms state-of-the-art MARL algorithms.
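As a rough illustration of the two contrastive channels described above, the following PyTorch sketch pairs a shared observation encoder with a generic InfoNCE loss. This is a minimal sketch under stated assumptions, not the authors' actual implementation: all names (ConsensusEncoder, info_nce, inner_agent_loss, inter_agent_loss) and hyperparameters are hypothetical. Inner-agent consensus treats an agent's observations at nearby timesteps as positive pairs (slow environmental features), while inter-agent consensus treats different agents' observations at the same timestep as positive pairs (cognitive consistency).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConsensusEncoder(nn.Module):
        # Maps a local observation to a unit-norm latent representation.
        def __init__(self, obs_dim, latent_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.ReLU(),
                nn.Linear(64, latent_dim),
            )

        def forward(self, obs):
            return F.normalize(self.net(obs), dim=-1)

    def info_nce(anchor, positive, temperature=0.1):
        # Row i of `positive` is the positive for row i of `anchor`;
        # every other row in the batch serves as a negative.
        logits = anchor @ positive.t() / temperature  # (B, B) cosine similarities
        labels = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, labels)

    def inner_agent_loss(encoder, obs_t, obs_t_plus_k):
        # Inner-agent consensus: one agent's observations at nearby
        # timesteps should map to a slowly varying representation.
        return info_nce(encoder(obs_t), encoder(obs_t_plus_k))

    def inter_agent_loss(encoder, obs_agent_i, obs_agent_j):
        # Inter-agent consensus: two agents observing the same underlying
        # state at the same timestep should hold consistent representations.
        return info_nce(encoder(obs_agent_i), encoder(obs_agent_j))

In a full system, such auxiliary losses would presumably be added to the underlying MARL objective (e.g., a value-factorization method's TD loss), which is consistent with the abstract's claim that the approach plugs into various MARL algorithms.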



Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, China (No. XDA27030300) and the National Natural Science Foundation of China (No. 62073324).

Author information


Corresponding author

Correspondence to Bo Xu.

Ethics declarations

Bo Xu is an associate editor for Machine Intelligence Research and was not involved in the editorial review or the decision to publish this article. All authors declare that there are no other competing interests.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Qingyang Zhang received the B.Sc. degree in communication engineering from Shandong University, China in 2019. She is currently a Ph.D. candidate in pattern recognition and intelligent systems at the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China.

Her research interests include hierarchical reinforcement learning, contrastive learning, representation learning, and multi-agent reinforcement learning.

Kaishen Wang received the B.Sc. degree in electrical engineering and automation from China University of Mining and Technology, China in 2015. He is currently a master's student in computer science and technology at the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China.

His research interests include hierarchical reinforcement learning and multi-agent reinforcement learning.

Jingqing Ruan received the B.Sc. degree in software engineering from North China Electric Power University, China in 2019. She is currently a Ph.D. candidate in pattern recognition and intelligent systems at the Institute of Automation, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, China.

Her research interests include multi-agent reinforcement learning, multi-agent planning on game AI, and graph-based multi-agent coordination.

Yiming Yang received the B.Sc. degree in electronic information engineering from the Beijing Institute of Technology, China in 2019. He is currently a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include reinforcement learning, robotics, and generative models.

Dengpeng Xing received the B.Sc. degree in mechanical electronics and the M.Sc. degree in mechanical manufacturing and automation from Tianjin University, China in 2002 and 2006, respectively, and the Ph.D. degree in control science and engineering from Shanghai Jiao Tong University, China in 2010. He is currently a professor at the National Key Laboratory for Multi-modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, China.

His research interests include robot control and learning, and decision-making intelligence for complex systems.

Bo Xu received the B.Sc. degree in electrical engineering from Zhejiang University, China in 1988, and the M.Sc. and Ph.D. degrees in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China in 1992 and 1997, respectively. He is a professor, the director of the Institute of Automation, Chinese Academy of Sciences, and the deputy director of the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China.

His research interests include brain-inspired intelligence, brain-inspired cognitive models, natural language processing and understanding, and brain-inspired robotics.


About this article


Cite this article

Zhang, Q., Wang, K., Ruan, J. et al. Enhancing Multi-agent Coordination via Dual-channel Consensus. Mach. Intell. Res. 21, 349–368 (2024). https://doi.org/10.1007/s11633-023-1464-2
