
Full communication memory networks for team-level cooperation learning

Published in Autonomous Agents and Multi-Agent Systems

Abstract

Communication in multi-agent systems is a key driver of team-level cooperation, for instance allowing individual agents to augment their knowledge about the world in partially observable environments. In this paper, we propose two reinforcement learning-based multi-agent models, namely FCMNet and FCMTran. Both models allow agents to simultaneously learn a differentiable communication mechanism that connects all agents, as well as a common, cooperative policy conditioned on the received information. FCMNet utilizes multiple directional Long Short-Term Memory chains to sequentially transmit and encode the current observation-based messages sent by every other agent at each timestep. FCMTran further relies on the encoder of a modified transformer to simultaneously aggregate multiple self-generated messages sent by all agents at the previous timestep into a single message that is used in the current timestep. Results from evaluating our models on a challenging set of StarCraft II micromanagement tasks with shared rewards show that FCMNet and FCMTran both outperform recent communication-based methods and value decomposition methods on almost all tested tasks. We further improve the performance of our models by combining them with value decomposition techniques; in particular, we show that FCMTran with value decomposition significantly pushes the state of the art on one of the hardest benchmark tasks without any task-specific tuning. We also investigate the robustness of FCMNet under communication disturbances (i.e., binarized messages, random message loss, and random communication order) in an asymmetric collaborative pathfinding task with individual rewards, demonstrating FCMNet's potential applicability to real-world robotic tasks.
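To make the directional communication idea more concrete, below is a minimal, illustrative PyTorch sketch of a single directional LSTM message chain of the kind described above; the module names, dimensions, and exact wiring are assumptions made for illustration, not the authors' implementation (which is detailed in the full paper).

```python
import torch
import torch.nn as nn

class DirectionalCommChain(nn.Module):
    """Illustrative sketch (not the paper's code): one directional LSTM chain
    that encodes observation-based messages and passes them agent to agent.
    FCMNet is described as using several such chains, whose outputs condition
    each agent's policy."""

    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.msg_encoder = nn.Linear(obs_dim, msg_dim)   # observation -> message
        self.chain_cell = nn.LSTMCell(msg_dim, msg_dim)  # shared cell along the chain

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # observations: (n_agents, obs_dim), ordered along the communication chain.
        messages = torch.tanh(self.msg_encoder(observations))   # (n_agents, msg_dim)
        h = torch.zeros(1, messages.shape[1])
        c = torch.zeros(1, messages.shape[1])
        encodings = []
        for i in range(messages.shape[0]):
            # The hidden state carries the accumulated messages of all
            # preceding agents in the chain to agent i.
            h, c = self.chain_cell(messages[i].unsqueeze(0), (h, c))
            encodings.append(h.squeeze(0))
        return torch.stack(encodings)                            # (n_agents, msg_dim)

# Hypothetical usage: run chains over several agent orderings (directions) and
# concatenate each agent's received encodings as extra input to its policy.
chain = DirectionalCommChain(obs_dim=32, msg_dim=64)
received = chain(torch.randn(5, 32))   # 5 agents, one communication encoding each
```

FCMTran, by contrast, replaces this sequential chaining with a transformer-encoder-style aggregation of the messages generated by all agents at the previous timestep.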



Availability of data and materials

Not applicable.

Notes

  1. The full code will be made available publicly upon paper acceptance.


Acknowledgements

We would like to thank Mehul Damani and Benjamin Freed for their feedback on earlier drafts of this paper. We are also grateful to Benjamin Freed and Rohan James for very helpful research discussions.

Funding

This work was funded by the Singapore Ministry of Education Academic Research Fund Tier 1.

Author information


Contributions

Yutong Wang and Guillaume Sartoretti contributed to the study conception and design. Code writing, data collection, and analysis were performed by Yutong Wang and Yizhuo Wang. The first draft of the manuscript was written by Yutong Wang, and all authors then commented on and edited the final manuscript.

Corresponding author

Correspondence to Guillaume Sartoretti.

Ethics declarations

Conflict of interest

The authors have no relevant conflicts of interest or competing interests to disclose.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors approved the paper for publication.

Code availability

Code will be made available publicly upon paper acceptance.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (mp4 4,819 KB)

Supplementary file 2 (mp4 10,568 KB)

Supplementary file 3 (mp4 5,385 KB)

Supplementary file 4 (GIF 642 KB)

Appendices

Appendix A: Experimental setup

For both FCMNet and FCMTran, the neural network code was written in PyTorch 1.9.0 and relied on Ray 1.2.0 to run 16 processes that collect data in parallel. The convergence speed of the models varies across tasks. For the \(5m\_vs\_6m\) task, on a computer equipped with one Nvidia GeForce RTX 2080 Ti GPU and one Intel(R) Core(TM) i9-10900KF CPU (10 cores, 20 threads), using the standard hyperparameters listed in Appendix B, FCMNet converges within 3 M timesteps (1.3 h of wall-clock time) and FCMTran within 20 M timesteps (6.5 h).
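As a rough illustration of the parallel data collection described above, here is a minimal Ray sketch with 16 rollout workers; the worker internals (environment and policy construction) are placeholders and not the authors' code.

```python
import ray

ray.init()

@ray.remote
class RolloutWorker:
    """Illustrative rollout worker (placeholder internals): each of the 16
    workers would build its own environment/policy copy and collect
    trajectories with the current network weights."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        # self.env = make_env(...)        # placeholder, not the authors' API
        # self.policy = build_policy(...) # placeholder, not the authors' API

    def collect_episode(self, weights=None):
        # Sync weights, roll out one episode, and return the trajectory.
        return {"worker": self.worker_id, "trajectory": []}

# 16 parallel collection processes, matching the setup described above.
workers = [RolloutWorker.remote(i) for i in range(16)]
trajectories = ray.get([w.collect_episode.remote() for w in workers])
```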

Appendix B: Hyperparameters

Table 3 presents the hyperparameters used to train the models evaluated in Sect. 5.1 of this paper. These hyperparameters were obtained by coarse, empirical tuning without detailed grid searches (i.e., we believe it would not be fair to compare our coarsely tuned models against a fine-tuned QMIX), and the same values can be used for every SMAC task listed in the paper. The first 11 hyperparameters are general hyperparameters for training neural networks with PPO, hyperparameters 12–15 are specific to FCMNet and FCMTran, and the last hyperparameter is for value decomposition.

Table 3 Hyperparameters table
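For readers without access to Table 3, the following is a hypothetical configuration skeleton that only mirrors the grouping described above (11 general PPO hyperparameters, 4 model-specific ones, and 1 value-decomposition flag); every name and value here is an illustrative placeholder, not a value reported in the paper.

```python
# Hypothetical skeleton only -- names and values are placeholders,
# not the hyperparameters reported in Table 3.
config = {
    # 1-11: general PPO training hyperparameters
    "learning_rate": 3e-4,
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "ppo_clip_ratio": 0.2,
    "entropy_coef": 0.01,
    "value_loss_coef": 0.5,
    "grad_norm_clip": 10.0,
    "ppo_epochs": 4,
    "minibatch_size": 1024,
    "rollout_workers": 16,          # parallel collectors, see Appendix A
    "max_episode_length": 200,
    # 12-15: FCMNet / FCMTran-specific hyperparameters (hypothetical names)
    "message_dim": 64,
    "lstm_hidden_dim": 64,
    "transformer_layers": 2,
    "attention_heads": 4,
    # 16: value decomposition
    "use_value_decomposition": True,
}
```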

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Y., Wang, Y. & Sartoretti, G. Full communication memory networks for team-level cooperation learning. Auton Agent Multi-Agent Syst 37, 33 (2023). https://doi.org/10.1007/s10458-023-09617-6

