
Full communication memory networks for team-level cooperation learning

Published in Autonomous Agents and Multi-Agent Systems

Abstract

Communication in multi-agent systems is a key driver of team-level cooperation, for instance allowing individual agents to augment their knowledge about the world in partially observable environments. In this paper, we propose two reinforcement learning-based multi-agent models, namely FCMNet and FCMTran. Both models allow agents to simultaneously learn a differentiable communication mechanism that connects all agents, as well as a common, cooperative policy conditioned on the received information. FCMNet utilizes multiple directional Long Short-Term Memory chains to sequentially transmit and encode the current observation-based messages sent by every other agent at each timestep. FCMTran further relies on the encoder of a modified transformer to simultaneously aggregate multiple self-generated messages sent by all agents at the previous timestep into a single message that is used in the current timestep. Results from evaluating our models on a challenging set of StarCraft II micromanagement tasks with shared rewards show that FCMNet and FCMTran both outperform recent communication-based methods and value decomposition methods on almost all tested tasks. We further improve the performance of our models by combining them with value decomposition techniques; in particular, we show that FCMTran with value decomposition significantly pushes the state of the art on one of the hardest benchmark tasks without any task-specific tuning. We also investigate the robustness of FCMNet under communication disturbances (i.e., binarized messages, random message loss, and random communication order) in an asymmetric collaborative pathfinding task with individual rewards, demonstrating FCMNet's potential applicability to real-world robotic tasks.
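To make the directional communication idea more concrete, below is a minimal, illustrative PyTorch sketch of a single directional LSTM message chain of the kind described above; the module names, dimensions, and exact wiring are assumptions made for illustration, not the authors' implementation (which is detailed in the full paper).

```python
import torch
import torch.nn as nn

class DirectionalCommChain(nn.Module):
    """Illustrative sketch (not the paper's code): one directional LSTM chain
    that encodes observation-based messages and passes them agent to agent.
    FCMNet is described as using several such chains, whose outputs condition
    each agent's policy."""

    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.msg_encoder = nn.Linear(obs_dim, msg_dim)   # observation -> message
        self.chain_cell = nn.LSTMCell(msg_dim, msg_dim)  # shared cell along the chain

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # observations: (n_agents, obs_dim), ordered along the communication chain.
        messages = torch.tanh(self.msg_encoder(observations))   # (n_agents, msg_dim)
        h = torch.zeros(1, messages.shape[1])
        c = torch.zeros(1, messages.shape[1])
        encodings = []
        for i in range(messages.shape[0]):
            # The hidden state carries the accumulated messages of all
            # preceding agents in the chain to agent i.
            h, c = self.chain_cell(messages[i].unsqueeze(0), (h, c))
            encodings.append(h.squeeze(0))
        return torch.stack(encodings)                            # (n_agents, msg_dim)

# Hypothetical usage: run chains over several agent orderings (directions) and
# concatenate each agent's received encodings as extra input to its policy.
chain = DirectionalCommChain(obs_dim=32, msg_dim=64)
received = chain(torch.randn(5, 32))   # 5 agents, one communication encoding each
```

FCMTran, by contrast, replaces this sequential chaining with a transformer-encoder-style aggregation of the messages generated by all agents at the previous timestep.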



Availability of data and materials

Not applicable.

Notes

  1. The full code will be made available publicly upon paper acceptance.


Acknowledgements

We would like to thank Mehul Damani and Benjamin Freed for their feedback on earlier drafts of this paper. We are also grateful to Benjamin Freed and Rohan James for very helpful research discussions.

Funding

This work was funded by the Singapore Ministry of Education Academic Research Fund Tier 1.

Author information


Contributions

Yutong Wang and Guillaume Sartoretti contributed to the study conception and design. Code writing, data collection, and analysis were performed by Yutong Wang and Yizhuo Wang. The first draft of the manuscript was written by Yutong Wang, and all authors then commented on and edited the final manuscript.

Corresponding author

Correspondence to Guillaume Sartoretti.

Ethics declarations

Conflict of interest

The authors have no relevant conflicts of interest or competing interests to disclose.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors approved the paper for publication.

Code availability

Code will be made available publicly upon paper acceptance.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (mp4 4,819 KB)

Supplementary file 2 (mp4 10,568 KB)

Supplementary file 3 (mp4 5,385 KB)

Supplementary file 4 (GIF 642 KB)

Appendices

Appendix A: Experimental setup

For both FCMNet and FCMTran, the neural network code was written in PyTorch 1.9.0 and relied on Ray 1.2.0 to run 16 processes that collect data in parallel. The convergence speed of the models varies across tasks. For the \(5m\_vs\_6m\) task, on a computer equipped with one Nvidia GeForce RTX 2080 Ti GPU and one Intel(R) Core(TM) i9-10900KF CPU (10 cores, 20 threads), using the standard hyperparameters listed in Appendix B, FCMNet converges within 3 M timesteps (1.3 h of wall-clock time) and FCMTran within 20 M timesteps (6.5 h).
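As a rough illustration of the parallel data collection described above, here is a minimal Ray sketch with 16 rollout workers; the worker internals (environment and policy construction) are placeholders and not the authors' code.

```python
import ray

ray.init()

@ray.remote
class RolloutWorker:
    """Illustrative rollout worker (placeholder internals): each of the 16
    workers would build its own environment/policy copy and collect
    trajectories with the current network weights."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        # self.env = make_env(...)        # placeholder, not the authors' API
        # self.policy = build_policy(...) # placeholder, not the authors' API

    def collect_episode(self, weights=None):
        # Sync weights, roll out one episode, and return the trajectory.
        return {"worker": self.worker_id, "trajectory": []}

# 16 parallel collection processes, matching the setup described above.
workers = [RolloutWorker.remote(i) for i in range(16)]
trajectories = ray.get([w.collect_episode.remote() for w in workers])
```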

Appendix B: Hyperparameters

Table 3 presents the hyperparameters used to train the models evaluated in Sect. 5.1 of this paper. These hyperparameters were obtained by coarse, empirical tuning without detailed grid searches (i.e., we believe it would not be fair to compare our coarsely tuned models against a fine-tuned QMIX), and the same values can be used for every SMAC task listed in the paper. The first 11 hyperparameters are general hyperparameters for training neural networks with PPO, hyperparameters 12–15 are specific to FCMNet and FCMTran, and the last hyperparameter is for value decomposition.

Table 3 Hyperparameters table
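For readers without access to Table 3, the following is a hypothetical configuration skeleton that only mirrors the grouping described above (11 general PPO hyperparameters, 4 model-specific ones, and 1 value-decomposition flag); every name and value here is an illustrative placeholder, not a value reported in the paper.

```python
# Hypothetical skeleton only -- names and values are placeholders,
# not the hyperparameters reported in Table 3.
config = {
    # 1-11: general PPO training hyperparameters
    "learning_rate": 3e-4,
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "ppo_clip_ratio": 0.2,
    "entropy_coef": 0.01,
    "value_loss_coef": 0.5,
    "grad_norm_clip": 10.0,
    "ppo_epochs": 4,
    "minibatch_size": 1024,
    "rollout_workers": 16,          # parallel collectors, see Appendix A
    "max_episode_length": 200,
    # 12-15: FCMNet / FCMTran-specific hyperparameters (hypothetical names)
    "message_dim": 64,
    "lstm_hidden_dim": 64,
    "transformer_layers": 2,
    "attention_heads": 4,
    # 16: value decomposition
    "use_value_decomposition": True,
}
```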

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Y., Wang, Y. & Sartoretti, G. Full communication memory networks for team-level cooperation learning. Auton Agent Multi-Agent Syst 37, 33 (2023). https://doi.org/10.1007/s10458-023-09617-6

