Abstract
Centralized training with decentralized execution has become a standard setting for multi-agent reinforcement learning. As the number of agents grows, actors that rely only on their own local observations, even when paired with centralized critics, become a performance bottleneck in complex scenarios. Recent research has shown that agents can learn when to communicate so that information is shared efficiently, i.e., agents communicate with each other at the right time during the execution phase to complete cooperative tasks. In this paper, we therefore propose a model that learns when to communicate with the support of a centralized critic, so that each agent can adaptively control its communication guided by a critic trained on global environmental information. Experiments in a cooperative scenario demonstrate the advantages of the model: agents learn to block communication at appropriate times under the centralized-critic setting and cooperate with each other on the task.
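As a rough illustration of the gating idea the abstract describes (not the authors' implementation), the sketch below shows actors that decide from their own local observations whether to broadcast a message, with silent agents excluded from the pooled message, and a centralized critic that scores the joint state-action pair during training only. All class and function names, the linear encoders, and the hard-threshold gate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class GatedCommAgent:
    """Actor that decides, from its local observation alone, whether to communicate."""

    def __init__(self, obs_dim, msg_dim):
        # Linear stand-ins for networks that would be learned in practice.
        self.w_gate = rng.normal(size=obs_dim) * 0.1          # gate scorer
        self.w_msg = rng.normal(size=(msg_dim, obs_dim)) * 0.1  # message encoder

    def gate(self, obs):
        # Hard threshold on a learned score: 1.0 = broadcast, 0.0 = stay silent.
        return float(self.w_gate @ obs > 0.0)

    def message(self, obs):
        return self.w_msg @ obs

def communication_round(agents, observations):
    """One execution-phase round: only agents whose gate is open contribute,
    and everyone receives the mean of the broadcast messages."""
    gates, msgs = [], []
    for agent, obs in zip(agents, observations):
        g = agent.gate(obs)
        gates.append(g)
        msgs.append(g * agent.message(obs))  # closed gate contributes zeros
    n_open = max(sum(gates), 1.0)            # avoid division by zero
    pooled = sum(msgs) / n_open              # mean over broadcasting agents
    return gates, pooled

def centralized_critic(joint_obs, joint_actions, w):
    """Training-time value estimate over global information; discarded at execution."""
    x = np.concatenate([*joint_obs, *joint_actions])
    return w @ x
```

At execution time each actor needs only its own observation and the pooled message, which is what makes the scheme decentralized; the critic, which sees every agent's observation and action, exists only to provide learning signal during centralized training.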
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China (No. 2017YFB1001902), the National Natural Science Foundation of China (No. 61876151, 62032018) and the Fundamental Research Funds for the Central Universities (No. 3102019DX1005).
Copyright information
© 2022 Springer Nature Singapore Pte Ltd.
Cite this paper
Sun, Q., Yao, Y., Yi, P., Zhou, X., Yang, G. (2022). Learning When to Communicate Among Actors with the Centralized Critic for the Multi-agent System. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2021. Communications in Computer and Information Science, vol 1492. Springer, Singapore. https://doi.org/10.1007/978-981-19-4549-6_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4548-9
Online ISBN: 978-981-19-4549-6