Abstract
In cooperative multi-agent reinforcement learning (MARL), where agents have access only to partial observations, efficiently leveraging local information is critical. Through long-term observation, agents can build awareness of their teammates to alleviate the restriction of partial observability. However, previous MARL methods usually neglect learning such awareness from local information to improve collaboration. To address this problem, we propose a novel framework, multi-agent local information decomposition for awareness of teammates (LINDA), with which agents learn to decompose local information and build awareness of each teammate. We model awareness as random variables and perform representation learning to ensure the informativeness of awareness representations by maximizing the mutual information between an agent's awareness and the actual trajectory of the corresponding teammate. LINDA is agnostic to specific algorithms and can be flexibly integrated with different MARL methods. Extensive experiments show that the proposed framework learns informative awareness from local partial observations, enables better collaboration, and significantly improves learning performance, especially on challenging tasks.
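To make the decomposition idea concrete, the following is a minimal numpy sketch, not the paper's implementation: an agent's local hidden state is mapped to a mean and log-variance for one stochastic awareness vector per teammate, sampled with the reparameterization trick; a diagonal-Gaussian log-density stands in for the variational term used in a mutual-information lower bound. All names, dimensions, and the linear heads (`W_mu`, `W_logvar`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3   # team size (hypothetical)
HID_DIM = 8    # local hidden-state size (hypothetical)
AW_DIM = 4     # per-teammate awareness size (hypothetical)

# Hypothetical linear "decomposition" heads: map agent i's local hidden
# state to a Gaussian (mean, log-variance) for each teammate's awareness.
W_mu = rng.normal(size=(N_AGENTS, HID_DIM, AW_DIM))
W_logvar = rng.normal(scale=0.1, size=(N_AGENTS, HID_DIM, AW_DIM))

def decompose(h):
    """Split local information h into N_AGENTS stochastic awareness vectors."""
    mu = np.einsum('d,jdk->jk', h, W_mu)        # (N_AGENTS, AW_DIM)
    logvar = np.einsum('d,jdk->jk', h, W_logvar)
    # Reparameterized sample: awareness is a random variable, not a point.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps, mu, logvar

def gaussian_log_prob(x, mu, logvar):
    """Diagonal-Gaussian log-density; in a variational MI lower bound this
    plays the role of log q(teammate trajectory embedding | awareness)."""
    return -0.5 * np.sum(logvar + (x - mu) ** 2 / np.exp(logvar)
                         + np.log(2 * np.pi))

h_i = rng.normal(size=HID_DIM)                  # agent i's local hidden state
awareness, mu, logvar = decompose(h_i)          # one vector per teammate
```

In training, maximizing the log-density of each teammate's actual trajectory embedding under the predicted awareness distribution would tighten the mutual-information bound; here it only illustrates where that term enters.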
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant No. 61773198).
About this article
Cite this article
Cao, J., Yuan, L., Wang, J. et al. LINDA: multi-agent local information decomposition for awareness of teammates. Sci. China Inf. Sci. 66, 182101 (2023). https://doi.org/10.1007/s11432-021-3479-9