A Survey of Multi-Agent Deep Reinforcement Learning with Communication*

Communication is an effective mechanism for coordinating the behaviors of multiple agents, broadening their views of the environment, and supporting their collaboration. In the field of multi-agent deep reinforcement learning (MADRL), agents can improve the overall learning performance and achieve their objectives through communication. Agents can communicate various types of messages, either to all agents or to specific agent groups, or conditioned on specific constraints. Despite the growing body of research on MADRL with communication (Comm-MADRL), there is no systematic and structured approach for distinguishing and classifying existing Comm-MADRL approaches. In this paper, we survey recent works in the Comm-MADRL field and consider various aspects of communication that can play a role in designing and developing multi-agent reinforcement learning systems. With these aspects in mind, we propose 9 dimensions along which Comm-MADRL approaches can be analyzed, developed, and compared. By projecting existing works into this multi-dimensional space, we discover interesting trends. We also propose novel directions for designing future Comm-MADRL systems by exploring possible combinations of the dimensions.


INTRODUCTION
Many real-world scenarios, such as autonomous driving [9], sensor networks [11], robotics [3], and game-playing [1,10], can be modeled as multi-agent systems. Such multi-agent systems can be designed and developed using multi-agent reinforcement learning (MARL) techniques that learn the behavior of individual agents, which can be cooperative, competitive, or a mixture of both. As agents are often distributed in the environment, where they only have access to their local observations rather than the complete state of the environment, partial observability becomes an essential assumption in MARL [2,5,7]. Moreover, MARL suffers from non-stationarity [8], since each agent faces a dynamic environment that is influenced by the changing and adapting policies of other agents. Communication has been viewed as a vital means of tackling partial observability and non-stationarity in MARL. Agents can communicate individual information, e.g., observations, intentions, experiences, or derived features, to obtain a broader view of the environment, which in turn allows them to make well-informed decisions [8,12].
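To make the link between partial observability and communication concrete, the following is a minimal sketch, assuming a toy environment whose global state is a short vector and in which each agent observes only its own entry. The function names (`local_observation`, `broadcast`, `augmented_view`) are hypothetical and do not come from any surveyed approach; sharing raw observations stands in for the richer messages (intentions, experiences, derived features) mentioned above.

```python
from typing import Dict, List


def local_observation(state: List[float], agent_id: int) -> float:
    """Partial observability: agent `agent_id` sees only its own entry of the global state."""
    return state[agent_id]


def broadcast(state: List[float], n_agents: int) -> Dict[int, float]:
    """Every agent sends its local observation to all others (sender id -> message)."""
    return {i: local_observation(state, i) for i in range(n_agents)}


def augmented_view(agent_id: int, state: List[float], messages: Dict[int, float]) -> List[float]:
    """The agent's own observation followed by the messages received from the other agents."""
    own = [local_observation(state, agent_id)]
    received = [msg for sender, msg in sorted(messages.items()) if sender != agent_id]
    return own + received


if __name__ == "__main__":
    state = [0.3, -1.2, 0.8]  # hypothetical global state, never seen in full by any single agent
    messages = broadcast(state, n_agents=len(state))
    for i in range(len(state)):
        # Compare what agent i sees alone with what it sees after communication.
        print(i, [local_observation(state, i)], augmented_view(i, state, messages))
```

Without messages each agent holds a single number; after one round of broadcasting, every agent's input covers the whole state, which is the "broader view" that communication is meant to provide.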
Due to the recent success of deep learning [4] and its application to reinforcement learning [6], multi-agent deep reinforcement learning (MADRL) has seen great achievements in recent years, enabling agents to process high-dimensional data and to generalize in large state and action spaces [2,5]. We notice that a large number of research works focus on learning tasks with communication, which aim to solve domain-specific tasks, such as navigation, traffic, and video games, through communication and information sharing. To the best of our knowledge, little survey literature covers recent works on learning tasks with communication in multi-agent deep reinforcement learning (Comm-MADRL). Most Comm-MADRL surveys cover only a small number of research works without proposing a fine-grained classification system to compare and analyze them.
In this survey, we review the Comm-MADRL literature by focusing on how communication can be utilized to improve the performance of MADRL techniques. Specifically, we identify 9 dimensions that correspond to unique aspects of Comm-MADRL systems: Controlled Goals, Communication Constraints, Communicatee Type, Communication Policy, Communicated Messages, Message Combination, Inner Integration, Learning Methods, and Training Schemes. These dimensions, which form the skeleton of a Comm-MADRL system, can be used to thoroughly analyze and gain insights into designed Comm-MADRL approaches. By mapping recent Comm-MADRL approaches into this multi-dimensional structure, we not only provide insight into the current state of the art in this field but also identify important directions for designing future Comm-MADRL systems. We further outline a systematic procedure that serves as a guideline for navigating these dimensions when developing Comm-MADRL systems. The procedure organizes the dimensions, demonstrates their relevance to system design, and guides the creation of customized Comm-MADRL systems step by step.
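Projecting an approach into the multi-dimensional space amounts to assigning it one category per dimension. The sketch below records such a projection as a Python dataclass whose fields are the nine dimensions named above, and counts categories along one dimension to expose a trend. The class and function names, the two example approaches, and every category label (e.g. "learned gate", "attention", "reinforced") are hypothetical placeholders, not classifications taken from the survey.

```python
from collections import Counter
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class CommMADRLProfile:
    """One Comm-MADRL approach projected onto the nine dimensions (free-form labels)."""
    controlled_goals: str
    communication_constraints: str
    communicatee_type: str
    communication_policy: str
    communicated_messages: str
    message_combination: str
    inner_integration: str
    learning_methods: str
    training_schemes: str


def dimension_names() -> List[str]:
    """Return the nine dimension names in the order the survey lists them."""
    return [f.name for f in fields(CommMADRLProfile)]


def trend(profiles: List[CommMADRLProfile], dimension: str) -> Counter:
    """Count how many approaches fall into each category along one dimension."""
    return Counter(getattr(p, dimension) for p in profiles)


# Two hypothetical approaches; every category label below is illustrative only.
example_profiles = [
    CommMADRLProfile("cooperative", "unconstrained", "all agents", "always communicate",
                     "observations", "mean", "policy input", "differentiable", "centralized"),
    CommMADRLProfile("cooperative", "limited bandwidth", "nearby agents", "learned gate",
                     "intentions", "attention", "value function", "reinforced", "decentralized"),
]

if __name__ == "__main__":
    print(len(dimension_names()))
    print(trend(example_profiles, "controlled_goals"))
```

A frozen dataclass keeps each projection immutable and hashable, so profiles can also be deduplicated or grouped; comparing approaches "from one perspective" then reduces to calling `trend` on a single field.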
As outlined in Procedure 1, reinforcement learning agents employ communication throughout learning and decision-making. First, the learning objective of the agents is set, defining rewards that induce cooperative, competitive, or mixed behaviors, as captured by dimension 1. We then consider potential communication-specific settings, such as limited resources, which address the need for realistic scenarios, as described in dimension 2. Dimension 3 identifies potential communicatees, i.e., the agents that may receive messages, which vary across domains. At each time step, agents decide when and with whom to communicate, as highlighted in dimension 4. The resulting communication patterns form a graph whose links, either undirected or directed, support information exchange. Subsequently, messages that encapsulate agents' understanding of the environment are generated and shared, relating to dimension 5. Since agents often receive multiple messages, they must decide how to combine these messages effectively and how to integrate them into their policies or value functions; these steps are captured by dimensions 6 and 7. In Comm-MADRL studies focusing on emergent language (i.e., learning tasks with emergent language), where messages are modeled as communicative acts emitted alongside domain-level actions, the procedure must be rearranged. Here, messages are not observed by other agents until the next time step. Therefore, the processes of dimensions 6 and 7 (lines 8 and 9) are moved before those of dimension 4 (line 6). This rearrangement allows agents to combine and integrate messages from the previous time step before initiating new communication. As a result, agents make decisions and perform actions in the environment based not only on their environmental observations but also on information obtained from other agents (lines 10 and 11). During the training phase, experiences from both environmental interactions and inter-agent communication are used to train how agents behave and communicate, i.e., agents' policies, value functions, and communication processes, as characterized by dimensions 8 and 9 (line 14).
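The per-time-step part of this procedure, including the emergent-language rearrangement, can be sketched as a toy loop. This is a minimal illustration under strong assumptions, not the survey's Procedure 1: the class `ToyAgent`, the helpers `exchange` and `run_episode`, the fixed-probability gate (dimension 4), raw observations as messages (dimension 5), averaging (dimension 6), and adding the combined message to the policy input (dimension 7) are all hypothetical stand-ins for learned components. Training (dimensions 8 and 9) is omitted; the returned trace only stands in for the experiences that would be used to train.

```python
import random
from typing import List, Tuple


class ToyAgent:
    """Hypothetical agent; each method stands in for one dimension of the procedure."""

    def __init__(self, agent_id: int, comm_prob: float, rng: random.Random) -> None:
        self.agent_id = agent_id
        self.comm_prob = comm_prob
        self.rng = rng

    def wants_to_communicate(self) -> bool:
        # Dimension 4 (communication policy): a fixed-probability gate replaces a learned one.
        return self.rng.random() < self.comm_prob

    def make_message(self, observation: float) -> float:
        # Dimension 5 (communicated messages): the message is simply the raw local observation.
        return observation

    @staticmethod
    def combine(messages: List[float]) -> float:
        # Dimension 6 (message combination): average the received messages, 0.0 if none arrived.
        return sum(messages) / len(messages) if messages else 0.0

    def act(self, observation: float, combined: float) -> int:
        # Dimension 7 (inner integration): the combined message is added to the policy input.
        return 1 if observation + combined > 0 else 0


def exchange(agents: List[ToyAgent], obs: List[float]) -> List[List[float]]:
    """Each agent that decides to speak broadcasts one message to every other agent."""
    inbox: List[List[float]] = [[] for _ in agents]
    for sender in agents:
        if sender.wants_to_communicate():
            msg = sender.make_message(obs[sender.agent_id])
            for receiver in agents:
                if receiver.agent_id != sender.agent_id:
                    inbox[receiver.agent_id].append(msg)
    return inbox


def run_episode(
    observations: List[List[float]],
    n_agents: int,
    emergent_language: bool = False,
    comm_prob: float = 1.0,
    seed: int = 0,
) -> List[Tuple[List[float], List[int]]]:
    """Run the per-step loop and return (combined messages, actions) for every step."""
    rng = random.Random(seed)
    agents = [ToyAgent(i, comm_prob, rng) for i in range(n_agents)]
    pending: List[List[float]] = [[] for _ in range(n_agents)]  # messages sent at the previous step
    trace = []
    for obs in observations:  # obs[i] is agent i's local observation at this step
        if emergent_language:
            # Messages are only observed at the next step, so combination and integration
            # (dimensions 6-7) use last step's messages before new communication (dimension 4).
            combined = [ToyAgent.combine(pending[i]) for i in range(n_agents)]
            pending = exchange(agents, obs)
        else:
            inbox = exchange(agents, obs)
            combined = [ToyAgent.combine(inbox[i]) for i in range(n_agents)]
        actions = [a.act(obs[a.agent_id], combined[a.agent_id]) for a in agents]
        trace.append((combined, actions))
    return trace
```

For example, `run_episode([[1.0, -3.0], [2.0, 2.0]], n_agents=2)` lets each agent act on the other's current observation, whereas passing `emergent_language=True` makes both agents act on an empty inbox at the first step and on the previous step's messages afterwards, which is exactly the one-step delay that the rearrangement accounts for.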

CONCLUSIONS
Our survey proposes to classify the literature along 9 dimensions. These dimensions constitute the basis for designing Comm-MADRL systems. We further categorize existing works under each dimension, so that readers can easily compare research works from a particular perspective. Based on these dimensions, we also identify trends in the literature and new research directions that fill gaps among recent works. Our survey concludes that while the number of works in Comm-MADRL is notable and represents significant achievements, communication could become more fruitful and versatile by incorporating non-cooperative settings, heterogeneous players, and large-scale multi-agent systems. Agents could communicate information derived not only from raw image inputs or handcrafted features but also from diverse data sources such as voice and text. Furthermore, novel metrics could be explored to better understand the contribution of communication to overall learning.

Table 1: Proposed dimensions and research questions.
Learning tasks with communication in multi-agent deep reinforcement learning pose a challenging problem. Numerous studies have developed effective and efficient Comm-MADRL systems with overlapping characteristics. To better distinguish among these models, we propose classifying them along several dimensions of Comm-MADRL system design. We start by focusing on three key components of Comm-MADRL systems: problem settings, communication processes, and training processes. Problem settings concern the conditions under which Comm-MADRL systems are developed for learning, encompassing both communication-specific settings (e.g., communication constraints) and non-communication-specific settings (e.g., reward configurations). Communication processes concern the decisions of whether to communicate and which messages to communicate. Training processes concern the learning of both agent behavior and communication within MADRL. Based on these three components, we identify and summarize 9 research questions that commonly arise in Comm-MADRL system design, corresponding to the 9 dimensions detailed in Table 1.

Procedure 1: A guideline for Comm-MADRL systems