A behavioral task for exploring dynamics of communication system in dilemma situations

This research proposes a behavioral task to demonstrate the process of evolution of human communication systems based on the Machiavellian intelligence hypothesis, which claims that sophisticated human social intelligence, such as linguistic ability, has been formed through behaviors that maximize self-interest in competitive social situations. The proposed task was designed as a dilemma game involving messaging to establish Machiavellian communication. The game was developed based on experimental semiotics, a method that generates novel artificial languages and examines language functions. Through the proposed task, pairs of participants attach meanings to arbitrary graphic symbols, forming novel communication systems. In case studies using this task, participants modified the communication system or made it ambiguous in response to a dilemma between sharing and monopolizing rewards. The results suggest that the proposed game causes ambiguation of the communication system, so that it functions equivocally.


Introduction
Humans communicate socially by mentalizing each other [2,5] and particularly need to utilize their advanced reasoning ability to share their intentions with others [10], which enables a cooperative society to be established. There are several hypotheses regarding the origin and evolution of such advanced intelligence. One popular hypothesis is the Machiavellian intelligence hypothesis. It claims that such a sophisticated ability has been produced by behaving to enhance self-interest while concealing betrayal intentions through lies and deception [3,4]. When humans engage in such Machiavellian communication, language seems to play a crucial role, because the intention to monopolize profit can be concealed through the clever use of language. However, the fundamental question of how humans have acquired such linguistic ability remains unanswered.
Experimental semiotics (ES) is a discipline that can be utilized in the laboratory to observe how human language emerges and evolves. It uses a constructive experimental method that restricts participants' means of communication in collaborative tasks, such as computer games [7,8,11]. In such tasks, meaning is attached to arbitrary graphic symbols through interactions between the participants, creating a novel communication system. In a past study by one of the authors of this paper [9], the ES method was improved to make it easier to quantitatively analyze the formation process of the communication system. That research proposed a collaborative task in a grid-world with 2 × 2 square rooms (upper-left, upper-right, lower-left, and lower-right). Two participants, each randomly allotted to one of the four rooms, simultaneously decide either to stay in the initial room or to move vertically or horizontally to one of the adjacent rooms, aiming to meet each other. In this task, a participant cannot see the other participant's room before the movement. In such a collaborative situation, people usually use language to share their intentions before making a decision. Contrary to such daily social situations, conventional communication systems (such as verbal utterances, gestures, and facial expressions) are restricted in the experiment. Instead, each participant can send a message that consists of two symbols selected from five figures (such as a square, parallelogram, trapezoid, diamond, or triangle) (footnote 1). The graphical figures have no attached meaning at the beginning of the experiment; however, meaning gradually develops as the task is repeated. Consequently, the participants build a novel communication system, much like language. For example, the initially meaningless triangle figure becomes a code indicating "I will go to the upper left room."
During these ES tasks, the goal of the game was to complete the collaborative task (namely, meeting in the same room). Thus, only simple communication systems were established among the participants. It would be reasonable for a primitive communication model to employ such a simple situation. However, in modern human society, sometimes complicated communication is required to choose to either "cooperate" or "defect". Therefore, such a task is insufficient for explaining communications in real human society. We believe that this problem can be solved by redesigning the task to add complexity: the dilemma of cooperation and defection.
The dilemma game task requires participants to choose between cooperation and defection [1]. One such task is the stag hunt, which has two pure-strategy Nash equilibria: hunting a stag (cooperative choice) as the payoff-dominant strategy and hunting a hare (betrayal choice) as the risk-dominant strategy. This task has been used not only in pure game-theoretical studies but also in several behavioral studies, where humans (or animals, for example chimpanzees) play the game in instantiated situations. In these studies, common explicit communication was restricted for the chimpanzee or human participants. Previous results indicated that chimpanzees could not overcome the uncertainties of others' behavior, falling into the betrayal choice and gaining only a small profit. Conversely, human participants could build trust in a relationship through means such as eye contact, making an amicable choice and thus gaining large profits. With regard to the iterated stag hunt task, Yoshida et al. [12] proposed a computational model of dynamic belief inference for this game and evaluated human performance. They concluded that the occurrence of cooperative behavior increases with the sophistication of inferences about the other's mental states. According to these studies, it can be considered that high inference ability and the quality of communication affect participants' behavior choices in the dilemma environment.
Based on these related studies, the present research proposes an experimental task to examine how the dilemma of cooperation and betrayal affects the evolution of communication systems. The task combines behavioral game theory with ES studies by reproducing an environment in which complicated communication systems emerge and evolve. The present research also conducts small case studies using the proposed task to demonstrate what kind of communication system is established and how it changes in the interaction between participants. This is achieved by providing a dilemma situation while restricting conventional language communication. Specifically, in these case studies we focus on the establishment of ambiguous communication that conceals betrayal intention.

Proposed task
To explore how the dilemma of betrayal and cooperation affects the evolution of communication systems, we developed a novel experimental task. This section explains (1) the environment, (2) the flow, and (3) the factors that influence profit-seeking behaviors in the task.

Environment
Following the study introduced in the previous section [9], we developed a task employing a grid-world where players simultaneously move their locations while referring to exchanged graphical symbols. As modifications from the previous study, the grid-world was expanded from 2 × 2 to 3 × 3, and the number of figures used in a message was increased from five to ten to realize a more complex situation (see Fig. 1). Furthermore, a "reward" (coins in Fig. 1) was added to the world to construct a dilemma situation, where the participants choose to share or monopolize the reward.
During the game, each player could see the other player or the reward only when they were located in the same or an adjacent room (see "Placement" in Fig. 1). This means there are gaps in information between the two players. We suppose that this setting allows participants to engage in Machiavellian communication that conceals their betrayal intentions. For example, in Fig. 1, the blue player observes the reward while the orange player does not know where the reward is. In this case, the blue player can choose to monopolize the reward or to share it by informing the orange player of the reward location. Furthermore, a player who succeeds in hiding their betrayal intention can monopolize the reward and still have rewards shared with them by the other player in future rounds in which they cannot directly observe the reward.
Footnote 1: The participants could select those figures in the order of . Thus, in addition to sending the figures, they could send a blank ( ) message.

Task flow
Figure 2 illustrates the overview of the game process. The game is round-based, and each round has four independent phases, as follows:
1. Placement phase: When a participant starts the game as a player, two players and one reward are randomly assigned to different rooms. Therefore, sometimes both players are located in rooms adjacent to the reward, where both players can see the reward. In other situations, only one player is located in such a room, while the other player cannot see the reward from a room that is not adjacent to it. In the latter situation, a gap in information between the two players emerges.

2. Messaging phase: Each player can send a message at any time during this phase. A message is constructed by combining two graphic figures (see messaging in Fig. 1). Since the message is delivered immediately after pressing the send button, one of the players can decide their message while observing the received message.

3. Moving phase: The player makes a decision whether to move to the adjacent room (see the blue player in Fig. 1) or stay in the same room (see the orange player in Fig. 1).

4. Judgement phase: The game calculates the score based on the payoff structure (mentioned later) and judges how the next round will begin. If the reward was taken, the new round begins with the placement phase allotting new rooms to two participants and a reward (the two right bottom-up paths in Fig. 2); otherwise, it begins with the messaging phase taking over the rooms of the players and a reward from the previous round (the middle bottom-up path in Fig. 2).
This study supposes that such a task causes the dilemma between monopolizing (getting the reward by one player) and sharing (splitting the reward by two players).
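The placement, visibility, and judgement rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, adjacency is assumed to mean rooms sharing a wall (no diagonals), and "shared" is read as both players ending the round in the reward room.

```python
import random

GRID = 3  # 3 x 3 grid-world, as in the proposed task


def rooms():
    """All rooms of the grid-world as (row, column) pairs."""
    return [(r, c) for r in range(GRID) for c in range(GRID)]


def can_see(a, b):
    """True if room b is the same as, or adjacent to, room a.

    Assumption: adjacency means sharing a wall (no diagonals).
    """
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1


def place(rng):
    """Placement phase: two players and one reward in different rooms."""
    p1, p2, reward = rng.sample(rooms(), 3)
    return p1, p2, reward


def judge(p1, p2, reward):
    """Judgement phase: classify the outcome after both players have moved."""
    at1, at2 = p1 == reward, p2 == reward
    if at1 and at2:
        return "shared"
    if at1 or at2:
        return "monopolized"
    return "none"  # reward not taken: the next round restarts from messaging


rng = random.Random(0)
p1, p2, reward = place(rng)
# An information gap exists when exactly one player can see the reward.
info_gap = can_see(p1, reward) != can_see(p2, reward)
print(p1, p2, reward, "information gap:", info_gap)
```

Under these assumptions, an information gap arises exactly when one player's room is within one step of the reward and the other's is not.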

Factors changing profit-seeking behaviors
The task has parameters relating to the payoff structure that determine the ratio of the profit gained by betrayal and cooperative behaviors, corresponding respectively to monopolizing (m) and sharing (s) the rewards in the task. These payoff structures basically follow the dilemma games of game theory, with the stag hunt task in particular as a reference. At the same time, the current task is quite different from common dilemma games in that the present study has a communication phase before decision making.
Note that such a phase imposes a cognitive cost on the participants when they attach meaning to their own messages and interpret others' messages. In addition, the total amount of rewards gained in the game is influenced by the time spent on messaging. For this reason, participants were able to avoid the loss of time caused by messaging and moving. Thus, the actual payoff structure is dynamic and changes depending on the following factors: each participant's thoughts, previous behavior, and remaining time.
Summarizing the above, although we assume that the values of m and s work as factors that influence the behaviors of cooperation and defection, these values alone cannot specify equilibrium behavior in the situation. The focus of the proposed task is not to determine such a "true payoff structure." Instead, we observe and analyze the communication system that is created and evolves in the messaging phase. We particularly try to quantify the ambiguity of communication systems and examine its relationship with cooperative and defecting relations.
Fig. 1 The environment of the game
Fig. 2 The flowchart of the game task

Case studies
The case studies were conducted to demonstrate the relationship between the formation of the communication system and behaviors of exploring the reward in the grid-world. Three case studies were conducted with different payoff structures to explore variations in communication systems caused by several dilemma situations.

Participants and conditions
Six male participants (mean age = 21.7 years, SD = 1.25) were assigned to three pairs, each of which was given a different payoff structure: the m < s condition (m = 1, s = 2); the m > s condition (m = 2, s = 1); and the m = s condition (m = 1, s = 1). The numbers in parentheses indicate the reward values gained by monopolizing and sharing, respectively. In addition, a round in which no one got the reward was scored as 0 points under all conditions. Table 1 summarizes the above parameter settings.
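The three conditions can be written down as a small lookup, which makes the incentive difference between them easy to inspect. The sketch below is illustrative only: `round_scores` is a hypothetical helper, and it assumes that each player receives s when the reward is shared and that only the taker receives m when it is monopolized, which the text does not state explicitly.

```python
# (m, s) values per condition, taken from the description above.
CONDITIONS = {"m<s": (1, 2), "m>s": (2, 1), "m=s": (1, 1)}


def round_scores(condition, outcome, taker=None):
    """Return (score of player 1, score of player 2) for one round.

    Assumptions (not stated in the text): when the reward is shared, each
    player receives s; when it is monopolized, only the taker receives m.
    """
    m, s = CONDITIONS[condition]
    if outcome == "shared":
        return (s, s)
    if outcome == "monopolized":
        return (m, 0) if taker == 1 else (0, m)
    return (0, 0)  # no one got the reward


for name in CONDITIONS:
    shared = round_scores(name, "shared")
    alone = round_scores(name, "monopolized", taker=1)
    print(name, "shared:", shared, "monopolized by player 1:", alone)
```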

Materials
The task was implemented as a web application using Apache Web server, PHP, Ajax, and MySQL. The participants, who were physically located in separate rooms, accessed the server from individual computers. The experimental environment was presented on a display, and the environment was manipulated (creating a message and moving to a room) with a mouse. In the messaging phase, the currently selected figure was presented in a clickable HTML area. Each time a participant clicked the area, the next figure appeared in the order presented in Fig. 1. Such manipulation of the environment generated a cost, as discussed in Sect. 2.3.

Procedure
The overall procedure of the studies is presented in Table 2. The procedure was divided into two main parts: the coordination task and the dilemma task. Unlike the dilemma task explained in Sect. 2, the coordination task only allowed participants to share the reward. That is, the participants received the reward only when they simultaneously moved to the reward room. This coordination task was conducted to construct the basis of the communication system. In the following section, we explain how the constructed communication systems changed during the dilemma task.
In the instructions for these tasks, we did not explain the aim of these tasks but did explain the basic rules and the scores obtained from each behavior to prevent bias influencing behaviors in the task. Participants were only required to "maximize your score." Participants were also asked to write down their ideas during the task on note paper provided by the experimenter.
The experimental procedure also contained questionnaires for each task. In the questionnaire, we asked the participant, "What meanings did you attach to each figure?" This question was set to support the quantitative analysis of the messaging phase.
Table 2 The overall procedure of the studies
Time    Step
        Execution of coordination task
10      Questionnaire for the coordination task
5       Instruction of dilemma task
30      Execution of dilemma task
10      Questionnaire for the dilemma task
120     Total time of the study

Results and discussion
The aim of this study was to demonstrate what kind of communication system would be established and how it would change in the interaction between participants when a dilemma of sharing and monopolizing rewards is provided. To answer these questions, we first examined the tendencies of sharing and monopolizing behaviors in each payoff condition by showing the rewards obtained by each participant. Following this analysis, we demonstrate the changes in the communication system caused by the patterns of reward-seeking behaviors (monopolizing and sharing). Figure 3 presents the cumulative number of rewards taken under each condition and situation by each participant. Note that the figure does not present a score computed based on Table 1. Rather, it presents the number of rewards that the player had obtained up to that round. In Fig. 3, the blue line indicates the rewards in the coordination task, while the other two lines indicate the rewards in the dilemma task. In these graphs, we assigned the participant who gained more rewards in the dilemma task as player 1 and the other participant as player 2. The difference in the number of rewards between the two players in the dilemma task indicates how they decided whether to monopolize or share the reward. If the players shared the rewards, the red and green lines would synchronize with each other, while if one player maintained one-sided monopolizing, the difference between the two lines would become larger.

Reward-seeking behaviors
From Fig. 3a, we can observe a large difference between the participants under the m > s condition compared with the other conditions. This result, not surprisingly, indicates that the participants changed their behavior according to the given payoff structure. Given this result, the following sections analyze how such relations of cooperation and defection influence changes in communication during the task.

Changes of communication systems
We assume that the emergence of Machiavellian communication is accompanied by equivocal linguistic expressions such as homophones. If a player receives an ambiguous message and the partner then monopolizes the reward, the player cannot decide whether the partner's behavior was due to an intention of betrayal or mere miscommunication. Based on this assumption, we investigated whether the communication system established in the collaborative situation was changed through the dilemma task by analyzing each player's message log.
In this analysis, we first assume a communication system ($CS_j$) as a mapping from a set $M$, whose members are each a token of message $m$, to a semantic space $S_i$, consisting of a referenced state $s$ in the task ($CS_j : M \to S_i$). Here, $i$ differentiates variations of the semantic space, while $j$ corresponds to variations of the communication systems derived from the assumed semantic space.
To quantify how well the actually obtained data match $CS_j$, we need to specify the semantic space $S_i$ referred to by the communication system. Among the possible semantic spaces in this task, this study focused on the destination of a player after sending a message (the rooms in the grid-world). From the information about the destination, the players in this task can reason about the partner's situation. If the same message was delivered as in the previous round, this would indicate that the player has found the reward and is waiting for the partner's movement to a room adjacent to the reward.
Fig. 3 The cumulative number of rewards taken
Supposing that the semantic space refers to the destination, the required number of m (message tokens) is nine, the same as the number of s (referenced states of the task), while there are a total of 100 tokens in this task (combinations of two figures selected from 10 types). From the possible patterns of selecting nine tokens from 100, we limited the analysis to those constructed from only one figure, ignoring the combinations of the two figures. Thus, we assumed that the 9 types of figures (excluding blank from 10 types) are used to refer to the rooms of the destination.
By setting the assumptions so far, we could compute the match rate, which indicates the ratio of the used messages that fit each possible communication system. Here, the number of communication systems (the number of $j$) is computed as the permutation of the 10 types of figures and the destinations (${}_{10}P_{9} = 3628800$). Table 3 presents an example of this analysis obtained from all usage of the left figure in the coordination and dilemma tasks conducted under the m > s condition. The table presents the top five and the worst communication systems. Each column represents a combination of a specific figure and a destination room.
Table 4
Figure 4 represents changes of the match rate during the tasks as heat maps whose vertical and horizontal axes respectively indicate the round of the game and the patterns of communication systems sorted by match rate. The color of each cell in the heat map represents the degree of the match rate: darker cells indicate a lower match rate, while brighter ones indicate a higher match rate. Contrary to those in Tables 3 and 4, the match rate in Fig. 4 was calculated for each pair and each round. Since calculating a match rate required a moderately sized sample, we used the most recent 10 rounds of movement as the denominator (a sliding window of 10 rounds). Together, the bright horizontal continuous rows in the heat map indicate a communication system that is highly consistent with the actual behavioral data. Conversely, the bright vertical continuous columns show that the messages sent by the participants match many communication systems, suggesting that the participants sent ambiguous messages.
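The match-rate analysis can be illustrated with a short Python sketch. It is not the authors' code: the function names, the representation of a message log as (figure, destination) pairs, and the toy data are all assumptions made for illustration, and the toy example uses 3 figures and 2 rooms rather than the 10 figures and 9 rooms of the task so that every candidate system can be listed.

```python
import math
from itertools import permutations

# Number of candidate systems: ordered choices of 9 figures out of 10,
# one figure per destination room.
print(math.perm(10, 9))  # 3628800


def match_rate(system, log):
    """Fraction of (figure, destination) pairs in `log` that agree with `system`.

    `system` maps a figure to the room it is assumed to denote.
    """
    if not log:
        return 0.0
    hits = sum(1 for figure, room in log if system.get(figure) == room)
    return hits / len(log)


def sliding_match_rates(system, log, window=10):
    """Match rate per round, computed over the most recent `window` rounds."""
    return [
        match_rate(system, log[max(0, t - window + 1): t + 1])
        for t in range(len(log))
    ]


# Toy example (hypothetical data): 3 figures and 2 rooms.
figures, room_names = ["A", "B", "C"], ["r0", "r1"]
log = [("A", "r0"), ("B", "r1"), ("A", "r0"), ("C", "r1")]
systems = [
    dict(zip(figs, room_names))
    for figs in permutations(figures, len(room_names))
]
ranked = sorted(systems, key=lambda cs: match_rate(cs, log), reverse=True)
print(len(systems), ranked[0], match_rate(ranked[0], log))
```

In this toy log, two different systems fit equally well, which is the kind of situation the heat maps would show as several bright cells in the same round.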
From these figures, we can observe differences in patterns between conditions. In particular, the players in the m = s condition were quite different from those in the other conditions, in that they did not converge to specific communication systems. This result may suggest that the communication systems they made were not included in the assumed set of communication systems, which assign the destination in the moving phase as the meaning of each graphical symbol. This interpretation is supported by the results in Table 4. The participants also reported other types of communication systems in the questionnaire conducted after the task. To present this, answers to the post questionnaire are presented in the appendix.
Contrary to the players in the m = s condition, the players in the m > s and m < s conditions show fine-grained temporal dynamics of convergence to specific communication systems during the tasks. To closely examine the temporal patterns, we quantified the degree of convergence of the communication system in the following analysis.

Relations between reward-seeking and changes of communication systems
In this analysis, we assume that there are several relationships between the reward-seeking behaviors and the changes in communication systems. If participants establish clear communication systems, they can easily collaborate to share the rewards. Otherwise, they may choose monopolizing behavior, as suggested by Duguid et al. [6]. We can also assume Machiavellian communication, where the participants send deceptive messages aiming to conceal their betrayal intention. We explored such relations using the standard deviation (SD) of the match rates as the index of convergence in communication systems.
In this index, $n$ is the number of patterns of the communication system (3628800), $i$ is the order of match rate, $x_i$ is the match rate of the $i$-th communication system, and $\bar{x}$ is the mean of the match rates in each round. If a communication system has a significantly higher match rate than the others (convergence to a single communication system), this value will be high. If there is no such communication system (ambiguous communication), the value of this index will be low.
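Written out with the symbols defined here, the index takes the usual form of a standard deviation over the match rates of one round. The expression is a reconstruction from those definitions; dividing by $n$ rather than $n - 1$ is an assumption, since the normalization is not stated, and the label Conv follows the name used for the index in Fig. 5.

```latex
\mathrm{Conv} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

With $n = 3628800$, the choice between the two normalizations changes the value only negligibly.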
In this analysis, we focus on the dilemma task. Figure 5 describes how this index fluctuated and how the reward-seeking behavior of the participants related to this fluctuation during the dilemma task. In the three graphs, the horizontal axes present the rounds, while the vertical axes present the behavior patterns of reward-seeking and the changes in communication systems. The green lines represent the difference in rewards between the two players, showing patterns of reward-seeking behaviors in the dilemma task. The blue and orange lines indicate the convergence indices (Conv) for player 1 and player 2, respectively, which were calculated from Fig. 4. Consistent with Fig. 3, we can observe the growing trend of the green line in the m > s condition, while there is no clear one-sided relation in the other conditions. To examine the relationship between the reward-seeking behavior and the convergence of communication systems, we calculated Pearson's correlation coefficients between the indices (Table 5).
Although we found significant correlations between the reward-seeking behavior and the changes in communication systems for all the conditions other than the m = s condition, the patterns of the correlation were different between the m > s and m < s conditions. In the m > s condition, as the difference in rewards grew, the communication systems became ambiguous (a lower Conv value). To explore the reason for this, we examined the raw data of the message log and found that in the final phase of the game, the two participants sent blank graphics to each other regardless of their destination. These processes suggest that the one-sided monopolizing behavior destroyed the communication system that had been constructed in the coordination task. Contrary to this, in the m < s condition, the reward difference was correlated with player 1's communication system. As the player gained fewer rewards, the communication system made by the player became more concrete.
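As an illustration of this step, the sketch below computes a convergence index for one round and a Pearson correlation between two per-round series. It is a minimal sketch with hypothetical function names and made-up numbers (not the study's data), and it assumes the population form of the standard deviation.

```python
import math
from statistics import mean, pstdev


def convergence_index(match_rates):
    """SD of the match rates across candidate systems in one round.

    Assumption: population SD (division by n).
    """
    return pstdev(match_rates)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values, for illustration only.
peaked = [0.9, 0.1, 0.1, 0.1]  # one system clearly fits best
flat = [0.3, 0.3, 0.3, 0.3]    # every system fits equally: ambiguous
print(convergence_index(peaked), convergence_index(flat))

reward_diff = [0, 1, 2, 3, 4]          # growing gap between the players
conv = [0.40, 0.35, 0.30, 0.20, 0.10]  # index falling over the same rounds
print(pearson_r(reward_diff, conv))
```

A negative coefficient in such a comparison would correspond to the pattern described for the m > s condition, where a growing reward difference goes together with a lower Conv value.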

Conclusion
This paper proposed a behavioral task to explore the dynamics of the communication system in dilemma situations and conducted case studies using the proposed task. The proposed task was based on ES studies, adding a dilemma relation between betrayal and cooperation. The results of the case studies demonstrated the influence of payoff structures on reward-seeking behaviors and changes in communication systems. In the condition where monopolizing behavior had a high incentive, participants destroyed the communication system, while in the condition where sharing behavior had a high incentive, participants maintained the communication system.
These results are not surprising; however, we consider that our proposed task has the advantage of allowing the temporal dynamics of forming the communication system to be examined quantitatively. The important result obtained in the m < s condition is that the communication system did not converge even in the final phase of the game. This process implies that some kind of Machiavellian interaction occurred in this condition. In fact, player 2 in the m < s condition reported in the interview after the task that he had sent ambiguous messages at the beginning of the dilemma task to disturb the partner. The partner of this player also wrote a note reporting that player 2 had sometimes lied during the task. In a future study, we will explore the relationship between such complicated interactions and the sophistication of the constructed communication systems.