A folk theorem with codes of conduct and communication

We study self-referential games in which players have the ability to commit to a code of conduct, that is, a complete description of how they play and how their opponents should play. Each player receives a private signal about the others' codes of conduct, and the codes of conduct specify how to react to these signals. When only some players receive informative signals, players are allowed to communicate using public messages. Our characterization of the effect of communication on the equilibrium payoffs yields a folk theorem, and players share their private information truthfully in equilibrium. We also provide an application of codes of conduct: games that are played through computer programs.


Introduction
In many economic environments people develop social norms or conventions when they regularly play familiar (or similar classes of) games, specifying behavior and choices everyone is expected to conform to. These social norms naturally endow people with the ability to commit, and this in turn allows the possibility that certain behavior can be visible to other people due to the costs of choosing alternatives outside the conventional norm (see, for example, Schelling 1960; Frank 1988). Such consensus on behavior gives rise to the emergence of cooperation even when these economic relationships are not sustained in the long run and are therefore not susceptible to future incentives.
In this paper, we examine such situations where players employ codes of conduct which are defined as a complete specification of how they play and their opponents should play. Players also receive private signals about what code of conduct their opponents may be using, while their own code of conduct enables them to respond to these signals. We focus on the limit case of perfectly informative signals because we are interested in applications such as games played through agents whose codes of conduct determine their compensation schemes (the contracts players sign with their agents are observable), if the agent is human, or are embedded in their programming, if automated.
We show a folk theorem for finite normal form games using simple trigger codes of conduct and under two observability assumptions. First, we make the relatively standard assumption that all players can observe their opponents' codes of conduct, as in many models of conditional commitment devices (see, for example, Tennenholtz 2004; Kalai et al. 2010). We demonstrate that our codes of conduct generalize the commitment device space described by Kalai et al. (2010), and we discuss how codes of conduct can be applied to, but are not restricted to, computer algorithms (program equilibrium as described in Tennenholtz (2004)). These program strategies are self-referential in the sense that they take as input the opponents' programs; they then syntactically compare them with their own description and execute an output strategy depending on whether they are equal or not.
Our main result, however, is for settings in which not all players can observe opponents' codes of conduct. In large communities, for example, a group of people can be excluded from monitoring or have limited ability to screen whether other people would behave within the current convention. In general, it may be reasonable to assume that people have good information about whether a few people with whom they closely interact use the same code of conduct that they do, but not so reasonable that they would have this information about everyone in the community. However, we allow the possibility that people can communicate via cheap talk what they have observed about others. Specifically, we extend the idea of self-referential games to assume that after observing private signals about rivals' codes of conduct players are allowed to send public "cheap talk" messages. Communication is per se costless, but within the self-referential framework we show that it becomes a "signaling" device: since the code of conduct includes the cheap talk messages, players may receive different private signals depending on the message strategies used. Although players choose strategically what to communicate, we show that there is an equilibrium in which players reveal their private information truthfully. As a result, our key finding is that, with public communication, a folk theorem obtains as long as every player is observed by at least two other players.

Related literature
The key feature that distinguishes our paper from conditional commitment device models is that our commitment device (code of conduct) is a function of signals rather than the opponents' devices. The two most closely related papers are Tennenholtz (2004) and Kalai et al. (2010), in which agents commit to a particular behavior in response to their opponents' device. By contrast, in our model the commitment device affects the likelihood of signals on which the other players would base their behavior; it requires an indirect connection between commitment and observation, thereby relying on a weaker observability assumption and expanding the number of applications our model can be used for. Our model encompasses Tennenholtz's model of conditional commitment devices where players play through computer programs and receive as input their opponents' program. Unlike Tennenholtz, we fully describe the space of codes of conduct and our folk theorem reaches efficiency. Similar to Kalai et al. (2010), we characterize the space of codes of conduct avoiding typical circularity problems, but in contrast to their model, we allow for mixed strategies, do not require the use of jointly controlled lotteries, and consider normal form games with more than two players. In their models, every player conditions her play on all the other players' devices, and their equilibrium construction breaks down if this observability assumption is not satisfied. Unlike these papers, our main focus is on situations where players are able to observe only some opponents' codes of conduct, and we propose a theoretical approach for incorporating public communication via cheap talk messages into conditional commitment device models.
More recently, attention has been drawn to noisy environments; Block and Levine (2015) examine players that observe imperfectly informative signals about each other's codes of conduct, as Levine and Pesendorfer (2007) do within an evolutionary framework. Block and Levine (2015) prove a folk theorem for repeated games with private monitoring. How the period at which players receive signals about others' codes of conduct affects the equilibrium payoff set was explored by Block (2013). In a much less noisy environment, Bachi et al. (2014) study games in which deceptive players may betray their true intentions. They show a folk theorem for two-player normal form games if the cost of deception is sufficiently low. By contrast, in our model these kinds of costs are embedded in the likelihood of private signals but could be readily incorporated. Although a leading example to motivate commitment in this literature is the idea of a communication phase before the actual play of the underlying game, to the best of our knowledge this is the first paper that explicitly incorporates such a communication round.

The model

The baseline game
We study a finite $N$-player normal form game $\Gamma = (I, (S_i, u_i)_{i \in I})$. There is a set of $N$ players $I = \{1, 2, \ldots, N\}$ indexed by $i$. Each player $i$ chooses a strategy $s_i$ from the finite strategy set $S_i$. Let $S = \times_i S_i$ be the product space of the individual strategy sets, and let $s \in S$ denote a strategy profile. The payoff of player $i$ is $u_i : S \to \mathbb{R}$.
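We write $\Delta(S_i)$ for the set of mixed strategies for player $i$ and let $\Delta(S) = \times_i \Delta(S_i)$. To avoid dealing with measure theoretic considerations when we describe the self-referential framework, we restrict attention to a finite subset of those mixed strategies for each player $i$, which we denote by $\mathcal{S}_i \subseteq \Delta(S_i)$ with generic element $\sigma_i$. We extend payoffs to mixed strategy profiles $\sigma \in \mathcal{S} = \times_i \mathcal{S}_i$ in the standard way. Define a minmax strategy against player $i$ as $\sigma_{-i}^i \in \arg\min_{\sigma_{-i} \in \mathcal{S}_{-i}} \max_{\sigma_i \in \mathcal{S}_i} u_i(\sigma_i, \sigma_{-i})$, and write $\underline{u}_i$ for the corresponding minmax payoff of player $i$.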
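The minmax payoff over a finite strategy set can be computed by direct enumeration. The following is a minimal Python sketch, not part of the original model: the function name `minmax_payoff`, the value `GAMMA = 2.0`, and the payoff function `trade_payoff` (an assumed payoff for the two-player trading game of Example 1, restricted to pure strategies) are all illustrative assumptions.

```python
# Minimal sketch: minmax payoff by enumeration over finite strategy sets.
# All names and the numeric value of GAMMA are illustrative assumptions.

GAMMA = 2.0  # assumed value of the good to the opponent (gamma > 1)


def trade_payoff(own, other):
    """Assumed trading-game payoff: keep own good (worth 1) unless it is
    traded away, and receive GAMMA if the opponent trades."""
    return (1 - own) * 1.0 + other * GAMMA


def minmax_payoff(payoff, own_strategies, opponent_profiles):
    """Opponents pick the profile that minimizes the player's best-response payoff."""
    return min(
        max(payoff(s, o) for s in own_strategies)
        for o in opponent_profiles
    )


# Pure strategies only: 0 = do not trade, 1 = trade.
value = minmax_payoff(trade_payoff, (0, 1), (0, 1))
print(value)
```

Under these assumptions the opponent holds the player down by not trading, and the player's best reply is to keep the good.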

The self-referential game
We present the model and notation introduced in Block and Levine (2015). For any baseline game $\Gamma$, we embed $\Gamma$ in the self-referential framework and thereby define the self-referential game $G(\Gamma)$. At the beginning of the game $G$, every player $i$ privately observes a signal $y_i \in Y_i$. A strategy for player $i$ in $G$ is a mapping from his set of signals to his subset of strategies in $\Gamma$, together with a mapping for each other player from that player's set of signals to that player's subset of strategies in $\Gamma$. We refer to such a strategy as a code of conduct, denoted by $r_i$. We emphasize that a code of conduct for each single player specifies how all players should play. We assume that a code of conduct specifies what the player expects from others because, first, this is how the majority of social norms are built and, second, it gives us a well-defined notion of agreement between players when the game is asymmetric and players have different roles. Formally, a code of conduct $r_i$ is a $1 \times N$ vector in which each coordinate $j$ corresponds to a mapping from a set of signals $Y_j$ to the subset of mixed strategies $\mathcal{S}_j$. We think of codes of conduct as social norms that emerge when people interact in familiar games. Specifically, each player chooses a code of conduct that everyone is supposed to follow (what to play conditional on each private signal $y_j$) and simultaneously commits to follow the code himself. We assume that there is a common space of codes of conduct $R_0 = \times_{j \in I} \mathcal{S}_j^{Y_j}$, where $\mathcal{S}_j^{Y_j}$ represents the set of all mappings with domain $Y_j$ and range $\mathcal{S}_j$. For the sake of simplicity and to avoid existence issues, we assume that the minmax strategies and any mixed strategy equilibrium of the underlying game belong to $R_0$. We write $r \in R = \times_i R_0$ for the profile of codes of conduct. With some abuse of notation, we write $r_i(y_i) = r_i^i(y_i)$. Crucial to codes of conduct is the ability of players to receive signals about the codes of conduct used by opponents.
We model this by assuming that for each $r \in R$ there is a probability distribution $\pi(\cdot \mid r) \in \Delta(Y)$ over $Y$. We let $\pi_i(\cdot \mid r)$ denote the marginal probability distribution of $\pi(\cdot \mid r)$ over $Y_i$. For any $y_i \in Y_i$, let $\pi_i(y_i \mid r)$ be the probability of $y_i$ given $r \in R$.
We can now define the expected payoff from using codes of conduct. For player $i$ it is

$$U_i(r) = \sum_{y \in Y} u_i\big(r_1^1(y_1), \ldots, r_N^N(y_N)\big)\, \pi(y \mid r).$$

Note that codes of conduct determine both how players behave as a function of the signals they receive and the probability distribution over the signals players receive about each other's codes of conduct. As a result, the expected payoff of player $i$ can be decomposed into two parts: the first part, $u_i$, depends on the actual play, that is, $r_i(y_i) = \sigma_i$ for each $i \in I$; the second part, $\pi(y \mid r)$, is determined by both what players planned to choose and what they expected from their opponents.

The timeline of the self-referential game is as follows:
1. Each player simultaneously chooses $r_i$, which is not observed by the other players.
2. Each player privately observes the realization $y_i$ of his own signal.
3. Each player chooses a strategy $\sigma_i$ according to $r_i$.

A Nash equilibrium in the self-referential game $G$ (or a self-referential equilibrium) is a profile of codes of conduct $r \in R$ such that for all players $i$ and any $\tilde{r}_i \neq r_i$,

$$U_i(r_i, r_{-i}) \geq U_i(\tilde{r}_i, r_{-i}).$$
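As a worked illustration of the expected payoff, consider the case $N = 2$. The sketch below assumes that the expected payoff is the $\pi$-weighted average of baseline payoffs, and it introduces the hypothetical notation $\hat{y}(r)$ for the signal profile that is realized with probability one when signals are perfectly informative.

```latex
\begin{align*}
U_i(r) &= \sum_{y_1 \in Y_1} \sum_{y_2 \in Y_2}
          u_i\big(r_1^1(y_1),\, r_2^2(y_2)\big)\, \pi(y_1, y_2 \mid r), \\
\pi\big(\hat{y}(r) \mid r\big) = 1
  \;&\Longrightarrow\;
  U_i(r) = u_i\big(r_1^1(\hat{y}_1(r)),\, r_2^2(\hat{y}_2(r))\big).
\end{align*}
```

In this limit case the sum collapses to a single term, so a player's payoff depends on the code of conduct profile only through the play prescribed at the realized signals.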

The folk theorem
Social norms are essentially based on a broad notion of "reciprocity," meaning that people conform and expect others to conform, and people would conform if all others conform. We proceed to define a self-referential framework within which social norms might be such that if people are likely to act according to the current social norm, then alternative behavior should be visible to other people. We first assume that each player is able to detect all the opponents that do not choose the same code of conduct prescribed by the conventional norm. In other words, players can directly observe their opponents' codes of conduct as in Tennenholtz (2004) and Kalai et al. (2010). Specifically, we say that the self-referential game $G$ permits detection if for any code of conduct profile $r \in R$ where $r_i = r_j$ for all $i, j$, and for each player $i$, there exists a subset of signals $Y_j^i \subset Y_j$ for all players $j \neq i$ such that $\pi_j(Y_j^i \mid \tilde{r}_i, r_{-i}) = 1$ for any $\tilde{r}_i \neq r_i$ and $\pi_j(Y_j^i \mid r) = 0$. We view this detection notion as plausible in small communities, and when players delegate their play either to an agent (with irreversible compensation schemes) or to programs (for example financial trading and proxies; see Section 4).
Next, we state our first main result:

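Theorem 1 If $v_i = u_i(\sigma) \geq \underline{u}_i$ for all players $i$ with strategy profile $\sigma \in \mathcal{S}$ and $G$ permits detection, then there exists an $r \in R$ such that $(v_1, \ldots, v_N)$ is a Nash equilibrium payoff of $G$.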
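Proof Take any $\sigma \in \mathcal{S}$ such that $u_i(\sigma) \geq \underline{u}_i$ for every $i$. Consider the code of conduct $r_i \in R_0$ that prescribes, for each player $j$,

$$r_i^j(y_j) = \begin{cases} \sigma_j^k & \text{if } y_j \in Y_j^k \text{ for some } k \neq j, \\ \sigma_j & \text{otherwise,} \end{cases}$$

where $\sigma_j^k$ is player $j$'s component of a minmax strategy against player $k$. If all players choose $r_i$, every player $i$ gets $U_i(r) = u_i(\sigma)$. By contrast, if player $i$ adheres to some $\tilde{r}_i$ such that $\tilde{r}_i(y_i) = \tilde{\sigma}_i$ for all $y_i \in Y_i$ and any $\tilde{\sigma}_i \in \mathcal{S}_i$, with $\tilde{r}_i^j(y_j) = r_i^j(y_j)$, detection implies that his opponents minmax him, so he gets $U_i(\tilde{r}_i, r_{-i}) \leq \underline{u}_i$. It follows that $r$ is a Nash equilibrium of the self-referential game.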
This theorem is similar to the "benchmark theorem" in Levine and Pesendorfer (2007) with the difference that we consider asymmetry and more than two players. Notice that to obtain a full folk theorem, for example, for games that do not have Pareto efficient payoffs in pure strategies or pure minmax strategies, we do not need to use jointly controlled lotteries as Kalai et al. (2010) require because players do not condition directly on opponents' codes of conduct, and hence they can commit to randomize after receiving information about opponents' codes of conduct. Since we can accommodate correlated strategies by defining S i as a subset of such strategies and assuming that players have access to a public randomization device, our folk theorem attains efficiency unlike program equilibria (Tennenholtz 2004).
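The trigger construction behind this folk theorem can be checked by brute force in a small example. The Python sketch below is an illustration under stated assumptions, not the paper's construction: it uses the two-player trading game of Example 1 with an assumed payoff function and `GAMMA = 2.0`, pure actions only, two hypothetical signals ("same" and "different"), and a perfectly informative signal that tells each player whether the two codes of conduct coincide.

```python
# Minimal sketch: brute-force equilibrium check for a trigger code of conduct.
# Payoffs, signal labels, and GAMMA are illustrative assumptions.
from itertools import product

GAMMA = 2.0  # assumed value of the good to the opponent (gamma > 1)
SIGNALS = ("same", "different")


def u(i, actions):
    """Assumed trading-game payoff for player i (0 = keep, 1 = trade)."""
    own, other = actions[i], actions[1 - i]
    return (1 - own) * 1.0 + other * GAMMA


# A mapping from signals to actions: (action if "same", action if "different").
MAPPINGS = list(product((0, 1), repeat=2))
# A code of conduct lists one mapping per player (own play and expected play).
CODES = list(product(MAPPINGS, repeat=2))


def signals(r):
    """Perfectly informative detection: both players learn whether codes coincide."""
    y = "same" if r[0] == r[1] else "different"
    return (y, y)


def U(i, r):
    """Payoff of player i when each player j follows his own code r[j]."""
    y = signals(r)
    actions = tuple(r[j][j][SIGNALS.index(y[j])] for j in range(2))
    return u(i, actions)


def is_equilibrium(code):
    """True if no player gains by unilaterally switching to another code."""
    r = (code, code)
    for i in range(2):
        for dev in CODES:
            profile = tuple(dev if j == i else code for j in range(2))
            if U(i, profile) > U(i, r) + 1e-12:
                return False
    return True


TRIGGER = ((1, 0), (1, 0))       # trade if codes match, otherwise keep the good
ALWAYS_TRADE = ((1, 1), (1, 1))  # trade regardless of the signal

print(is_equilibrium(TRIGGER), is_equilibrium(ALWAYS_TRADE))
```

In this sketch the trigger code survives every unilateral deviation because any change of code is detected and met with the minmax action, whereas unconditional trading does not, since a deviator can keep his own good without being punished.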

Only some players informed
While it may be a reasonable approximation that players can detect whether or not some other players use the same code of conduct as themselves, in many settings it is reasonable to suppose that they can do so only for a small subset of other players whom they observe closely. Depending on the context, a small group of members in a community might be able to identify who complies with the established convention but may require the help of other members to inflict a social punishment on deviators. Relaxing the observability assumption we imposed above creates a difficulty when establishing a folk theorem because players need to communicate what they observe about others' codes of conduct, and they must have an incentive to report truthfully. We find that the property the self-referential game must satisfy to prove a folk theorem is that each player's code of conduct is observed by at least two other players when they can publicly communicate. This observability assumption is weaker than the one imposed by Tennenholtz (2004) and Kalai et al. (2010).
We assume a cheap talk communication stage: after receiving their private signals $y_i \in Y_i$, players simultaneously make announcements $z_i \in Z_0$ that can be observed by everyone. The set of possible announcements $Z_0$ is finite and common to all players. A profile of announcements is $z \in Z = \times_i Z_0$. Note that the identities of both the announcer and the subset of players who are thought to have deviated are crucial. Let $z_i^D \in Z_0$ be player $i$'s announcement indicating that a subset of opponents $D \subseteq I$ have chosen a different code of conduct (that is, some players potentially adhere to a different social norm). We require that there be at least $2^N$ such possible announcements, that is, $\# Z_0 \geq 2^N$. We allow for not sending a message, $\{\emptyset\} \in Z_0$. A strategy for player $i$ consists of an announcement policy $m_i : Y_i \to Z_0$ and an implementation policy $\phi_i(\cdot, y_i) : Z \to \mathcal{S}_i$ given any private signal $y_i \in Y_i$. In an extended self-referential game $E$, a code of conduct now specifies $r_i^j = (m_i^j, \phi_i^j)$ for each $j \in I$, which belongs to the common space of codes of conduct

$$R_1 = \times_{j \in I} \big( Z_0^{Y_j} \times \mathcal{S}_j^{Z \times Y_j} \big).$$

As before, codes of conduct determine not only behavior as a function of signals, but also the probability distribution over the signals; for $r \in \times_i R_1$ we continue to denote this by $\pi(\cdot \mid r) \in \Delta(Y)$.
To prove a folk theorem, we require that each player is observed by at least two other players. This avoids the possibility that one player deviates and, at the same time, points the finger at the only person who is able to monitor her. In that case, the remaining players would not know whom to punish and may not be able to punish both. Formally, we say that the extended self-referential game $E$ weakly permits detection if for any code of conduct profile $r \in R$ with $r_i = r_j$ for all $i, j$, and for any player $i$, there are at least two players $j, k \neq i$ with subsets of signals $Y_j^i \subset Y_j$ and $Y_k^i \subset Y_k$ such that $\pi_j(Y_j^i \mid \tilde{r}_i, r_{-i}) = \pi_k(Y_k^i \mid \tilde{r}_i, r_{-i}) = 1$ for any $\tilde{r}_i \neq r_i$, and $\pi_j(Y_j^i \mid r) = \pi_k(Y_k^i \mid r) = 0$. What weak detection says, in a sense, is that there are "neutral" witnesses, that is, people who observe wrongdoing but who cannot be credibly accused of wrongdoing by the wrongdoer, and that this information is common knowledge in the society.
We are now in a position to state the second main result of the paper:

Theorem 2 For all $\sigma \in \mathcal{S}$ such that $v_i = u_i(\sigma) \geq \underline{u}_i$ for all $i \in I$, if the extended self-referential game $E$ weakly permits detection, then there is an $r \in R$ such that $(v_1, \ldots, v_N)$ is a Nash equilibrium payoff of $E$.
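Proof Take $\sigma \in \mathcal{S}$ such that $u_i(\sigma) \geq \underline{u}_i$ for all $i$. We construct $r_i \in R_1$ as follows. For each player $j$, the announcement policy is $m_i^j(y_j) = z_j^k$ if $y_j \in Y_j^k$ for some $k \neq j$, and $m_i^j(y_j) = \emptyset$ otherwise. The implementation policy is $\phi_i^j(z, y_j) = \sigma_j^k$ for all $y_j \in Y_j^k$ or for all $z \in Z$ such that $z_l^k \in z$ for some $l, k \in I$; otherwise $\phi_i^j(z, y_j) = \sigma_j$. Here $\sigma_j^k$ denotes player $j$'s component of a minmax strategy against player $k$. If all players choose $r_i$, then $U_i(r) = u_i(\sigma)$.

We begin by checking potential deviations. It suffices to check the following cases. Suppose player $i$ deviates to some $\tilde{r}_i \neq r_i$ and announces that another player $k$ has deviated. By weak detection, there is a player $j \neq i$ who receives $y_j \in Y_j^i$ and announces $z_j^i$, so player $i$ is held to his minmax payoff, $U_i(\tilde{r}_i, r_{-i}) \leq \underline{u}_i$. Even if $j = k$, the resulting mutual accusation cannot shield player $i$, because a second witness also points at $i$. Alternatively, suppose player $i$ deviates to some $\tilde{r}_i \neq r_i$ without accusing any other player. Again, by weak detection, there is a player $j \neq i$ who receives $y_j \in Y_j^i$ and makes the announcement $z_j^i$, and therefore player $i$ obtains at most his minmax payoff, $U_i(\tilde{r}_i, r_{-i}) \leq \underline{u}_i$.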
The reason for requiring that each player is monitored by at least two other players is that if players respond to unique announcements, then a player can always foil the system by choosing a different code of conduct and announcing that another player has "misbehaved," since communication is based on cheap talk messages. At worst, when he is detected there will be two such announcements, so his opponents not only do not know whom to punish, but also may not be able to punish both announcers. However, if the self-referential game weakly permits detection, then we can specify that when three players announce deviations and one points to the others, the player who has two accusations is punished. In addition to circumventing the problem of meaningful announcements, communication raises the issue that players may not report observations if they would be punished for doing so. But such incentives are also corrected within the self-referential framework since the cheap talk messages are part of the code of conduct, and consequently can be detected. This is why costless communication becomes a "signaling" device in this context. Thus, we construct equilibria where players voluntarily communicate what they observed about their opponents' codes of conduct and show that communication is a powerful tool to allow players to coordinate behavior when they do not share consensus on whether all players comply with the conventional norm.
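The role of the second witness can be illustrated with a small Python sketch of how public announcements might be resolved. It is an illustration under assumptions rather than the paper's formal implementation policy: announcements are modeled as sets of accused players, the hypothetical function `player_to_punish` punishes the unique player who collects at least two accusations, and an ambiguous profile returns `None`.

```python
# Minimal sketch: resolving public accusations when each player has two witnesses.
# The "at least two accusations" rule and all names are illustrative assumptions.
from collections import Counter
from typing import Dict, FrozenSet, Optional


def player_to_punish(announcements: Dict[int, FrozenSet[int]]) -> Optional[int]:
    """Return the unique player accused by at least two others, if any."""
    counts = Counter()
    for announcer, accused in announcements.items():
        for k in accused:
            if k != announcer:  # self-accusations are ignored
                counts[k] += 1
    flagged = [k for k, c in counts.items() if c >= 2]
    return flagged[0] if len(flagged) == 1 else None


# Player 0 deviates and accuses both witnesses; players 1 and 2 report truthfully.
two_witnesses = {
    0: frozenset({1, 2}),
    1: frozenset({0}),
    2: frozenset({0}),
    3: frozenset(),
}

# With a single witness, a counter-accusation leaves the others unable to tell.
one_witness = {
    0: frozenset({1}),
    1: frozenset({0}),
    2: frozenset(),
}

print(player_to_punish(two_witnesses), player_to_punish(one_witness))
```

With two witnesses a single deviator can cast at most one accusation on each of them, so he is the only player who can reach two accusations; with one witness the mutual accusation is not resolved.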

Application: codes of conduct as computer algorithms
As online markets grow, people increasingly use computer programs to trade on their behalf, such as "proxies" that bid in online auctions and/or keep track of posted prices, and the click stream pricing techniques used by many websites. Sellers that operate through websites have considerable commitment power and could use the computer programs employed by the buyers. Therefore, a natural physical model of strategies is that players play by submitting computer programs to play on their behalf. We next describe how self-referential games formalize these ideas and encompass the notion of program equilibrium (Tennenholtz 2004).
In the self-referential framework, computer programs work as follows. Fix a signal profile set $Y$ and break the program into two parts, one of which generates $Y$ based on analyzing the programs, the other of which maps $Y$ into the strategy profile set $S$. The programs also receive as input the program of the other player, that is, programs also work as files. A well-known result is that it is impossible to run an algorithm that reads the opponent's program and best responds to it. On the other hand, it is still possible to write a computer program that makes a binary choice: give one response if both programs are the same, and an alternative response if they differ. Specifically, there is a finite language $L$ of computer statements, and a finite limit $l$ on the length of a program. The space of computer programs is $P = \{(x_n)_{n=1}^{t} : x_n \in L,\ t \leq l\}$, the set of all sequences in $L$ of length less than or equal to $l$. Each program $p_i \in P$ produces outputs $p_i : P \times P \to \{1, 2, \ldots, \infty\} \times S_i$. The interpretation is that the program $p_i(p_1, p_2) = (\nu_i, \sigma_i)$ produces the result $\sigma_i$ after $\nu_i$ steps. In case $\nu_i = \infty$, the program does not halt. Notice that, depending on $L$, these programs can be either Turing or finite state machines. A self-referential strategy is a pair consisting of a default strategy $\bar{\sigma}_i \in S_i$ and a program, $r_i = (\bar{\sigma}_i, p_i)$. After players submit their programs $p_1, p_2$, each program $p_i$ is given itself and the program submitted by the opposing player $p_{-i}$ as inputs. All programs are halted after $\nu$ steps. We then define the mapping $r_i : P \times P \to S_i$ as follows: if $p_i(p_1, p_2) = (\nu_i, \sigma_i)$ and $\nu_i \leq \nu$, that is, the program halted in time, then $r_i(p_1, p_2) = \sigma_i$; otherwise $r_i(p_1, p_2) = \bar{\sigma}_i$. Finally, to map these computer algorithms to a self-referential game we take $Y = S$. The probability distribution $\pi(\cdot \mid r)$ is given by $\pi(y \mid r) = 1$ if $y_i = r_i(p_1, p_2)$ for all $i$, and $\pi(y \mid r) = 0$ otherwise.
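The deadline rule described above, where a program's output counts only if it halts in time and the default strategy is used otherwise, can be sketched in Python. This is a simplified illustration under assumptions: a program is modeled as a function that directly reports its step count and output (using `math.inf` for a program that never halts), and the names `induced_strategy`, `quick`, and `looping` are hypothetical.

```python
# Minimal sketch: mapping a (default strategy, program) pair to a played strategy.
# Programs are modeled as functions reporting (steps, output); this is an assumption.
import math
from typing import Callable, Tuple

Program = Callable[[str, str], Tuple[float, int]]


def induced_strategy(program: Program, default: int,
                     p1: str, p2: str, deadline: float) -> int:
    """Use the program's output if it halts by the deadline, else the default."""
    steps, output = program(p1, p2)
    return output if steps <= deadline else default


def quick(p1: str, p2: str) -> Tuple[float, int]:
    """Toy program that halts after 3 steps and outputs action 1."""
    return (3, 1)


def looping(p1: str, p2: str) -> Tuple[float, int]:
    """Toy program that never halts (its nominal output is never used)."""
    return (math.inf, 1)


print(induced_strategy(quick, 0, "p1", "p2", deadline=10))
print(induced_strategy(looping, 0, "p1", "p2", deadline=10))
```

Because the default strategy applies whenever the deadline is missed, the induced strategy is well defined for every pair of submitted programs, including those that do not halt.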
We now define a specific notion of self-referential games that is suitable for this context.

Definition 1
The strategy space $S_i$ is self-referential with respect to the deadline $\nu$ if for every pair of actions $a_i, a_i'$ there exists a strategy $\sigma_i = (d_i, p_i) \in S_i$ such that

$$p_i(p_1, p_2) = \begin{cases} (\nu_i, a_i) & \text{if } p_{-i} = p_i, \\ (\nu_i, a_i') & \text{otherwise,} \end{cases} \qquad \text{with } \nu_i \leq \nu.$$

The next example shows that there are games that satisfy this definition of self-referential strategies and illustrates that we can easily construct self-referential equilibria with efficient outcomes that are not feasible in the baseline game for any Nash equilibria.
Example 1 We consider a two-player trading game where each player owns an object that can be traded with the opponent. The action space is $A = \{0, 1\}$, where 0 represents "not trade with your opponent" and 1 represents "trade with your opponent." The good is worth $\gamma > 1$ to the opponent, and 1 to the owner. The players' dominant strategy is to keep the good for themselves and the Nash equilibrium of the game is $a = (0, 0)$. We will show that instead of an entire strategy space, a simple strategy satisfies Definition 1 and that the cooperative outcome can be sustained in a self-referential equilibrium. Let the default strategy be $\bar{\sigma}_i = 0$, no trading. Here, the language $L$ is the Windows command language and the listing (program) $p_i$ is given below:

    @echo off
    rem %1 = own default action, %2 = own file, %3 = opponent default action, %4 = opponent file
    if "0" EQU "%3" goto sameactions
    echo 0
    goto finish
    :sameactions
    rem compare the two program files; "echo n" declines the prompt to compare more files
    echo n | comp %2 %4
    if %errorlevel% EQU 0 goto cooperate
    echo 0
    goto finish
    :cooperate
    echo 1
    :finish

This program runs from the Windows command line and takes four arguments as inputs: a digit describing the "own" default action, an "own" filename, an opponent default action and an opponent filename. If the opponent default action is 0 and the opponent program $p_{-i}$ is identical to the listing above, the program $p_i$ generates as its final output the number 1; otherwise, it generates the number 0. Since it has access to the sequence of its own instructions, it compares them to the sequence of opponent program instructions to check whether they are the same. Although in this listing all the actual work is done by the "comp" command, it is easy enough to write a program that compares two files and takes a number of steps proportional to the length of the shorter file. In other words, the program runs in finite, and relatively short, time.
When both players choose the above program, the two programs are syntactically the same, so both choose $a = 1$ and obtain $\gamma$. Yet, if one player submits a program that differs syntactically from the proposed listing, that player receives at most $1 < \gamma$. We have shown that there is a self-referential equilibrium for the trading game in which both players always trade, provided the output actions prescribed by each program are executed simultaneously. Notice that we might be able to sustain any feasible individually rational payoff, as we showed in Theorem 1, if we replace the default strategy by the minmax strategy and the target strategy by the desired individually rational strategy profile.
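The logic of the listing and the resulting payoffs can be mirrored in a short Python sketch. It is an analogue under assumptions, not a translation of the command-line program: program texts are plain strings, the hypothetical function `sources_match` replaces the "comp" command with a character-by-character comparison that counts its steps, and the payoff function with `GAMMA = 2.0` is an assumed form of the trading game.

```python
# Minimal sketch: a Python analogue of the syntactic-comparison program.
# Program texts, payoff form, and GAMMA are illustrative assumptions.

GAMMA = 2.0  # assumed value of the good to the opponent (gamma > 1)


def sources_match(own_source, opp_source):
    """Compare two texts character by character; return (steps, match).
    The step count is at most the length of the shorter text plus one."""
    steps = 0
    for a, b in zip(own_source, opp_source):
        steps += 1
        if a != b:
            return steps, False
    return steps + 1, len(own_source) == len(opp_source)


def trade_program(own_default, own_source, opp_default, opp_source):
    """Output 1 (trade) only if the opponent's default is 0 and the texts coincide.
    As in the listing, the own default action is passed in but not used."""
    if opp_default != 0:
        return 0
    _, same = sources_match(own_source, opp_source)
    return 1 if same else 0


def payoffs(a1, a2):
    """Assumed trading-game payoffs for actions (a1, a2)."""
    return ((1 - a1) + GAMMA * a2, (1 - a2) + GAMMA * a1)


PROPOSED = "proposed listing"   # stand-in for the program text
DEVIANT = "proposed listing "   # differs by one trailing space

# Both players submit the proposed program.
conforming = (trade_program(0, PROPOSED, 0, PROPOSED),
              trade_program(0, PROPOSED, 0, PROPOSED))

# Player 1 deviates: player 2's program sees a different text and outputs 0.
reply_to_deviant = trade_program(0, PROPOSED, 0, DEVIANT)
best_deviation_payoff = max(payoffs(a, reply_to_deviant)[0] for a in (0, 1))

print(conforming, payoffs(*conforming), best_deviation_payoff)
```

In this sketch any syntactic difference, even a single trailing space, switches the conforming program to its no-trade output, so the deviator can do no better than keep his own good.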

Conclusion
We showed a folk theorem for normal form games where players observe perfectly informative signals that point at deviant codes of conduct, and hence deviators are punished with certainty. We further weakened the assumption about who observes these signals, highlighting the importance of communication in situations where players have the ability to use conditional commitment devices. We provided an important application of codes of conduct in this specific environment and described how these strategies can be represented as computer algorithms.