Memory retrieval and harshness of conflict in the hawk–dove game

We study the long-run dynamics of a repeated non-symmetric hawk–dove type interaction between agents of two different populations. Agents choose a strategy based on their previous experience with the other population by sampling from a collective memory of past interactions. We assume that the sample size differs between populations and define a measure of harshness of conflict in the hawk–dove interaction. We then show how the properties of the long-run equilibrium depend on the harshness of conflict and the relative length of the sample. In symmetric interactions, if conflict is harsh, the population which samples relatively more past interactions is able to appropriate a higher payoff in the long-run, while the population with a relatively smaller sample does so if conflict is mild. These results hold subject to constraints on the sample size which we discuss in detail. We further extend our results to non-symmetric hawk–dove games.

2012; Kimbrough et al. 2020), evolutionary anthropology (Glowacki et al. 2020), and psychology (Böhm et al. 2020). Outcomes of conflicts vary from active aggression and fighting (Archer 1988; Huntingford and Turner 1987) to resource sharing (Wilkinson 1984). Among other types of interactions, the hawk-dove game is a simplified representation of conflict within the context of resource sharing (Rusch and Gavrilets 2020; Smith and Price 1973). If agents of a single population are randomly matched, any payoff-monotone dynamics (i.e., a dynamics where a higher average payoff leads to a higher growth rate) brings the system to the mixed state of the hawk-dove game in which a fraction of the population plays aggressively (i.e., hawk) and fights over resources while the rest acts peacefully (i.e., dove) and avoids conflict (Weibull 1997). In a two-population setting, instead, under payoff-monotone dynamics the system converges from any initial state to an asymmetric pure state in which all agents in one population play hawk and all agents in the other population play dove (Oprea et al. 2011). These states may not occur in the long run if the dynamics is not payoff-monotone (Arigapudi et al. 2021; Bilancini et al. 2021).
With two populations, two possible pure asymmetric states can occur: either population 1 plays dove and population 2 plays hawk, or the reverse. For simplicity, we refer to these two states as equilibria. The existence of two possible equilibria poses a selection problem. Which of the two equilibria the dynamics reaches depends, at least in the medium-to-long run, on the initial state and the equilibria's respective basins of attraction. In the very long run, however, the literature on stochastic stability has shown that the initial state becomes irrelevant (Foster and Young 1990): small noise renders the dynamic system ergodic and, thus, a population keeps moving across the entire state space. As the noise vanishes, the system spends most of its time at the equilibrium that requires the fewest errors to be reached, and which is hence the easiest to access.
By using the latter approach to study the dynamics of the hawk-dove game, our research is situated at the intersection of the literature analyzing conflict based on the hawk-dove game (Smith 1979, 1982), the theoretical research on the evolution of traits and behaviors in animal and human populations (Hofbauer et al. 1998; Gintis et al. 2000; Newton 2018), and the theories on the long-run evolution of conventions and institutions (Kandori et al. 1993; Young 1993a). In this paper we extend an idea that was sketched in Young (2001, Ch. 5), providing detailed conditions for the long-run equilibria in a two-player hawk-dove type of interaction and, in particular, revealing the crucial role played by the harshness of conflict.
More specifically, we consider a two-population setting in which agents of one population are matched with agents of the other population to compete repeatedly over time for resources in a hawk-dove type of interaction. Agents sample from a collective memory of the last actions and determine an optimal strategy based on the relative frequency of each strategy, a process known as adaptive play. In addition, we assume that agents within the same population base their actions on samples of the same size, but that the sample sizes differ between the two populations. We find that the stochastically stable equilibrium, that is, the equilibrium at which the system spends most of the time in the presence of small random errors, depends on the harshness of conflict, defined as the cost of losing a fight relative to the benefit derived from winning the resource. If conflict is harsh, the population with the larger sample size chooses strategy hawk and the population with the smaller sample size opts for dove. The reverse occurs if conflict is mild. Consequently, our results confirm recent findings in the literature that the cost of fighting can play a crucial role in hawk-dove type interactions (Hall et al. 2020).
The intuition behind our result is that if conflict is harsh, relatively few errors in a dove-playing population are needed for the other population to accommodate and switch to dove in return. If the former population uses the larger sample, the latter population relies on a smaller sample and thus needs to observe a smaller number of hawk plays to switch its best response to dove. Consequently, fewer perturbations are required to push the system into the basin of attraction of the equilibrium in which the population with the smaller sample size plays dove and the population with the larger sample size chooses hawk. If conflict is mild, instead, relatively few errors by the initially hawk-playing population are required to induce a transition between the equilibrium states. The easiest transition then occurs if the population with the larger sample size erroneously chooses dove, eventually leading to a stochastically stable state in which the population with the smaller sample size chooses hawk.
Our results differ from those obtained in Young (1993b) for a Nash bargaining game. In the latter setting, a population always benefits from a larger sample size, which corresponds to the case of harsh conflict in our setting. The harshness of conflict determines the sensitivity to changes in the strategy of others. If conflict is harsh, a transition out of the basin of an equilibrium is more likely to be caused by erroneous choices in the dove-playing population; if conflict is mild, a transition is initiated by erroneous actions in the hawk-playing population. In the Nash bargaining game, by contrast, the easiest way to exit an equilibrium is always through mistakes by agents in the population with the smaller share of the surplus: such agents can, by mistake, demand more, and this quite easily convinces agents in the other population to demand less, since they thereby accept a small loss to avoid the large loss that would occur if the sum of the demands exceeded the surplus at stake.
The remainder of this paper is organized as follows: in Sect. 2, we describe the main characteristics of the model and the dynamics of the unperturbed game. In Sect. 3 we present our main results that determine the long-run dynamics of the perturbed game. Finally, we provide a discussion in Sect. 4. All proofs are relegated to Appendix A, as well as some further results and an example of a transition.

The model
We define two populations of finite size: blue agents denoted by B and yellow agents denoted by Y. Time is discrete, t = 1, 2, . . ., and in each period one agent is drawn at random from each population to interact in the hawk-dove game depicted in Fig. 1. Each agent i chooses a pure strategy s_i from the strategy set S_i = {H, D} with i ∈ {B, Y}. Play at time t is defined as s(t) = (s_B(t), s_Y(t)) and the payoff of each player i is π_i(s(t)) according to the payoff matrix. We assume that C_i > V_i.

Fig. 1 The hawk-dove game. V_B and V_Y are the values of resource possession for the blue and the yellow population, respectively; C_B and C_Y are the costs of losing a fight for the blue agents and the yellow agents, respectively. The blue population is the row player and the yellow population the column player:

                      H                                  D
     H     ((V_B − C_B)/2, (V_Y − C_Y)/2)            (V_B, 0)
     D     (0, V_Y)                                  (V_B/2, V_Y/2)

Agents recall the last m periods of play between the two populations; hence m can be interpreted as the (collective) long-term memory length. A history of play encompassing the last m periods is described by h(t) = (s(t), s(t − 1), . . . , s(t − m + 1)), with t denoting the current period.
Furthermore, agents adjust their choices over time according to the adaptive learning assumptions in Young (1993a). In general, agents select the best response to a randomly drawn sample of k of the opponents' plays in their memory (see Fig. 2 for an example of a history with drawn samples). In case of multiple best responses, each is selected with positive probability. As is standard in the literature, we refer to k as the sample size, and we interpret it as working memory. Differently from most of the literature on the hawk-dove game, we assume that the sample size is population dependent, with k_B denoting the sample size of blue agents and k_Y denoting the sample size of yellow agents. In particular, we consider the case in which the yellow agents have a shorter sample than the blue agents; in the following, we generally assume that k_Y < k_B ≤ m. We denote by n_B^t the number of D instances recorded in the blue agents' memory, and by n_Y^t the number of D instances recorded in the yellow agents' memory. At the end of each interaction period, the current play is registered in the memory and the oldest play is forgotten. Transitions between states are governed by the transition matrix T, with T_{hh'} being the probability of moving from history h to history h' in one period according to the above adjustment dynamics. It must hold that T_{hh'} > 0 only if h' can be obtained from h by deleting the rightmost (oldest) play and adding a new play to the left of the sequence.
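As an illustration, the sampling-and-best-response step can be sketched as follows (a minimal sketch; the function names, the tie-breaking rule, and sampling without replacement are our own implementation choices):

```python
import random

def best_response(sample, alpha_star):
    """Best response to a sample of the other population's past plays.

    Plays H if the relative frequency of D in the sample exceeds the
    threshold alpha_star, D if it falls short, and randomises on a tie
    (both actions are then best responses).
    """
    freq_d = sample.count('D') / len(sample)
    if freq_d > alpha_star:
        return 'H'
    if freq_d < alpha_star:
        return 'D'
    return random.choice(['H', 'D'])

def adaptive_play_step(history, k_B, k_Y, alpha_B, alpha_Y):
    """One period of unperturbed adaptive play.

    history: list of plays (s_B, s_Y), newest first, of length m.
    Each drawn agent samples k plays of the *other* population from the
    shared memory; the new play is recorded and the oldest forgotten.
    """
    blue_sample = [s_Y for (_, s_Y) in random.sample(history, k_B)]
    yellow_sample = [s_B for (s_B, _) in random.sample(history, k_Y)]
    new_play = (best_response(blue_sample, alpha_B),
                best_response(yellow_sample, alpha_Y))
    return [new_play] + history[:-1]
```

Starting from a memory filled with (H, D) plays, this step reproduces (H, D) forever, in line with the notion of convention used below in the text.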
Any state h consisting of m repetitions of a strict Nash equilibrium constitutes a convention, which is inescapable under the dynamics defined above. The hawk-dove game described in Fig. 1 has two such conventions: h_HD, in which the memory is filled with plays (H, D), and h_DH, in which it is filled with plays (D, H). Given the payoffs in Fig. 1, an agent of population i best responds with H whenever the relative frequency of D in the sample exceeds the threshold α*_i = 1 − V_i/C_i; in other words, if the relative frequency of D in the sample exceeds α*_i, the optimal response is to play H in the current period. The ratio α*_i can also be understood as a measure of the severity of a conflict. We call a conflict harsh for population i if α*_i > 0.5, and mild if α*_i < 0.5. We observe that the harshness of conflict determines which action performs better under uncertainty and is hence related to risk dominance (Harsanyi and Selten 1988). Indeed, when α*_i > 0.5, action D has a greater expected payoff for an agent in population i against the belief that the opponent plays the two actions with probability 50% each, which means that dove is less risky. For α*_i < 0.5 the reverse holds and action H is less risky.
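The risk-dominance observation can be checked directly under the standard hawk-dove payoffs (value V for taking the resource uncontested, (V − C)/2 in a fight, V/2 when sharing), which are consistent with the threshold α* = 1 − V/C used in the proofs; the function name is ours:

```python
def expected_payoffs_uniform_belief(V, C):
    """Expected payoffs of H and D against an opponent believed to play
    H and D with probability 1/2 each (standard hawk-dove payoffs)."""
    eu_hawk = 0.5 * (V - C) / 2 + 0.5 * V   # fight vs. win outright
    eu_dove = 0.5 * 0 + 0.5 * V / 2         # lose vs. share
    return eu_hawk, eu_dove

# Harsh conflict: alpha* = 1 - V/C = 2/3 > 0.5, so dove is less risky.
eu_hawk, eu_dove = expected_payoffs_uniform_belief(V=2, C=6)
assert eu_dove > eu_hawk

# Mild conflict: alpha* = 1/3 < 0.5, so hawk is less risky.
eu_hawk, eu_dove = expected_payoffs_uniform_belief(V=4, C=6)
assert eu_hawk > eu_dove
```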
A set of states C is a recurrent class if: (i) for every pair of histories h, h' ∈ C, there is a positive probability of moving from h to h' in a finite number of steps, i.e., there exists some n such that T^n_{hh'} > 0; (ii) for every h ∈ C and h' ∉ C, h' is not accessible from h, i.e., T^n_{hh'} = 0 for every n. By definition, a convention is a recurrent class comprised of a single state. However, there is no guarantee in general that the system converges to a convention, since it might cycle within a set of states. The following Lemma gives a necessary and sufficient condition to ensure convergence to a convention:

Lemma 1 From every state, the unperturbed process converges to a convention with positive probability if and only if at least one of the following conditions holds: (i) min{k_B, k_Y} < m; (ii) mα*_i is an integer for some i ∈ {B, Y}; (iii) there exists an integer q strictly between mα*_B and mα*_Y.

Perturbed dynamics
Now suppose that a player does not, in general, choose a strategy that is a best response to the sample, but instead chooses one of the two strategies at random with a small probability ε close to zero. Any history of play h at time t can then move to any other state h' by time t + m with positive probability. The Markov chain is irreducible and aperiodic, and the process is thus ergodic. In the following, we determine the conditions for the long-run convention, first for the case in which payoffs are symmetric and thereafter for the case in which payoffs are asymmetric.
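The perturbed process can be simulated directly (an illustrative sketch under our own implementation choices: uniform mistakes with probability eps, sampling without replacement, and a convention counted only when it fills the whole memory):

```python
import random

def simulate(alpha, k_B, k_Y, m, eps, periods, seed=0):
    """Perturbed adaptive play in the symmetric hawk-dove game.

    Returns the fractions of periods spent at the conventions h_HD and
    h_DH. With probability eps an agent ignores its sample and chooses
    an action uniformly at random.
    """
    rng = random.Random(seed)

    def choose(sample):
        if rng.random() < eps:
            return rng.choice('HD')              # mistake
        freq_d = sample.count('D') / len(sample)
        if freq_d > alpha:
            return 'H'
        if freq_d < alpha:
            return 'D'
        return rng.choice('HD')                  # tie: both best responses

    history = [('H', 'D')] * m                   # start at convention h_HD
    at_HD = at_DH = 0
    for _ in range(periods):
        s_B = choose([y for (_, y) in rng.sample(history, k_B)])
        s_Y = choose([b for (b, _) in rng.sample(history, k_Y)])
        history = [(s_B, s_Y)] + history[:-1]
        at_HD += history == [('H', 'D')] * m
        at_DH += history == [('D', 'H')] * m
    return at_HD / periods, at_DH / periods
```

Without noise (eps = 0) the process never leaves the convention it starts in; with small eps it becomes ergodic and, for harsh conflict (alpha > 0.5) with k_B > k_Y, should spend most of its time at h_HD, in line with the results below.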

Symmetric harshness of conflict
The stochastic potential of a convention is the minimum number of errors involved in the transition from the opposite convention to that convention (see Online Resource for an example of a complete transition). A convention is stochastically stable if it has minimum stochastic potential (Young 1993a). The following Lemma characterizes the stochastic potentials of the two conventions when payoffs are symmetric, i.e., V_B = V_Y = V and C_B = C_Y = C, so that α*_B = α*_Y. We denote the common harshness of conflict by α*.

Lemma 2 The stochastic potentials of conventions h_DH and h_HD are given by, respectively, r_DH = min{⌈(1 − α*)k_B⌉, ⌈α*k_Y⌉} and r_HD = min{⌈(1 − α*)k_Y⌉, ⌈α*k_B⌉}.
We have the following: Proposition 1 If conflict is harsh, then (a) the convention h_HD is stochastically stable, and (b) under an additional condition on the sample sizes, it is the unique stochastically stable convention. In other words, if conflict is harsh then the convention in which blue agents only choose H and yellow agents only choose D is always stochastically stable, and there exists a region in the parameter space in which it is the unique stochastically stable convention.
In contrast: Proposition 2 If conflict is mild, then (a) the convention h_DH is stochastically stable, and (b) under an additional condition on the sample sizes, it is the unique stochastically stable convention. The intuition of these results requires us to identify the easiest transition path between conventions. We therefore need to determine which population and which strategy are least resilient against erroneous choices of the other population. First, we note that the least resilient population is always the one with the smaller sample size, because a smaller number of erroneous choices needs to be sampled to reach the critical threshold that induces a strategy change. Second, if conflict is harsh, the lower threshold is attached to the hawk strategy, since losing a fight is very costly; if conflict is mild, the lower threshold is instead attached to the dove strategy. Combining these two observations, if conflict is harsh the convention least resilient to erroneous play is the one in which the population with the shorter sample plays hawk and the other population plays dove; if conflict is mild, the least resilient convention is the one in which the population with the shorter sample plays dove. Fig. 3 provides an illustrative example with α* = 0.7, k_B = 4, and k_Y = 3. Convention h_DH can be exited through one mistake by blue agents or three mistakes by yellow agents. Conversely, convention h_HD can be exited through three mistakes by blue agents or two mistakes by yellow agents. Indeed, the least resilient population is the yellow one (k_B > k_Y) and the least resilient strategy is hawk (α* > 0.5), and thus h_HD is the stochastically stable convention. In general, it can happen that both conventions are stochastically stable: this occurs when the overall resistance to move from h_HD to h_DH and the overall resistance to move from h_DH to h_HD fall between the same two consecutive integers, so that the actual numbers of mistakes required to complete the two transitions coincide.
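The mistake counts in the Fig. 3 example can be reproduced in a few lines (a sketch; reading the minimum numbers of mistakes as ceilings of (1 − α*)k and α*k is our convention, and it matches the counts reported above):

```python
from math import ceil

def stochastic_potentials(alpha, k_B, k_Y):
    """Minimum numbers of mistakes needed to reach each convention from
    the other one (symmetric harshness alpha, sample sizes k_B > k_Y)."""
    r_DH = min(ceil((1 - alpha) * k_B), ceil(alpha * k_Y))  # reach h_DH
    r_HD = min(ceil((1 - alpha) * k_Y), ceil(alpha * k_B))  # reach h_HD
    return r_DH, r_HD

# Fig. 3: alpha* = 0.7 (harsh), k_B = 4, k_Y = 3.
r_DH, r_HD = stochastic_potentials(0.7, 4, 3)
assert (r_DH, r_HD) == (2, 1)   # h_HD has minimum stochastic potential
```

Exiting h_DH takes a single blue mistake (ceil(0.3 · 3) = 1 hawk play suffices to flip the yellow best response), whereas exiting h_HD takes two yellow mistakes (ceil(0.3 · 4) = 2), so h_HD is selected in the long run.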
Condition (b) in Propositions 1 and 2 states a sufficient condition for having a unique stochastically stable convention. Further note that if k_Y tends to infinity (and therefore k_B and m tend to infinity as well), condition (a) is sufficient, as condition (b) is then generally met.

Asymmetric harshness of conflict
We now consider the asymmetric case in which α*_B ≠ α*_Y, with payoffs as in Fig. 1. We make the additional assumption that k_B + k_Y ≤ m to guarantee that previous erroneous play of one population remains in memory long enough to trigger a shift in the best response of the other population. The stochastic potential of each convention is given in the next Lemma.

Lemma 3 The stochastic potentials of conventions h_DH and h_HD are given by, respectively, r_DH = min{⌈(1 − α*_B)k_B⌉, ⌈α*_Y k_Y⌉} and r_HD = min{⌈(1 − α*_Y)k_Y⌉, ⌈α*_B k_B⌉}.
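Based on the proof of Lemma 3, the stochastic potentials and the resulting selection can be computed as follows (a sketch; the ceiling convention for converting fractions of the sample into mistake counts is our reading):

```python
from math import ceil

def potentials(alpha_B, alpha_Y, k_B, k_Y):
    """Stochastic potentials with population-specific harshness."""
    r_DH = min(ceil((1 - alpha_B) * k_B), ceil(alpha_Y * k_Y))
    r_HD = min(ceil((1 - alpha_Y) * k_Y), ceil(alpha_B * k_B))
    return r_DH, r_HD

def stochastically_stable(alpha_B, alpha_Y, k_B, k_Y):
    """Convention(s) of minimum stochastic potential (possibly both)."""
    r_DH, r_HD = potentials(alpha_B, alpha_Y, k_B, k_Y)
    if r_DH < r_HD:
        return {'h_DH'}
    if r_HD < r_DH:
        return {'h_HD'}
    return {'h_DH', 'h_HD'}

# Parameters of Fig. 4: k_B = 8, k_Y = 3.
assert stochastically_stable(0.7, 0.7, 8, 3) == {'h_HD'}  # both harsh
assert stochastically_stable(0.3, 0.3, 8, 3) == {'h_DH'}  # both mild
```

Sweeping alpha_B and alpha_Y over a grid with such a function reproduces the kind of partition of the (α*_B, α*_Y) plane shown in Fig. 4.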
The following Proposition provides the sufficient conditions for the conventions to be stochastically stable.

Proposition 3 We have:
Notice that the slope of T(α*_B; k_B, k_Y) is k_B/k_Y where it is not flat. We note that the easiest transition cannot always be identified by looking at the least resilient strategy adopted by the least resilient population. The number of mistakes required to trigger a transition depends on: (i) the strategy that is least resilient for each population; (ii) the fraction of mistakes in the sample of play required to abandon that strategy; and (iii) the sample size, which translates this fraction into a number of mistakes. Indeed, the easiest transition can be the one in which the agents of the most resilient population change strategy in response to mistakes by the others. This can happen when the harshness of conflict of the most resilient population is sufficiently extreme (i.e., sufficiently close to either 0 or 1), so that very few mistakes are needed to trigger the transition despite its longer sample. We notice that this observation plays a role for the selection of the long-run convention only if conflict is harsh, or mild, for both populations at the same time, as illustrated in Fig. 4.
If k_B (along with m) tends to infinity (i.e., the population can keep all past play in memory and no play is forgotten), the slope is vertical and only the horizontal interval of T(α*_B; k_B, k_Y) matters. In this case the sufficient conditions for each convention to be stochastically stable depend only on the harshness of conflict of the population with the shortest sample size. The second limit case is k_B = k_Y. In this case the two populations draw samples of the same length and the slope of the two oblique intervals is equal to one. The convention in which the population with the lower level of harshness plays hawk is always stochastically stable.

Fig. 4 In the blue area the convention h_HD is stochastically stable, while in the yellow area the convention h_DH is stochastically stable; k_B = 8 and k_Y = 3.
The following Proposition defines the conditions for the uniqueness of the stochastically stable convention.

then the convention h D H is the only stochastically stable convention;
The results are illustrated in Fig. 5 (in the dark-blue area, h_HD is the unique stochastically stable convention, while in the dark-yellow area, h_DH is the unique stochastically stable convention; k_B = 8 and k_Y = 3). Similar to our discussion of Propositions 1 and 2, we note here that the conditions in Proposition 4 converge to the conditions in Proposition 3 as k_Y tends to infinity, i.e., the two thresholds T_HD(α*_B; k_B, k_Y) and T_DH(α*_B; k_B, k_Y) coincide in the limit. We stress that the conditions that we provide in our propositions for identifying a stochastically stable convention are sufficient but not necessary. Hence we assess the tightness of these conditions through numerical calculation with specific values of m, k_B, and k_Y, varying α*_B and α*_Y. In Fig. 6 we show the results of the computed stochastic potentials of the conventions.

Discussion
In this paper, we studied the long-run dynamics of the two-population hawk-dove game under perturbed adaptive learning. We demonstrated that information heterogeneity between two populations, caused by different sample sizes of past interactions, affects long-run dynamics and hence a population's ability to secure a resource. In particular, we showed that the harshness of conflict plays a critical role: if the cost of losing a fight is small (large) relative to the benefit of the resource, the population with the smaller (larger) sample chooses hawk and the other population chooses dove. Consequently, the impact of an information advantage matters in a non-trivial way (Alós-Ferrer and Shi 2012), and our results indicate that it is an essential component that needs to be carefully considered when modelling the dynamics of conflict (Rusch and Gavrilets 2020). Since we obtain our conditions under random matching of the members of two populations, future research should investigate the robustness of our results when mixing is assortative, interactions occur on social networks, or agents are spatially segregated (Aydogmus 2018).

Common wisdom suggests that having more abundant cognitive or physical resources is, ceteris paribus, beneficial for the evolutionary success of any living species. The persistence of species with rather limited cognitive capacities is generally attributed to the increasing cost of such an apparatus. Yet, our results suggest that cognitive limitations can yield a relative fitness advantage even in the absence of costs of sustenance: in conflict situations similar to the hawk-dove game, if conflict is mild, the population with the smaller working memory tends to be more aggressive and to earn higher payoffs than the more peaceful population with the larger working memory.
Our theoretical results can be interpreted from an evolutionary and a socio-political perspective and therefore have implications for the literature in two different disciplines. In a biological context, a harsher conflict refers to stronger selection pressure and higher payoffs imply higher evolutionary fitness. Better memory and cognitive abilities are linked to larger brain sizes and a larger hippocampus (Farris 2015; Nave et al. 2019). Yet, larger brains entail additional costs related to longer weaning, longer gestation, and higher calorie requirements. Gonzalez-Voyer et al. (2016) have shown that extinction vulnerability and larger brain sizes are positively correlated for primates even in the absence of these additional costs. Based on our analysis, these evolutionary results might be explained by mild conflict situations (see also Doi and Nakamaru 2018). On the other hand, Sol et al. (2002) and Sol et al. (1993) show that larger brains benefit a species' attempt to invade new habitats, which is in line with our results for harsh conflict situations. Similarly, Montgomery et al. (2010) show that relative brain mass increased in some primate species over evolutionary time while decreasing in other branches of the primate family. The factors behind these diverging evolutionary trends have not been conclusively determined in the literature, and our results offer some theoretical indications as to how selection pressure may have engendered the different trends across primate species.
The socio-political interpretation of our results applies to ethnic and sectarian conflict. Payoffs imply access to political, social, and economic resources and harshness of conflict refers to the severity of the measures taken by the groups involved relative to the respective benefits from winning the conflict. Memory is to be interpreted as the collective memory of an ethnicity or sect, and sampling as referring to past collective memories. Our results then imply that collective memories are particularly frequently invoked in times of severe ethnic and sectarian conflict. Salloukh (2019), for example, shows that the various sectarian leaders used war memories during the Lebanese Civil War to increase tensions between groups and foster their own political and economic benefits (see also Ille 2021, for a further discussion and theoretical results). Our results further imply that harshness of conflict and sampling/use of past collective memory are self-reinforcing as one leads to the other. Consequently, future extensions of our model may endogenize the severity and thus harshness of conflict as part of an evolutionary process.
We have investigated the role of heterogeneous sample sizes between populations when the behavioral rule is best response to the sampled frequency of play in the other population. Different results may hold if at least some of the agents employ different behavioral rules (Alós-Ferrer and Buckenmaier 2020; Khan 2021), such as social imitation (Alós-Ferrer 2008), cognitive hierarchies (Khan and Peeters 2014), or reliance on the past performance of own actions (Sarin 2000). Experimental research may be useful to empirically assess the actual context-specific application of behavioral rules (Mäs and Nax 2016; Lim and Neary 2016). Finally, the role of varying sample sizes within each population could be fruitfully explored in future research.
For simplicity, we assumed that population sizes are identical and fixed. In a more realistic context, increased payoffs translate into higher fitness. At the same time, different population sizes alter the frequency of pairwise interactions and thus affect the updating process. A larger group size, on the other hand, reduces the cost of conflict and thus the harshness measure. Future research may find that the interplay between population dynamics and harshness of conflict is conducive to interesting insights.

Proof of Lemma 1
We first show the "if" part of the statement.
Consider a generic history h, which represents the state of the system at time t, and select a pair of agents to play the hawk-dove game. If there is a positive probability that they play either (H, D) or (D, H), and they actually do so, then the following pair of agents drawn to play the game has a positive probability of playing as the previous pair did. Indeed, suppose (without loss of generality) that they play (D, H). Then, we note that n_B^{t+1} ≥ n_B^t and n_Y^{t+1} ≤ n_Y^t. By repeating this argument m times, we conclude that a convention is reached with positive probability.
Suppose now that, starting from h, with probability 1 the pair of selected agents plays either (H, H) or (D, D). At the following period, if the pair of selected agents can play (H, D) or (D, H) with positive probability, then we can apply the argument of the previous paragraph. Otherwise, we move to the following period. At period t + m, either agents have played (H, D) or (D, H) at some period (so that a convention is reached with positive probability), or all plays in memory are either (H, H) or (D, D). In the latter case, we note that n_Y^{t+m} = n_B^{t+m}. At period t + m, if the pair of selected agents has a positive probability of playing (H, D) or (D, H), then we are done. Otherwise, with probability 1 they play either (H, H) or (D, D); without loss of generality, we assume that they play (D, D). This means that n_B^t/k_B < α*_B and n_Y^t/k_Y < α*_Y. After such agents play (D, D), we have n_B^{t+1} ≥ n_B^t and n_Y^{t+1} ≥ n_Y^t. The following pair of agents either plays (D, D) with probability 1, or not. If (D, D) is played with probability 1, we move to the following period, and we proceed this way until we find a period, call it t*, in which (D, D) is not played with probability 1. Such a t* must occur within at most m periods, since otherwise the memory would come to contain D actions only, in which case D could no longer be a best response. We now show that at period t* the selected pair of agents cannot play (H, H) with probability 1, which means that they play (H, D) or (D, H) with positive probability, and hence a convention is then reached with positive probability.
We first consider case (i), in which min{k_B, k_Y} < m holds, so that at least for one population, say Y, we have k_Y < m. Suppose that at time t* − 1 the yellow agent plays D with probability 1 and instead plays H with positive probability at time t*; it must then be that n_Y^{t*} = n_Y^{t*−1} + 1. Since k_Y < m, at time t* there is a positive probability of selecting a sample containing n_Y^{t*} − 1 instances of D, to which action D is a best response for the yellow agent, thus showing that (H, H) is not played with probability 1.
We now consider case (ii), in which q = mα*_Y for some integer q (we have chosen population Y, without loss of generality), and we suppose again that the yellow agent takes action H with positive probability. Since n_Y^{t*}/m < α*_Y, this means that n_Y^{t*+1} = n_Y^{t*} + 1 = q (otherwise there would not exist an integer q such that q = mα*_Y). Therefore, both D and H are best responses for the yellow agent, which implies that action D is chosen with positive probability, thus showing that (H, H) is not played with probability 1.
We finally consider case (iii), in which there exists an integer q such that α*_Y m < q < α*_B m (we have chosen α*_Y < α*_B, without loss of generality). Since n_Y^{t*}/m < α*_Y, n_B^{t*}/m < α*_B, and n_Y^{t*} = n_B^{t*}, if (D, D) is not played with probability 1, then the only possibility is that n_Y^{t*+1} = n_B^{t*+1} = n_Y^{t*} + 1 = n_B^{t*} + 1 = q, which implies that H is the unique best response for the yellow agent and D is the unique best response for the blue agent, thus showing that (H, H) is not played with probability 1.
We now show the "only if" part of the statement, by contraposition. The negation of conditions (i), (ii) and (iii) amounts to assuming that k_B = k_Y = m and that there exists an integer q < m such that q < mα*_B < q + 1 and q < mα*_Y < q + 1. Assume that the state at time t is given by a history in which (D, D) has been played q times and (H, H) has been played the remaining m − q times. The pair of agents selected at time t plays (D, D) with probability 1, since n_Y^t = q < mα*_Y and n_B^t = q < mα*_B. This implies that n_Y^{t+1} = q + 1 > mα*_Y and n_B^{t+1} = q + 1 > mα*_B, and hence the pair of agents selected at time t + 1 plays (H, H) with probability 1, so that n_Y^{t+2} = q = n_B^{t+2}. Therefore, the cycle between a state with q occurrences of (D, D) in memory and a state with q + 1 occurrences continues forever, and we never have convergence to a convention.
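The cycle constructed in this contraposition argument can be checked numerically (an illustrative sketch with m = 4 and α*_B = α*_Y = 0.6, so that mα* = 2.4, q = 2, and no condition of the Lemma holds; all names are ours):

```python
def full_memory_step(history, alpha, m):
    """One period of unperturbed play with k_B = k_Y = m: both agents
    best respond to the count of D plays in the whole shared memory."""
    n = sum(1 for play in history if play == ('D', 'D'))
    action = 'D' if n / m < alpha else 'H'   # no ties: m*alpha is not an integer
    return [(action, action)] + history[:-1]

m, alpha = 4, 0.6
history = [('D', 'D')] * 2 + [('H', 'H')] * 2   # q = 2 plays of (D, D)
start = list(history)
counts = []
for _ in range(5):
    counts.append(sum(1 for play in history if play == ('D', 'D')))
    history = full_memory_step(history, alpha, m)

assert history == start          # the state recurs: a cycle, not a convention
assert set(counts) == {2, 3}     # the memory never becomes all-H or all-D
```

The count of (D, D) plays oscillates between q = 2 and q + 1 = 3, exactly as in the argument above, so the process never reaches a convention.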

Proof of Lemma 2
Since we assume k_Y < k_B, condition (i) of Lemma 1 is respected. Let G be a 2 × 2 coordination game with the corresponding conventions (strict Nash equilibria) (H, D) and (D, H), and corresponding absorbing states with histories h_HD and h_DH. Assume that row players have sample size k_B and column players have sample size k_Y. For payoffs as in the matrix (Fig. 1) we define α* = 1 − V/C. Parameter α* then refers to the share of yellow (blue) players choosing strategy D in the sample of a blue (yellow) player that is necessary to shift that player's best response to H.
Assume that the blue and yellow player populations are currently in h_HD. A blue player B currently playing strategy s_B = H will only change strategy to s_B = D if there is a sufficient number of yellow players playing s_Y = H in his sample. Thus, there must be at least (1 − α*)k_B players committing an error in subsequent periods, which occurs with probability ε^{(1−α*)k_B}. For a yellow player Y with s_Y = D to switch to strategy s_Y = H there must be a sufficient number of blue players playing s_B = D in his sample. Hence, there must be at least α*k_Y such plays in the memory, which occurs with a probability of at least ε^{α*k_Y}. Therefore, the minimum of (1 − α*)k_B and α*k_Y gives the stochastic potential of h_DH. The proof for a shift from h_DH to h_HD is analogous.

Proof of Proposition 1 (a) We have to prove that r_HD ≤ r_DH.
(b) We find sufficient conditions for the convention h_HD to be the only stochastically stable convention. Given Theorem 1 in Young (1993a), it suffices to show that r_HD < r_DH. Using point (a) and rearranging terms, we obtain the condition stated in the Proposition.

Proof of Proposition 2
The proof is analogous to the proof of Proposition 1, once α * is replaced by 1 − α * .

Proof of Lemma 3
The proof is analogous to the proof of Lemma 2, with the exception that α*_B ≠ α*_Y. Assuming that k_B + k_Y ≤ m and k_Y < k_B, condition (i) of Lemma 1 is respected.
Assume that the blue and yellow player populations are currently in h_HD. A blue player B currently playing strategy s_B = H will only change strategy to s_B = D if there is a sufficient number of yellow players playing s_Y = H in his sample. Thus, there must be at least (1 − α*_B)k_B players committing an error in subsequent periods, which occurs with probability ε^{(1−α*_B)k_B}. For a yellow player Y with s_Y = D to switch to strategy s_Y = H there must be a sufficient number of blue players playing s_B = D in his sample. Hence, there must be at least α*_Y k_Y such plays in the memory, which occurs with a probability of at least ε^{α*_Y k_Y}. Therefore, the minimum of (1 − α*_B)k_B and α*_Y k_Y gives the stochastic potential of h_DH. The proof for a shift from h_DH to h_HD is analogous.

Proof of Proposition 3
We proceed by dividing the (α*_B, α*_Y) plane into four areas characterized by which of the four quantities α*_Y, (1 − α*_Y), α*_B, (1 − α*_B) is the minimum.
• The first area is characterized by min{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = α*_Y. Since k_Y < k_B, we have α*_Y k_Y < (1 − α*_B)k_B and α*_Y k_Y ≤ min{(1 − α*_Y)k_Y, α*_B k_B}; hence from Lemma 3 r_DH ≤ r_HD, and from Theorem 1 in Young (1993a) we find that h_DH is stochastically stable.
• The second area is characterized by min{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = (1 − α*_Y). Analogously, from Lemma 3 r_HD ≤ r_DH, and from Theorem 1 in Young (1993a) we find that h_HD is stochastically stable.
• The third area is characterized by min{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = α*_B. The condition characterizing this area can be rewritten as max{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = (1 − α*_B). Since k_B > k_Y, we have α*_Y k_Y < (1 − α*_B)k_B, and from Lemma 3 we obtain r_DH = α*_Y k_Y. If α*_Y < 0.5 and α*_Y k_Y < α*_B k_B, then r_DH ≤ r_HD and by Theorem 1 in Young (1993a) the convention h_DH is stochastically stable; if α*_Y < 0.5 and α*_Y k_Y > α*_B k_B, then r_HD ≤ r_DH and the convention h_HD is stochastically stable. If instead α*_Y > 0.5, we have that α*_Y k_Y > min{(1 − α*_Y)k_Y, α*_B k_B}, resulting in r_HD ≤ r_DH; from Lemma 3 and Theorem 1 in Young (1993a), we find that h_HD is stochastically stable.
• The fourth area is characterized by min{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = (1 − α*_B), which can be rewritten as max{α*_Y, (1 − α*_Y), α*_B, (1 − α*_B)} = α*_B. By an argument symmetric to that for the third area (with the roles of H and D exchanged), if α*_Y < 0.5 we find that h_DH is stochastically stable, while for α*_Y > 0.5 the selected convention depends on the comparison between (1 − α*_Y)k_Y and (1 − α*_B)k_B.
The result follows by putting together the regions in which the two conventions are stochastically stable (see Fig. 4).
Then h_HD is the unique stochastically stable convention if at least one of the conditions stated in the Proposition holds. The result of part (a) follows by putting together the regions in which h_HD is the unique stochastically stable convention (see Fig. 5). Part (b) of the Proposition can be proved analogously by finding sufficient conditions to have r_DH < r_HD.