A case of evolutionarily stable attainable equilibrium in the laboratory

We reinvestigate data from the voting experiment of Forsythe et al. (Soc Choice Welf 10:223–247, 1993). In every one of 24 rounds, 28 players were randomly (re)allocated into two groups of 14 to play a voting stage game with or without a preceding opinion poll phase. We find that the null hypothesis that play in every round is given by a particular evolutionarily stable attainable equilibrium of the 14-player stage game cannot be rejected if we account for risk aversion (or a heightened concern for coordination), calibrated in another treatment.


Introduction
The purpose of this paper is to demonstrate, by means of a case study, how careful theory can help in analyzing the data of laboratory experiments on (strategic) human interaction. For this case study, we chose an experiment performed by Forsythe et al. (1993). The authors' aim was to perform an exploratory analysis of how opinion polls impact voting behavior in an election and to experimentally assess Duverger's "law," see Duverger (1954), that in any majority-rule election only two parties receive votes. They did not perform a full theoretical analysis of the game that the subjects played in the experiment. This experiment is close to perfect for our undertaking for the following reasons. First, it has a very clean and elegant design, which we explain in detail in Sect. 2. Second, it is a somewhat complicated game with a stage game that involves 14 players of three different types. Thus, the game has a large set of possible strategy profiles, which means that it could possibly have (and indeed does have) a large number of Nash equilibria. Yet, the game has certain symmetries that impose subtle but important restrictions on the set of feasible strategy profiles and thus on the set of feasible (or attainable) Nash equilibria in the sense of Alos-Ferrer and Kuzmics (2013). 1 Third, the game is played recurrently. This means that players are repeatedly and randomly drawn from a bigger pool of players and matched to engage in this 14-player stage game. This implies that if a Nash equilibrium establishes itself over time, or if subjects play the same equilibrium throughout, this equilibrium should be evolutionarily stable in the sense of Maynard Smith and Price (1973) and Maynard Smith (1982). 2 Fourth, in one of the two treatments, the game has two stages and, even though the second stage game by itself has many equilibria with symmetries posing subtle restrictions, this game can be said to have an "obvious" way to play, or focal point, in the spirit of Schelling (1960) and formally as in Alos-Ferrer and Kuzmics (2013). Nevertheless, while all these properties make the game somewhat hard to analyze for the analyst, the game, it seems to us, is not that difficult to (learn to) play for the actual subjects.
The laboratory experiment of Forsythe et al. (1993) has two treatments, a simpler one [denoted CPSS in Forsythe et al. (1993)], in which subjects only play a voting game, and a more complex one [denoted CPSSP in Forsythe et al. (1993)], in which subjects first play an opinion poll game with publicly observed outcome before they play a voting game. Our theory-driven analysis of these two treatments yields the following findings. We can calibrate the subjects' payoff function with a parameter of risk aversion in such a way that the unique evolutionarily stable attainable equilibrium of the simpler game provides a perfect fit to the observed frequency of play. Furthermore, and this is the main result, taking this so-calibrated risk aversion parameter as given, the null hypothesis of play being given by an evolutionarily stable attainable equilibrium with focal voting stage behavior in the more complex game cannot be rejected at the 5% level of significance. 3 We believe the main contribution of this paper to be the illustration, by means of an interesting case study, of how careful theory can help with the understanding and analysis of experiments in which subjects play somewhat complicated games. The paper, arguably, also makes a contribution to the political science content of Forsythe et al. (1993) by completing the theory, which in fact provides the same insights as the experiment (by virtue of providing almost exactly correct predictions for the experiment).

Additional motivation and related literature
Why would we possibly expect equilibrium behavior in the experiments of Forsythe et al. (1993)? Bernheim (1984) and Pearce (1984) have shown that the rationality of players, in fact even common knowledge of rationality, is not sufficient for Nash equilibrium play, except in very special cases. One sufficient condition for equilibrium play, provided by Aumann and Brandenburger (1995), is that players have "mutual knowledge of the payoff functions and of rationality, and common knowledge of the conjectures," where conjectures are meant to be "conjectures, on the part of other players, as to what a player will do." 4 The question then is this: Under what conditions do players obtain common knowledge of conjectures and would we expect this in our case? One idea is that the game is simple enough for players to identify a focal point, see Schelling (1960), i.e., an "obvious" way to play the game. If it is both obvious and then also played, a necessary condition for such a focal point would be that it is a Nash equilibrium. Another idea is that this common knowledge of conjectures derives from the presence of a book that is commonly known to have been read and understood, and whose recommendations are commonly known to be followed by all players. Such a book could be, among others, a religious text, a book of etiquette, or a comprehensive attempt at such a book such as Harsanyi and Selten (1988). 5 A necessary condition for such a book to fulfill this role would be for it to describe Nash equilibrium behavior, and a unique one at that. In a similar spirit, it is feasible that a meta-norm, using the language of Alos-Ferrer and Kuzmics (2013), meaning a norm not specific to the particular game that is played (and not necessarily written in a book), exists that is commonly used as a guiding principle to identify a focal point, especially when multiple equilibria exist.
This could, for instance, be "play a Pareto-optimal (or risk-dominant) Nash equilibrium if one exists" but would have to be much more refined and complex if it were to be successful in all games. To understand this, consider, for instance, a simple coordination game with two options A and B, where coordination on A gives both players a payoff of 1, coordination on B both players a payoff of 2, and mis-coordination both a payoff of zero. Despite the multiplicity of equilibria in this game, subjects presumably have no problem coordinating on the "focal" payoff- and risk-dominant action B. 6 Interestingly, Gunnthorsdottir et al. (2010a, b), who study somewhat more complicated games than this simple two-player, two-action game, also find that subjects are able to identify and play the unique Pareto-dominant (not risk-dominant) Nash equilibrium in their games. 7 Finally, one can derive conditions for Nash equilibrium play if the game is played recurrently, i.e., played often with always changing other players. 8 This is the subject of evolutionary game theory. For textbook treatments, see, e.g., Weibull (1995), Hofbauer and Sigmund (1998), and Sandholm (2010). 9 The idea goes back at least to Brown (1951) and Robinson (1951) and is already present in the "mass action" interpretation of equilibrium in Nash (1950, page 21). One of the key findings of this literature, sometimes referred to as the "folk theorem" of evolutionary game theory, see, e.g., Nachbar (1990), is that if a "reasonable" evolutionary process converges to a point, then this point must be a Nash equilibrium.

3 "Appendix C" shows that there is some evidence of subjects gradually learning to actually play the evolutionarily stable attainable equilibrium without risk aversion in the game with opinion polls. There is no evidence for such a change in behavior in the game without polls.

4 Both quotes are from the abstract of Aumann and Brandenburger (1995).
As our game of interest here is played recurrently, albeit only 24 times, concerns of evolutionary stability of play strike us as potentially relevant for our analysis. Whether 24 "repetitions" are sufficient for this is an empirical question which we address in this paper. Moreover, if a "reasonable" evolutionary process converges to a point, then this point should be evolutionarily stable in that it should be robust to the introduction of a small likelihood of arbitrary behavior. This idea has been formalized in the concept of an evolutionarily stable strategy (ESS) by Maynard Smith and Price (1973) and Maynard Smith (1982). There are many connections between convergence points of dynamic processes such as the replicator dynamic of Taylor and Jonker (1978) and ESS. See, e.g., the textbooks of Weibull (1995) and Sandholm (2010) for these. One of the interesting conclusions of this literature is that (certain) mixed equilibria in coordination games (or generally in many games with multiple equilibria) are not evolutionarily stable. This is despite the fact that these equilibria typically constitute a singleton strategically stable set in the sense of Kohlberg and Mertens (1986), and are therefore typically trembling-hand perfect (Selten 1975) and proper (Myerson 1978). 10 Evolutionarily stable strategies agree with strategic stability and other refinements in that they cannot be weakly dominated. 11

As the theoretical sufficient conditions for Nash equilibrium are never perfectly satisfied in practice, ultimately the question is an empirical one. There is evidence that playing a game only once is not typically sufficient to guarantee equilibrium play. For instance, Wright and Leyton-Brown (2017), performing a meta-analysis of many games, show that Nash equilibrium is overall not a good prediction in games played only once. 12 There is evidence that playing a game sufficiently often (recurrently with different other players) does lead to equilibrium play. For instance, Van Huyck et al. (1990) show that subjects often fail to play any Nash equilibrium in one-shot coordination games, while Cooper et al. (1990) find evidence of Nash equilibrium play in recurrent coordination games. Similarly, O'Neill (1987) finds evidence against laboratory subjects playing minmax (i.e., Nash equilibrium) strategies in zero-sum games, while Walker and Wooders (2001) find mixed evidence that professional tennis players use minmax strategies in their service game, Hsu et al. (2007) find evidence that professional tennis players use minmax strategies, and Palacios-Huerta (2003) finds strong evidence that professional soccer players (and goalkeepers) use minmax strategies when taking (or defending) penalty kicks. Professionals have played these games often, while laboratory subjects have not (or not often enough). Binmore et al. (2001) find that after and only after sufficient practice with the game do laboratory subjects play minmax strategies in a set of zero-sum games. 13 Also in public good provision games initial play is never an equilibrium (under material self-interested preferences), but eventual play after repetitions often is. See, e.g., Andreoni (1988) and the control group in Fehr and Gächter (2000). See also Duffy and Hopkins (2005) for learning in market entry games.

6 Somewhat relatedly see, e.g., Mehta et al. (1994a) and Mehta et al. (1994b) for seminal contributions to the experimental work on focal points using labels in games.

7 Unlike us, Gunnthorsdottir et al. (2010a) do not need to appeal to refinements as we do here, as the games they study only have two (kinds of) pure strategy Nash equilibria. We have chosen the experiments of Forsythe et al. (1993) here because the game played there must be one of the very few that are studied in the laboratory that are possibly simple for subjects but complicated for analysts, in that almost the whole weight of the detailed developed theory of play is required for its analysis.

8 If the other players are not changing, then the game is really one big repeated game in which players have very different incentives than in the one-stage game. See, e.g., Mailath and Samuelson (2006) for a textbook treatment.

9 There is also a large related literature on learning in games. See, e.g., Roth and Erev (1995) for the seminal contribution of reinforcement learning in games and Camerer and Ho (1999) for a very flexible learning model. As subjects seem to play equilibrium from essentially the first round in the experiments we study here, we do not pursue these learning models further in this paper.
There is also a more specific literature that aims to test evolutionary game theory in the laboratory and in the field. One could argue that Cooper et al. (1990) is an early example of that. This literature is surveyed in Friedman and Sinervo (2016). For instance, Oprea et al. (2011) study the evolution in single-population and two-population hawk-dove games and find support for evolutionarily stable play (which is very different in the two different cases). The game studied in our paper does in fact have similarities to a two-player hawk-dove game. Recently, people have been interested in testing evolutionary game theory in those cases where it predicts cycles instead of convergence to equilibrium. See, for instance, Sinervo and Lively (1996), who find that certain types of lizards display cyclical behavior, and Hoffman et al. (2015) and Cason et al. (2010, 2013) for experiments with human subjects and some evidence of cycling. We are not aware of too many papers on the issue of whether only certain strategically or evolutionarily stable equilibria are played in laboratory experiments. 14

Laboratory experiments suffer a bit from the problem of the experimenter's lack of complete control over the subjects' utility or payoffs. See, e.g., Weibull (2004) for a discussion of this issue. For instance, observed play in ultimatum games, perhaps started by Güth et al. (1982) and recently surveyed in Cooper and Kagel (2016), could be evidence against subgame perfection, but could also be evidence against self-centered preferences. Indeed, the literature mostly interpreted this as evidence of other-regarding preferences. 15

There is a sizeable literature on laboratory experiments in political economy. The results in this literature, on the whole, mostly support the theoretical equilibrium predictions of various models on voter turnout, the Condorcet jury theorem, and the swing voter's curse. See, e.g., Guarnaschelli et al. (2000), Aragones and Palfrey (2004), Levine and Palfrey (2007), and Bhattacharya et al. (2014). Palfrey (2009) provides a survey of this literature and finds that "[a]ll the experiments find significant evidence of strategic voting and, with a few exceptions, find support for the equilibrium predictions of the theories." 16 What we add to this literature is that we provide an(other) instance of laboratory play that is extremely close to equilibrium theory for a rather complicated game with multiple equilibria.

10 This literature on equilibrium "refinements" can be said to ask the following question. Suppose that for some reason some Nash equilibrium play has established itself. What properties would this equilibrium have to have so that "highly rational" individuals really want to play according to this equilibrium? See, e.g., Ritzberger and Weibull (1995), Swinkels (1993), Balkenborg et al. (2013) for connections between evolutionary and strategic stability. Typically, (smallest) evolutionarily stable sets of strategy profiles, while they can include non-Nash behavior, also include strategically stable sets. Yet, not every strategically stable set is necessarily included in a (smallest) evolutionarily stable set.

11 For dynamic evolutionary processes, it is not always true that weakly dominated strategies are eliminated. See, e.g., Weibull (1995), Hart (2002), Kuzmics (2004), Kuzmics (2011), Laraki and Mertikopoulos (2013), Bernergård and Mohlin (2017) for a discussion of this issue.

12 See also Fudenberg and Liang (2018) on this point.

13 There are games for which even very long periods of learning are insufficient to provide minmax play. We know from Zermelo (1913) that optimal (minmax) play in chess would result in every game of chess ending the same way. Yet even professional chess players' games can end with any one of the three possible outcomes. Clearly, even these professionals do not play Nash equilibrium.
We show that despite these complications and strategic uncertainty, the subjects are able to coordinate their play very effectively on one of the evolutionarily stable attainable equilibria of this game.
We also point out the role risk aversion (or some other reasonable and slight transformation of the utility function) could play in explaining laboratory experiments more closely. 17 Risk aversion has been used to explain behavior in experiments in a variety of studies. For instance, Goeree and Holt (2004), Goeree et al. (2003), and Fudenberg and Liang (2018) find that risk aversion improves the explanatory power of equilibrium behavior in matrix games, while Cox and Oaxaca (1996), Chen and Plott (1998), Goeree et al. (2002), and Campo et al. (2011) find that risk aversion is helpful in explaining behavior in auctions. Deck et al. (2008) and Friedman et al. (2014) have shown, however, that estimated risk-aversion parameters are often individual and even situation specific. 18 In fact, we favor another interpretation: the concavification of the utility of money that is needed to explain the data here almost perfectly may stem from the subjects' heightened concern for coordination beyond pure monetary incentives.

14 There is a sizeable empirical literature on whether one would expect Pareto-dominant or risk-dominant equilibria in coordination games. See, e.g., Van Huyck et al. (1990) as a starting point. There are also evolutionary game theory results about the ultra-long-run behavior in coordination games starting with Young (1993) and Kandori et al. (1993). But this is an issue of selection, not refinement, as all relevant equilibria among which selection may happen are singleton strategically stable sets.

15 See also Andreoni and Blanchard (2006). Also evolutionary game theory has provided only partial support for subgame perfection. See, e.g., Gale et al. (1995) for the ultimatum game specifically, and Nöldeke and Samuelson (1993), Hart (2002), and Kuzmics (2004) more generally.

16 See also Palfrey (forthcoming) for a more comprehensive and more recent survey.

17 Palfrey (forthcoming) argues that for cases when Nash equilibrium does not perfectly explain the data, quantal response equilibria of McKelvey and Palfrey (1995, 1998) often provide a much better fit. Given that Nash equilibrium theory suffices in the present case and that it is a special case of quantal response equilibrium, we have not attempted to consider quantal response equilibria further than that.

The experimental setup
In this section, we describe the experimental setup employed by Forsythe et al. (1993). There are two treatments. In treatment 1, the stage game is a voting game with three types of players. In treatment 2, the stage game, also with three types of players, has two phases: players, before playing the voting game, participate in an opinion poll. There is one session per treatment, and there are 28 subjects per session. In each of the 24 rounds, the subjects are randomly re-arranged into two equal-sized groups and each subject is randomly re-assigned a type. The voting game is the same in all treatments, and its payoffs are given in Table 1. In each group of 14, four players are randomly assigned the type A, four the type B, and six the type C. 19 Each player can choose to vote for one of the three candidates, also labeled A, B, and C, or can abstain. The table states the monetary payoffs (in US$) to each type depending on which candidate wins the election. Voters of type A favor candidate A, voters of type B favor candidate B, and voters of type C favor candidate C. Voter types are, thus, named in terms of their favorite candidate.
Voters are asked to cast their vote for one of the three candidates or to abstain. If there is a tie between two or three of the most-voted candidates, the elected candidate is chosen uniformly randomly.
For treatment 2, players are first asked to state one of the three candidates (or to abstain) in an opinion poll (without payoffs), the outcome of which is then publicly announced before players are asked to play the voting game as described above. All this was carefully explained to the subjects. Subjects were, however, not told that the game would end after exactly 24 rounds. 20

There are two correct ways of seeing this experiment. One way is to see each of the two treatments in Forsythe et al. (1993) as a finitely repeated 28-player game without discounting and with a particular highly imperfect monitoring structure. Subjects, after each round, only learn what the frequency of play is within their group. They do not learn what happens in the parallel group. They also can never associate action choices with the identity of the other players. This finitely repeated game has a huge number of subgame perfect equilibria. We shall not attempt to identify all equilibria of this huge game. This is also not necessary, as we in fact show that subjects of this game play a very particular subgame perfect equilibrium of this game. They play an evolutionarily stable attainable equilibrium of the 14-player stage game at each of the 24 rounds.

18 We are grateful to an anonymous referee for pointing this out.

19 As there are a finite and fixed number of each type, the game here differs somewhat from the voting games studied in, e.g., Palfrey (1989), Myerson and Weber (1993), Fey (1997), and Andonie and Kuzmics (2012), where the focus is on large elections and where every player is randomly allocated a type without keeping the fraction of players of any one type constant.

20 We are grateful to an anonymous referee for pointing this out.
The other way to see this experiment is the one that the authors seem to have intended: as a 14-player stage game recurrently played with always changing other players. Forsythe et al. (1993) have been very careful in their design so that this is a valid interpretation of the experiment. In fact, we find that subjects play independently of the past. To make it clear that this is the case and that the second view is appropriate, we analyze the data both at the aggregate level and at the individual level. Under the assumption that play is stationary, the aggregate analysis is more powerful. However, we perform a very detailed round-by-round analysis in "Appendix C" that demonstrates that the stationarity assumption is justified. One cannot reject the null of subjects playing the same strategy profile across all 24 rounds. Finally, we also split the data into early rounds (1 to 12) and later rounds (13 to 24). The analysis of these two data halves is presented in "Appendix C" as well. The main finding there is that in the game without polls there is no evidence of a change in behavior over rounds and in the game with polls there is some evidence of a change in behavior which could be interpreted as learning leading closer to (evolutionarily stable attainable) Nash equilibrium play (without risk aversion) of the game.

The voting game without opinion polls
In this section, we study the voting game without polls [treatment CPSS in Forsythe et al. (1993)].

The game
This is a game in which each player of each of the three types of players has four pure strategies: vote for A, vote for B, vote for C, and abstain from voting. There are four players of the A type, four of the B type, and six of the C type. The winning candidate is determined by simple (relative) majority. If there is a candidate with more votes than any other candidate, this candidate is elected with payoff consequences to the various voter types as given in Table 1. 21 If there are two or three candidates with the highest number of votes, then one of them is drawn uniformly randomly as the winner with payoff consequences again as given in Table 1.
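The election rule described above can be written down in a few lines. The following is a minimal sketch, not the authors' code: the function name `winner_distribution` and the dictionary representation of vote counts are assumptions made here for illustration. Abstentions are simply not counted, and ties among the most-voted candidates are broken uniformly at random.

```python
from fractions import Fraction


def winner_distribution(votes):
    """Return each candidate's probability of being elected under plurality rule.

    `votes` maps candidate labels to vote counts (abstentions are not counted).
    Ties among the candidates with the most votes are broken uniformly at random.
    """
    top = max(votes.values())
    leaders = [cand for cand, n in votes.items() if n == top]
    share = Fraction(1, len(leaders))
    return {cand: (share if cand in leaders else Fraction(0)) for cand in votes}


# All A and B types split evenly, all C types vote C: C wins outright.
print(winner_distribution({"A": 4, "B": 4, "C": 6}))
# Six votes each for A and C: each of the two wins with probability 1/2.
print(winner_distribution({"A": 6, "B": 2, "C": 6}))
```

Exact fractions are used so that tie probabilities such as 1/2 and 1/3 are represented without rounding.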

Attainable equilibria
We shall here identify and compute all attainable equilibria of this 14-player voting game that are not weakly dominated. 22 Note that for C types voting for C is a weakly dominant strategy, while for A and B types voting for C and abstaining are both weakly dominated. Thus, we need to concern ourselves only with the choice of A and B types between voting for A and voting for B.
The game (without considering framing effects induced by ballot position, more about this further below) has inherent symmetries as identified already in Andonie and Kuzmics (2012). Before we identify attainable strategies for the game at hand, we here briefly explain how symmetries are identified mathematically and why symmetries restrict the set of feasible strategy profiles. We begin with the latter. The best simple example, which is in fact close to the game at hand, is (a version of) the two-player hawk-dove game. Suppose that there are two players who both can choose one of the two pure strategies, H or L. If both play H or both play L, they both get nothing. If one plays H and the other L, the player who chose H gets a high payoff and the player who chose L gets a low but also positive payoff. Suppose that this game is played recurrently by always independently randomly chosen and anonymously matched subjects from a large pool. It is then impossible, for instance, that every round of this game results in the asymmetric outcome of one player choosing H and the other L. Why? A necessary condition to achieve this asymmetric outcome would be that half of the pool of players choose H and the other half L. Note also that subjects cannot condition their behavior on the other player's type as the interaction is completely anonymous. But then there is only a 50% chance that a pair of one H and one L player is matched. 23 Now suppose that an analyst arbitrarily always calls one of the two players player A and the other player B and arbitrarily labels the strategies H and L as A and B for player A and as B and A (i.e., the other way round) for player B. If this is only known to the analyst, then of course nothing changes for the subjects and they still find it impossible to play an asymmetric strategy profile. What changes if the subjects also see this labeling of the analyst?
It is then feasible, but not clear whether it is very plausible, that the subjects now condition their behavior on the commonly known strategy names. For instance, they could use the fact that A comes before B in the alphabet to play the asymmetric equilibrium of everyone choosing A (that is, A players play their favorite strategy, and B players their second favorite strategy). In the present context, the ballot position could have this effect. Forsythe et al. (1993) purposely always put candidate A as the first on the ballot. It is now feasible and perhaps plausible that all players understand this and condition their choice on the ballot position. This would allow all A and B players to vote for A, identified as the first on the ballot. In the end, it is an empirical question whether players have sufficient faith in most of them "seeing" that they could do that in order for them to solve the coordination problem this way. In our empirical analysis, and also already in that of Forsythe et al. (1993), it emerges clearly that this is not (or at least hardly) the case.

22 Weakly dominated strategies cannot be evolutionarily stable in the sense of Maynard Smith and Price (1973) and Maynard Smith (1982) and of our extended notion below taken from Palm (1984) and Broom et al. (1997).

23 For more on this subject, see Selten (1980), Farrell (1987), Bhaskar (2000) (see also Kuzmics and Rogers 2012), Kuzmics and Rogers (2010), and Kuzmics et al. (2014).
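The matching argument in the hawk-dove example can be checked with a one-line calculation. The sketch below assumes, as an idealization of the "large pool," that the two matched players' actions are independent draws, each equal to H with probability p; the function name `prob_asymmetric_pair` is hypothetical.

```python
def prob_asymmetric_pair(p):
    """Probability that two independently drawn players choose different actions
    when each plays H with probability p and L with probability 1 - p."""
    return 2 * p * (1 - p)


# Half of the pool plays H: a mixed (H, L) pair is matched only half of the time.
print(prob_asymmetric_pair(0.5))

# No population share p does better than that.
best = max(prob_asymmetric_pair(i / 100) for i in range(101))
print(best)
```

Since 2p(1 − p) is maximized at p = 1/2, anonymous players who cannot condition on labels cannot reach the asymmetric outcome in every round, whatever the composition of the pool.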
Mathematically, attainable strategies are defined, see, e.g., Alos-Ferrer and Kuzmics (2013) building on prior work of Von Neumann and Morgenstern (1953), Nash (1951), Harsanyi and Selten (1988), Crawford and Haller (1990), and others, as strategies that are unaffected by a relabeling of symmetric strategies and symmetric players. Symmetric strategies and players have to be (in general, and this is also the case here) identified simultaneously. Loosely speaking, two players are symmetric to each other if the payoff function of one player is the same as the payoff function of another player after some relabeling of symmetric strategies and symmetric players. For a precise definition, see Alos-Ferrer and Kuzmics (2013). In the voting game treatment of Forsythe et al. (1993), we get the following restrictions for the various players. Every A-type player must use the same (mixed) strategy, every B-type player must use the same (mixed) strategy, and every C-type player must use the same (mixed) strategy. Let x_A(A) denote the probability that an A-type player attaches to pure strategy A. Then, as we assume that A-type players do not use strategy C or abstain, we must have that x_A(B) = 1 − x_A(A), where x_A(B) is the probability that an A-type player attaches to pure strategy B. A final and perhaps most important restriction of attainability is that x_A(A) = x_B(B), where x_B(B) is the probability that a B type attaches to pure strategy B. See Andonie and Kuzmics (2012) for details. Thus, if we restrict attention to undominated attainable strategy profiles, all we need to determine is x = x_A(A) = x_B(B).

Proposition 3.1 The voting game has exactly three undominated attainable equilibria. In all of these, C types play C, and A and B types put zero probability on C and abstaining. The three undominated equilibria are then characterized by the probability an A type attaches to A (same as a B type attaches to B), which in the three equilibria is given by x = 0, x = 1, and x = 0.6615, respectively. 24

Proof We have already argued that C types must play C in an undominated equilibrium and that A and B types will avoid playing C and abstaining. Now consider the supposed equilibrium given by x = 0. This implies that all A types play B and all B types play A, so that A and B receive four votes each and C wins the election by two votes. This implies that no individual can change this outcome by unilaterally deviating from the stated strategy profile. The same argument applies for x = 1. So, these are both equilibria.
In order to identify an equilibrium x ∈ (0, 1), we need to appeal to the indifference principle (i.e., voting for A and B has to be equally good for all A and B types of voters).
Consider w.l.o.g. an A-type voter. Let Y(x) denote the random variable that, given probability x ∈ (0, 1), equals the number of A votes among all seven other A and B types of voters. Note that for the considered A-type voter, the choice between voting for A and voting for B changes the election result if and only if Y(x) ∈ {1, 2, 5, 6}. For instance, if Y(x) = 1, an A-vote by the considered A-type player will result in a tie between candidates B and C (six votes each), while a B-vote would result in a win for B (seven votes against six for C).
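The set of pivotal events can be confirmed by brute-force enumeration. The following sketch (the helper names `outcome` and `pivotal_values` are assumptions of this illustration) takes the six C votes as fixed, lets the number of A votes among the seven other A and B types range from 0 to 7, and compares the set of most-voted candidates after the considered voter's A-vote with that after her B-vote.

```python
N_C_VOTES = 6   # all six C types vote for C (their weakly dominant strategy)
N_OTHERS = 7    # the other three A types and four B types


def outcome(a_votes, b_votes, c_votes=N_C_VOTES):
    """Return the sorted tuple of candidates tied for the most votes."""
    counts = {"A": a_votes, "B": b_votes, "C": c_votes}
    top = max(counts.values())
    return tuple(sorted(cand for cand, n in counts.items() if n == top))


def pivotal_values():
    """Values y of Y(x) at which the voter's own A-vote and B-vote lead to different outcomes."""
    pivotal = []
    for y in range(N_OTHERS + 1):
        after_a_vote = outcome(y + 1, N_OTHERS - y)
        after_b_vote = outcome(y, N_OTHERS - y + 1)
        if after_a_vote != after_b_vote:
            pivotal.append(y)
    return pivotal


print(pivotal_values())  # [1, 2, 5, 6]
print(outcome(2, 6), outcome(1, 7))  # the Y(x) = 1 example: ('B', 'C') versus ('B',)
```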
Let u_A(A, x) denote the expected payoff to the considered A-type voter if she votes for A given all others use mixing probability x, and let, similarly, u_A(B, x) denote her expected payoff in this case if she votes for B. The equilibrium condition is then given by u_A(A, x) = u_A(B, x). This equation is given in "Appendix B." The difference u_A(A, x) − u_A(B, x) is a polynomial in x of degree 7, and the equilibria are given by its zeros. Dividing this polynomial by x(1 − x), given that x = 0 and x = 1 are zeros of this polynomial, leaves a polynomial in x of degree 5. We then apply Newton's method to find all further zeros of this polynomial and obtain only one at x = 0.6615.
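The indifference condition can also be solved numerically without expanding the polynomial. The sketch below is an illustration under stated assumptions, not the computation in "Appendix B": Table 1 is not reproduced in this section, so the A-type payoffs in `PAYOFF_A_TYPE` are assumed placeholder values that should be replaced by the entries of Table 1, and plain bisection is used here in place of Newton's method because it needs no derivative. All function names are hypothetical.

```python
from math import comb

# ASSUMPTION: illustrative payoffs to an A type when A, B, or C wins.
# Table 1 is not shown in this section; substitute its values here.
PAYOFF_A_TYPE = {"A": 1.20, "B": 0.90, "C": 0.20}


def binom_pmf(n, p):
    """Probability mass function of a Binomial(n, p) variable as a list."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def dist_Y(x):
    """Distribution of Y(x): A votes among 3 other A types (each votes A w.p. x)
    and 4 B types (each votes A w.p. 1 - x)."""
    pmf = [0.0] * 8
    for i, p_i in enumerate(binom_pmf(3, x)):
        for j, p_j in enumerate(binom_pmf(4, 1 - x)):
            pmf[i + j] += p_i * p_j
    return pmf


def expected_payoff(own_vote, x, payoff=PAYOFF_A_TYPE, n_c=6):
    """Expected payoff u_A(own_vote, x) of an A type under plurality with uniform tie-breaking."""
    total = 0.0
    for y, prob in enumerate(dist_Y(x)):
        counts = {"A": y + (own_vote == "A"), "B": 7 - y + (own_vote == "B"), "C": n_c}
        top = max(counts.values())
        leaders = [cand for cand, n in counts.items() if n == top]
        total += prob * sum(payoff[cand] for cand in leaders) / len(leaders)
    return total


def payoff_gap(x):
    """u_A(A, x) - u_A(B, x); its zeros are the attainable equilibria."""
    return expected_payoff("A", x) - expected_payoff("B", x)


def interior_equilibrium(lo=0.01, hi=0.99, tol=1e-10):
    """Bisection on the payoff gap; assumes the gap changes sign on [lo, hi]."""
    f_lo = payoff_gap(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = payoff_gap(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


print(payoff_gap(0.0), payoff_gap(1.0))  # both boundary profiles are zeros
print(round(interior_equilibrium(), 4))
```

Under the assumed payoffs, the interior zero found this way lies at roughly x ≈ 0.66, in line with the value reported in the text; with different payoffs the interior zero would move, while x = 0 and x = 1 remain zeros.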
Note that among all (three) attainable equilibria, in all of which C types vote for C, A and B types clearly as a group and individually prefer the mixed equilibrium with x = 0.6615, in which they have at least a positive probability of winning the election. 25 Note that the voting game has other, non-attainable, equilibria. For instance, it is an equilibrium that all A and B types vote for A, or all A and B types vote for B.

Evolutionary stability of attainable equilibria
In this section, we study the evolutionary stability properties of the three attainable equilibria. We shall adapt the notion of an evolutionarily stable strategy (ESS) in the sense of Maynard Smith and Price (1973) and Maynard Smith (1982) (see Weibull 1995, Definition 2.1, for a textbook treatment) and Palm (1984) (who extended this to general symmetric multi-player games) to the context of attainable strategy profiles. Consider an attainable strategy (profile), characterized by x ∈ [0, 1], the probability that an A type attaches to playing A (and equivalently that a B type attaches to B). Let y ∈ [0, 1] denote a mutant strategy that enters with probability ε > 0 close to zero. Let w_ε = (1 − ε)x + εy denote the post-entry mix of strategies. Let generally u_A(x, z) denote the payoff to an A type playing strategy x ∈ [0, 1] when all seven other A and B types (recall that C types just play C regardless) play strategy z ∈ [0, 1]. Then, x is an ESS if for all y ∈ [0, 1] with y ≠ x we have that u_A(x, w_ε) ≥ u_A(y, w_ε) for all sufficiently small ε > 0.

Proposition 3.2 The only ESS among the three attainable equilibria of the voting game is the mixed equilibrium with x = 0.6615.
Proof The attainability restriction that x is the probability of an A type playing A and at the same time also the probability of a B type playing B could also be handled by relabeling all B types' strategies from A to B and vice versa with the appropriate transformation in terms of utilities. Then, the attainable strategy x is simply a symmetric strategy of the relabeled game, which is now an eight-player symmetric game. In Lemma A.1, in "Appendix A," we provide a simple equivalent condition for a (mixed) strategy x ∈ [0, 1] to be an ESS for such games, where every player has the same two pure strategies. It implies that, in the present context, attainable strategy (profile) x is an ESS if and only if for all y < x close to x, A is a best reply to y for an A type, and for all y > x close to x, B is a best reply for an A type. In order to check the evolutionary stability of the three attainable equilibria, we therefore numerically compute the payoff of an A type for strategies A and B as a function of x, the probability with which all other A types play A and all B types play B. This is given in Fig. 1.
Note that when y exceeds x = 0.6615, strategy B is best for an A type (and analogously strategy A is best for a B type). On the other hand, when y is below x = 0.6615, strategy A is best for an A type (and analogously strategy B is best for a B type). Thus, by Lemma A.1, the only evolutionarily stable strategy is the mixed equilibrium.
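The sign pattern that Lemma A.1 requires can be checked on a grid. This is a rough sketch under the same assumptions as before (payoffs 1.2, 0.9, 0.2 to an A type and equal-probability tie-breaking, both our assumptions); the grid and the small band excluded around 0.6615 are illustrative choices.

```python
from math import comb

def payoff_gap(y):
    """u_A(A, y) - u_A(B, y) when the 3 other A types vote A with probability y
    and the 4 B types vote A with probability 1 - y.
    Assumes payoffs 1.2 / 0.9 / 0.2 and fair tie-breaking."""
    pmf = [0.0] * 8
    for i in range(4):
        for j in range(5):
            pa = comb(3, i) * y**i * (1 - y)**(3 - i)
            pb = comb(4, j) * (1 - y)**j * y**(4 - j)
            pmf[i + j] += pa * pb
    return 0.5 * (pmf[5] + pmf[6]) - 0.35 * (pmf[1] + pmf[2])

X_MIXED = 0.6615
grid = [k / 100 for k in range(1, 100)]
below = [y for y in grid if y < X_MIXED - 0.005]
above = [y for y in grid if y > X_MIXED + 0.005]

a_best_below = all(payoff_gap(y) > 0 for y in below)  # A is the best reply
b_best_above = all(payoff_gap(y) < 0 for y in above)  # B is the best reply
print(a_best_below, b_best_above)
```

The same sign pattern explains why the two pure attainable equilibria fail the lemma: just above x = 0 the best reply is A rather than B, and just below x = 1 it is B rather than A.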

Empirical results and tests
The actual voting behavior in the 48 rounds of the stage game without opinion polls [treatment CPSS in Forsythe et al. (1993)] is given in Table 2.
Note that roughly 2-3% of subjects choose a weakly dominated strategy in each group of voter types. If we only look at those subjects that do not choose weakly dominated strategies, we get the empirical fractions for voter types A and B given in Table 3.
From the empirical results, given in Tables 2 and 3 we can immediately reject the null hypotheses that A and B types play one of the asymmetric pure equilibria, or one of the symmetric pure equilibria. This is even true if we allow for a small enough fraction of "noise" players who just randomize in some arbitrary way. We now turn to testing attainability.
Test 3.1 The null hypothesis of attainability, i.e., x_A(A) = x_B(B), using a χ² test of independence, cannot be rejected at the 5% level of significance. It produces a p value of 0.8396 (χ² = 0.8411).
Note that in order to perform a χ² test of independence, we need to first relabel the strategies to take account of the symmetry restrictions. The result is given in Table 4. We can then perform the test and obtain a p value of 0.8396.
Test 3.1 is, thus, also a test of the null of the absence of a ballot position effect. As it cannot be rejected, there is no evidence, in the voting game, that players condition their strategy on the ballot position of the candidates. This is in agreement with the finding of Forsythe et al. (1993).
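The reported p value can be recovered from the reported test statistic alone. The sketch below assumes three degrees of freedom for Test 3.1 (two voter types by four choices); this is our inference, as the degrees of freedom are not stated in the text. It uses closed-form upper-tail probabilities, so only the standard library is needed.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable for df in {1, 2, 3}."""
    if df == 1:
        return erfc(sqrt(stat / 2))
    if df == 2:
        return exp(-stat / 2)
    if df == 3:
        return erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2)
    raise ValueError("closed forms implemented only for df = 1, 2, 3")

# Test 3.1: reported statistic 0.8411; df = 3 is an assumption.
p_test_31 = chi2_sf(0.8411, df=3)
print(round(p_test_31, 4))
```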
We can now test the null that A and B types (those that do not play dominated strategies) on average play the evolutionarily stable strategy x = 0.6615. While the empirical frequency of 0.5672 (average across the two types) is not very far from the hypothesized x = 0.6615, the null of x = 0.6615 can nevertheless be rejected at the 5% level of significance.

Test 3.2
The null hypothesis of ESS equilibrium play, i.e., x_A(A) = x_B(B) = x = 0.6615, using an exact binomial test must be rejected at the 5% level of significance. It produces a p value of 0.0002.
In this test, we are using the pooled data for both A and B types. The individual analysis performed in "Appendices C.1 and C.2" justifies looking at aggregate data and confirms this finding.
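For readers who want to retrace Test 3.2, here is a minimal sketch of a two-sided exact binomial test on the pooled counts quoted in the text (211 "successes" in 372 trials). The two-sided convention used, summing the probabilities of all outcomes that are no more likely than the observed one, is an assumption on our part; the text does not say which convention was applied.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_binomial_p_value(successes, n, p_null):
    """Two-sided p value: total probability of outcomes no more likely than the observed one."""
    observed = binom_pmf(successes, n, p_null)
    threshold = observed * (1 + 1e-7)  # small tolerance for floating-point ties
    return sum(
        prob
        for prob in (binom_pmf(k, n, p_null) for k in range(n + 1))
        if prob <= threshold
    )

p_value = exact_binomial_p_value(211, 372, 0.6615)
print(f"empirical frequency = {211 / 372:.4f}, p value = {p_value:.5f}")
```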
Even though the null hypothesis of the evolutionarily stable equilibrium probability x = 0.6615 is rejected, this theory is nevertheless not so very far from "explaining" the data. In order to see this, we identify all possible x ∈ [0, 1] that would "explain" the data better. That is, we identify those values of x ∈ [0, 1] that produce a higher likelihood of the data than the equilibrium value x = 0.6615. One way to attempt this would be to compute the binomial likelihood of 211 "successes" (i.e., an A type choosing A, a B type choosing B) among 372 "trials." This is computationally infeasible. Instead we perform a bootstrap, where we simulate the 48 rounds of the stage game 1000 times and compute the likelihood in each case and compute the average. We find that the set of theories "better than" the evolutionarily stable equilibrium theory is approximately the interval (0.4851, 0.6615) for x, which represents approximately 16.66% of the zero-one interval.

Risk-averse players
So far in this section, we considered that all players have affine preferences over money. In particular, for A and B types we postulated payoffs of u(m) = m, where m ∈ {1.2, 0.9, 0.2} are the three possible distinct payoffs these types can receive (see Table 1). One might call such players risk-neutral.
One could, however, in principle envision that players do not have such affine preferences over money. We could, for example, fix that u(1.2) = 1.2 and u(0.2) = 0.2 but choose u(0.9) somewhere between 0.2 and 1.2. A person with u(0.9) > 0.9 could then be said to be risk-averse and one with u(0.9) < 0.9 would be risk-loving. In the present context, one could also imagine another reason why a person might have a u(0.9) > 0.9. Such a person, in place of an A or B type, might consider the experiment mostly a challenge to achieve coordination for A or B, and not actually care too much about the money involved.

Footnote 28: There is plenty of experimental evidence that people are risk-averse even over small gambles (see, e.g., Holt and Laury 2002; Barberis et al. 2006; and Harrison and Rutström 2008). This is normatively not very appealing as such people would have to be implausibly extremely risk-averse over larger gambles (see, e.g., Rabin 2000).

Footnote 29: The value u(0.9) = 1.0722 is the unique value that generates the equilibrium probability of A types playing A (and B types playing B) of x = 0.5672. Utility values higher than 1.0722 lead to lower x, while utility values lower than 1.0722 lead to higher x. The value of u(0.9) = 1.0722 corresponds to a level of risk aversion in a CARA utility of 2.014. This is higher than typically reported levels of risk aversion from experimental data, see, e.g., Goeree et al. (2003). Note, however, that risk aversion is just one interpretation one could give the value of u(0.9) = 1.0722.
The proof of this proposition is omitted. It follows the same steps as the proofs of Propositions 3.1 and 3.2. Proposition 3.3, thus, states that the voting game can be calibrated so as to generate a unique undominated attainable ESS that perfectly matches the observed average frequency x = 0.5672 of A types playing A and B types playing B. The individual analysis performed in "Appendices C.1 and C.2" confirms the finding that the null hypothesis of stationary play of A and B types mixing with the round-independent probability of x = 0.5672 across all 24 rounds cannot be rejected.
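The calibration can be retraced with a short computation. The sketch below rests on two assumptions of ours: ties are broken with equal probability, and the CARA utility is rescaled so that u(0.2) = 0.2 and u(1.2) = 1.2. Under the first assumption the indifference condition is linear in u(0.9) and can be solved in closed form; the CARA coefficient is then found by bisection.

```python
from math import comb, exp

def dist_y(x):
    """Distribution of the number of A votes among the seven other A and B types."""
    pmf = [0.0] * 8
    for i in range(4):
        for j in range(5):
            pa = comb(3, i) * x**i * (1 - x)**(3 - i)
            pb = comb(4, j) * (1 - x)**j * x**(4 - j)
            pmf[i + j] += pa * pb
    return pmf

def calibrated_u09(x):
    """u(0.9) that makes an A type indifferent at mixing probability x,
    given u(1.2) = 1.2, u(0.2) = 0.2 and fair tie-breaking (assumed)."""
    p = dist_y(x)
    return 0.2 + (p[5] + p[6]) / (p[1] + p[2])

def cara_u09(a):
    """u(0.9) under CARA with coefficient a, rescaled so u(0.2) = 0.2 and u(1.2) = 1.2."""
    return 0.2 + (1 - exp(-0.7 * a)) / (1 - exp(-a))

def solve_cara(target, lo=0.01, hi=20.0, tol=1e-10):
    """Bisection; cara_u09 is increasing in a on this range."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cara_u09(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

u09 = calibrated_u09(211 / 372)  # observed frequency, about 0.5672
cara = solve_cara(u09)
print(round(u09, 4), round(cara, 3))
```

Under these assumptions the output should lie close to the values quoted in the text, u(0.9) = 1.0722 and a CARA coefficient of 2.014.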

Opinion polls before voting
In this section, we study the opinion poll game with the understanding that, after the opinion poll results are publicly announced, the voting game is played [treatment CPSSP in Forsythe et al. (1993)].

The game
This game is, thus, a two-stage game with 14 players, 4 A and 4 B types and 6 C types. There are many possible strategies in this two-stage game. Every single player has to choose one of the four actions (vote for A, vote for B, vote for C, or abstain) in the first, the opinion poll, stage. We shall refer to these, in what follows, as "straw" votes whenever the context may be insufficient to distinguish them from the actual votes in the second stage. Then, for the second stage, the players, after observing the outcome of opinion polls, choose to vote for A, B, or C, or to abstain. Payoffs depend only on which candidate ultimately wins the election and are given as in Table 1.

Attainable equilibria
Given that the voting game by itself has multiple equilibria, the two-stage game with an opinion poll stage before the actual voting stage has many subgame perfect equilibria. We here follow the argument in Andonie and Kuzmics (2012) to restrict attention to a small subset of attainable equilibria. This restriction is also justified empirically, as we shall see below.
The equilibria identified in Andonie and Kuzmics (2012) are such that the outcome in the opinion poll to a large extent already determines the ultimate election winner: Whichever candidate A or B receives more straw votes in the opinion poll wins the election. This is achieved by A and B types of subjects coordinating their second-stage voting choices on the publicly observed opinion poll winner (between candidates A and B). This behavior is feasible under attainability. Attainability imposes, however, a stronger restriction on second-stage voting behavior in the case when there is a tie between the numbers of straw votes cast for candidates A and B in the opinion poll. Then only attainable equilibria of the voting game, as identified in Sect. 3.1, can be played. In this case, as we have seen in Sect. 3.2, the most likely candidate to win the election is candidate C. Thus, in these equilibria, the most important aim for A and B types is to cast straw votes in the opinion poll in such a way as to avoid a tie between A and B, ideally with their preferred candidate "winning" the opinion poll. The point of this paper is to demonstrate that the subjects' behavior in the experiments of Forsythe et al. (1993) is very close to such an equilibrium in all stages of this game.

That the poll winner between candidates A and B determines the outcome of the election is an assumption that is somewhat justified by the experiment [treatment CPSSP in Forsythe et al. (1993)] as Table 5 illustrates. When there is a tie between the numbers of straw votes cast for candidates A and B in the opinion poll, only attainable equilibria of the voting game, as identified in Sect. 3.1, can be played. There are, however, three such equilibria as we saw in Sect. 3.2. Only one of these is evolutionarily stable in the voting game (see previous section).
We assume that this is the voting behavior that the players foresee for the second stage when they participate in the opinion poll in the first stage. Ties in the experiment happen 11 out of 48 times. Play of A and B types in these cases is summarized in Table 6.
This means that the combined empirical proportion of A and B types who vote for their most preferred candidate (after a tie in the poll) is 0.7045, for their second most preferred is 0.2160, for C 0.0682, and for abstain 0.0114. This is close to but somewhat statistically different from the supposed equilibrium proportions of (0.6615, 0.3385, 0, 0) without risk aversion and (0.5672, 0.4328, 0, 0) with risk aversion. Nevertheless, we shall assume that players ex-ante expect the evolutionarily stable strategy play.

Footnote 30: One should also note that what matters to play in the first, the opinion poll, stage, is what players expect the continuation play to be in all possible subgames.

Footnote 31: The behavior of A and B types after an A-B tie in the poll is, thus, statistically different from the behavior of A and B types in the treatment without polls. It is a bit difficult to explain this difference. It is as if the heightened concern for coordination of A and B types after a tie in the poll changes somewhat from their original preference.

We shall term the behavior described in the previous paragraphs the focal voting behavior. Having solved (or assumed behavior for) the second-stage voting game for every subgame, we can write the first-stage opinion poll game in reduced form. Note that, under the given assumptions about subgame behavior, choosing to abstain in the opinion poll is equivalent to casting a straw vote for C for all types of players. Thus, each player of each type has three distinct pure strategies: Cast a straw vote for A, B, or C. The ultimate payoffs depend only on whether A or B has more straw votes in the poll or whether there is a tie between the two in the poll. These payoffs are summarized in Table 7.
We can, then, finally turn to the equilibrium analysis of the behavior in the opinion poll. Note that, unlike in the voting game, no player of any type has weakly dominated strategies. Consider, for instance, a C type and other player behavior in the poll such that without her straw vote A will beat B in the poll by one straw vote. Then this C type's best strategy is to cast a straw vote for B to create a tie between A and B and, thus, make a win for C in the election much more likely. Also for an A or a B type, there are strategy profiles of the other players that make casting a straw vote for C a uniquely best strategy.
An attainable strategy profile, again following Andonie and Kuzmics (2012), in this polling game, must be such that all A types use the same (mixed) strategy, all B types the same (mixed) strategy, and all C types the same (mixed) strategy. Additional restrictions induced by the symmetries of the game are as follows. Let x_i(j) denote the probability a player of type i attaches to stating a preference for candidate j, for any i, j ∈ {A, B, C}. Then, we must have that x_A(A) = x_B(B), x_A(B) = x_B(A), x_A(C) = x_B(C), and x_C(A) = x_C(B).

Proof We need to find all attainable equilibria of the polling game given the assumed focal second-stage voting behavior. In principle, such equilibria may be in completely or only partially mixed or even pure strategies. We, thus, have to consider all possible support pairs. To give one example, there could be an equilibrium in which A types mix between A and C only (and, by attainability, B types mix between B and C), while C types only play C. One would then get one equation from the indifference of A types between A and C and two inequalities, as A types must then find playing B worse than (or equal to) playing A, and C types must find playing A (and, thus, also B) worse than or equal to playing C. Altogether, there are three possible equilibrium supports to be considered for C types (play the "pure" strategy given by playing A and B with probability 1/2 each, play pure strategy C, and mix between both). There are seven possible equilibrium supports for A types (B types then follow from attainability). These are pure strategies A, B, and C, mixing between two pure strategies AB, AC, and BC, and mixing between all three pure strategies ABC. Thus, there are 21 cases to consider.
For each case, we write down the (polynomial of degree up to 7) equalities and inequalities induced by the case and use the Newton method to find all solutions. The program is available as part of the supplementary material. This procedure provides us with the stated three attainable equilibria.
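The case count in the proof can be enumerated mechanically. The snippet below only lists the 21 support pairs; it does not solve the associated systems of equalities and inequalities, and the labels used for the C-type supports are ours.

```python
from itertools import combinations

PURE = ("A", "B", "C")

# Seven non-empty supports for A types (B types follow by attainability).
a_type_supports = [s for r in (1, 2, 3) for s in combinations(PURE, r)]

# Three attainable supports for C types, who must treat A and B alike.
c_type_supports = ["A and B with probability 1/2 each", "C only", "mix of both"]

cases = [(a, c) for a in a_type_supports for c in c_type_supports]
for a_support, c_support in cases:
    print("".join(a_support), "|", c_support)
print(len(cases))
```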
Expected payoffs to all types in these three attainable equilibria are given in Table 8. This reduced polling game also has non-attainable equilibria. For instance, all A and B types playing A and all C types playing C is an equilibrium, in which eventually A is elected. No player can in this case change the outcome by unilaterally deviating to some other strategy. Even six of the A- and B-type players playing A and the other two B or C and all C types playing C is an equilibrium, in which again no player can change the outcome (of A winning the election) by unilaterally deviating. There are many more non-attainable equilibria.

Evolutionary stability of attainable equilibria
In this section, we study the evolutionary stability properties of the three attainable equilibria.

Proof Let us first consider equilibrium x***. We shall look at the population of C types, which in this equilibrium mix between pure strategies A, B, and C with probabilities (0.3965, 0.3965, 0.207). Attainability restricts C types to attach the same probability to pure strategies A and B. Thus, attainable strategies for C types can be identified with one number α ∈ [0, 1/2], the probability they attach to pure strategy A (and, thus, also to pure strategy B). They then must attach probability 1 − 2α to pure strategy C. Fixing play of A and B types with their equilibrium play in x***, the C types are playing a six-player symmetric game with two strategies. We can, thus, appeal to Lemma A.1, given in "Appendix A," to check whether or not x*** is an ESS. Figure 2 depicts the payoff to a C type, for playing A (same as B) and C, as a function of α, assuming A-type and B-type players use their prescribed mixed strategy in equilibrium x***. This figure, appealing to Lemma A.1, thus shows that the strategy profile x*** is not evolutionarily stable. This is so because strategy A is the unique best reply to strategies α above the equilibrium value 0.3965 and is strictly worse than strategy C for values α below 0.3965.
We now turn to equilibrium x**. Figure 3 depicts the payoff to a C type, for playing A (same as B) and C, as a function of α, assuming A-type and B-type players use their prescribed mixed strategy in equilibrium x**. Note that, if, as prescribed in the equilibrium, all other C types play α = 1/2, then strategy C is strictly worse than strategies A and B. Thus, x** is an ESS from the point of view of C types. We then need to turn to the A and B types in this equilibrium. Figure 4 shows the best response regions for an A type as a function of the mixed strategy assumed by all other A types (and all B types playing accordingly), assuming C types play according to equilibrium x**. One can see from Fig. 4, and appealing to Lemma A.1, that no mutant that places probability zero on strategy B can enter the equilibrium x**. This is so because the unique best response to a strategy that puts more probability on A than x** does, while B receives zero probability, is to play C, while the unique best response to a strategy that puts more probability on C than x** does, while B receives zero probability, is to play A. To see that no other mutant can enter x** either, consider that the payoff difference between playing the A types' part of x** and playing y, against all other A types playing y in the simplex (and all B types their corresponding mixed strategy), while C types play as prescribed in x**, is always positive unless y = x**.

Finally, we explore the evolutionary stability properties of equilibrium x*. This equilibrium is also an ESS. This can be seen partially from the best response regions given in Figs. 6 and 7. Figure 7, in conjunction with Lemma A.1, shows that playing C for C types is evolutionarily stable as C is the unique best reply to α = 0 (i.e., to all other C types playing C). In Fig. 6, we can only partially see that x* is an ESS.
For instance, it is clear that a C mutant, who upon entering would lead us to region 1 of the picture, would lead to C being the worst strategy. From Fig. 6, the effect of mutants entering is, however, not clear for every possible mutant. To fully see that x* is an ESS, we appeal to Lemma A.2 and Fig. 8, which shows the payoff difference u_A(x*, y^{n−1}) − u_A(y, y^{n−1}), which is positive everywhere and equal to zero only for y = x*.

Fig. 6 The simplex of mixed strategies for players of type A and the best response regions for an A-type player if all other A types (and symmetrically B types) use a mixed strategy in this simplex, while all C types use their prescribed strategy in equilibrium x*. The three lines in this picture are the indifference lines between each pair of pure strategies. The line emerging from the A corner is the indifference line between strategies B and C. The one emerging from the B corner is the indifference line between A and C. The remaining line is the indifference line between A and B. Suppose we label these six regions clockwise starting with 1 at the top (around the C corner), then in region 1 the A type has preferences A ≻ B ≻ C, in region 2 A ≻ C ≻ B, in region 3 C ≻ A ≻ B, in region 4 C ≻ B ≻ A, in region 5 B ≻ C ≻ A, and in region 6 B ≻ A ≻ C

Empirical results and tests
The actual behavior in the opinion poll in the 48 rounds of the stage game with opinion polls [treatment CPSSP in Forsythe et al. (1993)] is given in Table 9. We first test the null of attainability, that is, we test independence in the relabeled Table 10.
Test 4.1 The null hypothesis of attainability, for A and B types, that x_A(A) = x_B(B) and x_A(C) = x_B(C), cannot be rejected at the 5% level of significance. The χ² test of independence leads to a p value of 0.1182 (χ² = 5.8677).
A few notes are in order here. The null of attainability is also the null of the absence of a ballot position effect. We, thus, cannot reject this null. Thus, at the 5% level of significance, there is no evidence that players condition their strategy on the ballot position. Forsythe et al. (1993) find some evidence of a ballot position effect. This evidence can be reproduced here if we restrict attention to straw votes cast for A and B only, ignoring those cast for C and abstaining. If we do this test, we obtain a p value of 0.048 (χ² = 3.91), which would lead to rejecting the null at the 5% level of significance. This test, however, ignores the randomness in the total number of A and B votes (assuming this total to be exogenously given by the experimenter, which is not the case).

It is clear that subjects are not playing anywhere close to ESS x**. While play is very close to ESS x*, the null of x* must nevertheless be rejected at the 5% level of significance.

Test 4.2
The null hypothesis of attainable ESS equilibrium play, i.e., x* as identified in Proposition 4.2, using a χ² goodness of fit test must be rejected at the 5% level of significance. The empirical frequency of play averaged for A and B types is given by (0.7005, 0.1432, 0.1563), and the p value is essentially zero (χ² = 32.7551).
The individual analysis performed in "Appendices C.3 and C.4" confirms this finding.
Aggregating across A and B types, the data are "explained better" than with the x* equilibrium by all x ∈ Δ({A, B, C}) indicated in Fig. 9. Here, we use the simulated likelihood approach introduced in Sect. 3.4. The area of the gray "spot" is about 1.37% of the area of the simplex.

Footnote 33: Theoretically, one could well expect that subjects who play this game often eventually learn that they could use the ballot position to condition their strategy on. This would then allow the A and B types to be coordinated in more cases if not in all. While there is some suggestion that this is going on here, see also Forsythe et al. (1993) for a discussion, there is no statistically significant evidence of this. See the analysis of the data of the individual rounds in "Appendix C."

Fig. 9 The gray "spot" indicates the set of all hypothesized x ∈ Δ({A, B, C}) for player types A and B that have a higher likelihood than equilibrium x*

Risk-averse players
We now return to the possibility of risk-averse players or players who for some other reason value the money amount of 0.9 relatively higher than under the assumption of an affine preference in money. In fact, we shall here consider the case where u(1.2) = 1.2, u(0.2) = 0.2, and u(0.9) = 1.0722, the value that perfectly calibrates or "explains" the outcome of the voting game alone (see Sect. 3.5).

The proof is omitted. It follows the steps of the proofs for Propositions 4.1 and 4.2. Again, the counterpart of equilibrium x** is not at all consistent with the data. The counterpart of equilibrium x* now is.

Test 4.3
The null hypothesis of play of the evolutionarily stable attainable equilibrium of the polling game that is consistent with the data, when A and B types have utility u(0.2) = 0.2 and u(1.2) = 1.2 and u(0.9) = 1.0722, cannot be rejected at the 5% level of significance. The χ² goodness of fit test produces a p value of 0.7799 (χ² = 0.4972).
The null also cannot be rejected if we do this test individually for A and B types. The p values are 0.09889 (χ² = 4.6276) for A types and 0.4897 (χ² = 1.4277) for B types. The individual analysis performed in "Appendices C.3 and C.4" also confirms this finding.
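As a quick plausibility check on these numbers, note that a goodness of fit test over three categories with a fully specified null has two degrees of freedom (an assumption on our part, as the degrees of freedom are not stated). For two degrees of freedom, the p value has a closed form in the test statistic c:

```latex
% Assumption: two degrees of freedom (categories A, B, C; fully specified null).
\[
  p \;=\; \Pr\bigl(\chi^2_2 > c\bigr) \;=\; e^{-c/2},
\]
\[
  e^{-0.4972/2} \approx 0.780, \qquad
  e^{-4.6276/2} \approx 0.099, \qquad
  e^{-1.4277/2} \approx 0.490 .
\]
```

These values agree, up to rounding, with the three p values reported for the pooled data, the A types, and the B types.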
Test 4.3 is very powerful. The power function, for this test, is the function f from the set of all distributions x over {A, B, C} (for A types and appropriately relabeled for B types) to the interval [0, 1] with f(x) the probability of rejecting the null hypothesis at the 5% level of significance given that a sample of n = 384 data points is drawn and given that the alternative hypothesis x is true. Figure 10 provides two insights. For the first insight, the simplex represents the set of all possible data vectors for A and B types (with coordinates denoting the proportion of individuals choosing their favorite, second favorite, and least favorite, respectively, in the opinion poll). The black dot then represents the null hypothesis (the equilibrium tested in Test 4.3), and the gray area is the set of possible data vectors for which the null would not be rejected. For the second insight, the simplex represents the set of all possible alternative hypotheses x, distributions over {A, B, C} (for A types and appropriately relabeled for B types). The isoquants of the power function f(x) are indicated by the dashed "ellipses."
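A power function of this kind can be approximated by simulation. The sketch below is illustrative only: the null vector `NULL` and the alternative are hypothetical placeholders, not the equilibrium from the paper, and the 5% critical value assumes a χ² distribution with two degrees of freedom.

```python
import random
from math import log

N = 384                      # sample size quoted in the text
CRITICAL = -2 * log(0.05)    # 5% critical value of chi-square with 2 df

def gof_stat(counts, null):
    """Pearson goodness of fit statistic of observed counts against a null distribution."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, null))

def power(alt, null, sims=1000, seed=0):
    """Simulated probability of rejecting `null` when data are drawn from `alt`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        draws = rng.choices(range(3), weights=alt, k=N)
        counts = [draws.count(i) for i in range(3)]
        if gof_stat(counts, null) > CRITICAL:
            rejections += 1
    return rejections / sims

NULL = (0.70, 0.15, 0.15)    # hypothetical null, NOT the paper's equilibrium
size = power(NULL, NULL)                 # rejection rate when the null is true
far = power((0.55, 0.25, 0.20), NULL)    # rejection rate at a distant alternative
print(size, far)
```

Evaluating `power` on a grid of alternatives over the simplex would trace out level curves comparable to the dashed "ellipses" described for Fig. 10.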

Conclusion
What have we learnt from this paper? As the experiment was originally performed for a different purpose, we first had to carefully identify what equilibrium theory would predict in this game. It became clear that one had to take seriously certain symmetry restrictions, termed attainability by Crawford and Haller (1990), even though the game was presented to subjects in a way that also allowed play that violates these symmetry restrictions. As the game was played recurrently in the laboratory, the appropriate equilibrium theory is that of evolutionary stability. We, thus, need to check the evolutionary stability properties of all equilibria we found.
In one treatment, the two-stage game with an opinion poll followed by a voting stage, we needed to identify the focal (perhaps simplest reasonable and empirically founded) attainable subgame continuation play in the voting games for all possible different outcomes of the opinion poll.
We then tested whether subjects played an evolutionarily stable attainable equilibrium under the assumption that players are risk-neutral. While this theory must be rejected at the 5% level of significance, it is nevertheless quite close to "explaining" the data.
We then used one treatment, the voting game alone, to calibrate the risk aversion of the players to give a perfect fit to the data for this treatment. We then tested whether subjects' play can be rationalized by an evolutionarily stable attainable equilibrium in the polling game. This hypothesis cannot be rejected at the 5% level of significance despite the fact that we have close to 400 data points.
All in all, evolutionarily stable attainable equilibrium is not too far off "explaining" the play in this particular laboratory experiment. It might be interesting to perform a similar case study for experimental work following up on Forsythe et al. (1993), in which symmetries play a role, including Forsythe et al. (1996), Rietz et al. (1998), and Bassi (2015).

A Evolutionary stability in symmetric n-player games
Let Γ = (I, S, u) be a symmetric n-player normal form game. That is, the set of players is given by I = {1, 2, . . . , n}, where n ≥ 2. The set of pure strategy profiles is given by S = ×_{i∈I} S_i, where S_i is the (finite) set of pure strategies and S_i = S_j, for all i, j ∈ I, i.e., every player has the same pure strategies. Finally, u : S → R is the payoff function for every player with the understanding that u(s_1, s_2, . . . , s_n) is the payoff to a player if she plays pure strategy s_1, while the others play (s_2, . . . , s_n). This payoff function has the additional property that it is unaffected by all permutations of (s_2, . . . , s_n). Players evaluate mixed strategy profiles by taking the expected utility. Let Δ denote the set of mixed strategies, i.e., the set of probability distributions over S_i. For strategies x, y ∈ Δ, we shall denote by u(x, y^{n−1}) the payoff of an x-strategist if all her n − 1 other players play the same strategy y.
We shall use the notion of an evolutionarily stable strategy (ESS) in the sense of Maynard Smith and Price (1973) and Maynard Smith (1982) (see Weibull 1995, Definition 2.1, for a textbook treatment) adapted to symmetric n-player games as in Palm (1984) and Broom et al. (1997).
Definition 1 A (mixed) strategy x ∈ Δ of a symmetric n-player normal form game is an evolutionarily stable strategy (ESS) if for all mutants y ∈ Δ with y ≠ x, there is an ε̄ > 0 such that for all ε ∈ (0, ε̄) we have that u(x, w_ε^{n−1}) > u(y, w_ε^{n−1}), where w_ε = (1 − ε)x + εy denotes the post-entry mix of strategies.
For symmetric two-strategy n-player normal form games, there is a simple equivalent condition for x, now ∈ [0, 1], to be an ESS. See also Broom et al. (1997, Section 4) for a discussion of this case.
Lemma A.1 A (mixed) strategy x ∈ [0, 1] of a symmetric two-strategy n-player normal form game is an ESS if and only if there is an ε > 0 such that for all y < x with x − y < ε we have that u(A, y^{n−1}) > u(B, y^{n−1}) and for all y > x with y − x < ε we have that u(A, y^{n−1}) < u(B, y^{n−1}).
Proof: Consider x ∈ [0, 1] and a mutant y with the property that y > x. Then, there is an α ∈ (0, 1) such that y = (1 − α)x + α1 (i.e., y is a convex combination between playing x and pure strategy A) and the ESS condition reduces to u(A, w_ε^{n−1}) < u(B, w_ε^{n−1}), where w_ε > x, given that w_ε is a convex combination of x and y and y > x. The analogous steps can be made for a mutant y < x.

For games with more than two players, one can characterize an ESS in terms of first- and higher-order conditions. For two-player games, see, e.g., Weibull (1995), Proposition 2.1, for fully characterizing an ESS in terms of a first-order (Nash equilibrium) and a second-order condition. Broom et al. (1997), p. 935, provide such conditions for n-player games, which one could call first to n-th order conditions. For our purposes, a part of their result suffices.
Lemma A.2, thus, only provides a sufficient condition for a strategy to be an ESS.

B Indifference condition for the mixed voting equilibrium
The indifference condition is u_A(A, x) = u_A(B, x), a polynomial equation in x of degree 7. As x = 0 and x = 1 are solutions, one can reduce the condition by dividing by x(1 − x) to obtain a polynomial of degree 5. The only real root of this polynomial in the interval (0, 1) can be determined by Newton's method and is given by x ≈ 0.6615.

C Individual data analysis
In this section, we investigate the data from the individual rounds in detail. Before we do this, we need to discuss another issue, the interpretation of mixed equilibrium play, i.e., play in which people supposedly randomize. There are at least three interpretations of mixed equilibrium in the literature, with different appeals in different games. The original interpretation, due to John von Neumann and Oskar Morgenstern and others, is that people actually do make choices randomly. Another view, see, e.g., Aumann and Brandenburger (1995), is that the randomization is just in the beliefs of the players. The third view, due to Harsanyi (1973) and one could call it a micro-foundation for the second view, is that there are many possible types of other players, everyone of them uses a pure strategy, but as we do not know the other player's type to us it looks like a mixed strategy. Indeed, this third view is also very consistent with the view of evolutionary game theory, see, e.g., Weibull (1995), that instead of one other player there is actually a large population of possible other players and each of them typically plays a pure strategy. This discussion is relevant for the analysis here. We do not necessarily expect that every single subject in this game randomizes in an iid fashion in every round. But this is also not needed for play to be in a mixed equilibrium. Note that a lot of independent randomization happens in this game purely by the fact that subjects are repeatedly randomly (independently of anything else) allocated into one of the two groups. Even if subjects condition their behavior on their own special characteristics and their own private history in this game, even if many of these subjects employ a pure strategy, to another player, who does not know all this, this person's strategy may appear random. 
In the end, it is an empirical question whether or not the subjects actually employ a sufficient degree of independent randomization for the relevant variables in the game to be independent of past play.
In this particular game (in both treatments), the players of interest, the ones who may have to randomize in equilibrium, are the A and B types. Key for these types of players are their beliefs about what the other seven A and B types in their round do. In this section, we show that the null of no serial correlation in the round behavior of A and B types cannot be rejected. In other words, there is no evidence that any A or B type could use past information to help her predict the strategy profile of the other players beyond the apparent knowledge that it is given by the equilibrium in question. We also perform a meta-analysis of the null hypothesis that play is given by the stationary play of the evolutionarily stable attainable equilibrium (with risk aversion) in question in all rounds. This null also cannot be rejected.

C.1 Game without polls: testing independence
Consider the treatments without polls. We are only interested in the A and B types.
Denote by $X^i_t$ the number of A and B types in group $i \in \{1, 2\}$ and round $t \in \{1, 2, \ldots, 24\}$ who have voted for their favorite candidate. To assess the extent of serial dependence in the time series $(X^1_t, X^2_t)$, we test the null that there is zero correlation between $X^i_t$ and $X^j_{t-1} = L X^j_t$, where $L$ denotes the lag operator, for any combination of $i, j \in \{1, 2\}$. The empirical correlation coefficients are given in Table 11.
To test the null that the four relevant true correlation coefficients between these variables are zero, we employ the Fisher z-transform of the empirical correlation coefficients; under the null, the transformed coefficients are approximately normally distributed with a mean of zero and a standard error of $1/\sqrt{20} = 0.2236$. The Fisher z-transform for an empirical correlation $r$ is given by $z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$. Note that all Fisher z-transformed empirical correlation coefficients are less than one standard error away from zero and, thus, the null of zero correlation between all these variables cannot be rejected. There is, thus, no evidence of serial correlation in the round-by-round data for the game without polls.
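As a minimal sketch of the transform and the standard error used above, the following assumes that the 24 rounds at lag one leave 23 paired observations, so that the usual approximation $1/\sqrt{n-3}$ gives $1/\sqrt{20}$. The correlation value `r_example` is hypothetical and is not taken from Table 11.

```python
import math

def fisher_z(r):
    """Fisher z-transform of a correlation coefficient r in (-1, 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def fisher_se(n_pairs):
    """Approximate standard error of z under the null of zero correlation."""
    return 1.0 / math.sqrt(n_pairs - 3)

# Assumption: 24 rounds at lag one leave 23 paired observations (n - 3 = 20).
N_PAIRS = 23
se = fisher_se(N_PAIRS)

# Hypothetical correlation, for illustration only (not a value from Table 11).
r_example = 0.10
z_example = fisher_z(r_example)
within_one_se = abs(z_example) < se

print(f"SE = {se:.4f}, z = {z_example:.4f}, within one SE: {within_one_se}")
```

The transform coincides with the inverse hyperbolic tangent, so `math.atanh(r)` could be used instead of the explicit logarithm.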

C.2 Game without polls: meta-analysis of all rounds
We here perform a meta-analysis of the test of the null that voting behavior in each of the 24 rounds without polls for A and B types is given by the attainable ESS under risk-neutrality and under risk aversion, with the probability of voting for the favorite candidate given by x = 0.6615 and x = 0.5672, respectively. We perform these tests for each of the 24 rounds separately and note down the resulting round-specific p values. Under the null, these p values should be uniformly distributed (see, e.g., Hedges and Olkin 1985, Chapter 3). To test the null of uniformly distributed p values across the 24 rounds, we perform a standard Kolmogorov-Smirnov (KS) test of fit. Figure 11 additionally provides a Q-Q plot plotting the empirical against the theoretical (uniform) quantiles of the p value distribution for both null hypotheses. The resulting p value of the KS test for the ESS theory under risk-neutrality, i.e., for x = 0.6615, is 0.0337. Thus, this null has to be rejected at the 5% level of significance. This is in agreement with what we found for the aggregate data. Note that, if we know that the probability of voting for the favorite candidate, x, is the same across all rounds, then the aggregate test is more powerful than this meta-analysis.
The resulting p value of the KS test for the ESS theory under risk aversion, i.e., for x = 0.5672, is 0.4858. Note that, as we calibrated the model so as to provide a perfect aggregate fit, this test has to be interpreted more carefully. It is not a test of the null that play in all rounds is governed by x = 0.5672. But it is a test of the null that play in all rounds is governed by the same x.
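The meta-analysis step, comparing a set of round-specific p values with the uniform distribution, can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used for the reported numbers: the input list `hypothetical_p_values` is invented, and the p value of the KS statistic is obtained from the asymptotic Kolmogorov series with a common small-sample correction rather than from the exact finite-sample distribution.

```python
import math

def ks_statistic_uniform(p_values):
    """Two-sided KS distance between the empirical CDF and Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

def ks_p_value_asymptotic(d, n, terms=100):
    """Approximate p value from the Kolmogorov limit distribution.

    The argument is rescaled with a common small-sample correction;
    the result is an approximation, not an exact finite-sample p value.
    """
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        # The series converges slowly here and the p value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical round-specific p values (24 rounds), for illustration only.
hypothetical_p_values = [(i + 0.5) / 24 for i in range(24)]

d_stat = ks_statistic_uniform(hypothetical_p_values)
p_meta = ks_p_value_asymptotic(d_stat, len(hypothetical_p_values))
print(f"D = {d_stat:.4f}, approximate p value = {p_meta:.4f}")
```

A small KS distance, as for the evenly spread values above, gives no evidence against uniformity; p values clustered near zero would produce a large distance and a small meta-level p value.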

C.3 Game with polls: testing independence
We turn to the game with polls. We are again only interested in the A and B types. Denote by $X^i_t$ the number of A and B types in group $i \in \{1, 2\}$ and round $t \in \{1, 2, \ldots, 24\}$ who voted for their favorite candidate in the poll. Analogously, denote by $Y^i_t$ the number of A and B types in group $i$ and round $t$ who voted for the second favorite candidate in the poll.
Note that all Fisher z-transformed empirical correlation coefficients, given in Table 12, are less than two standard errors away from zero. The null of zero correlation between all these variables therefore cannot be rejected. Not a single one of these 16 correlation coefficients is significantly different from zero at the 5% level of significance. As we found for the game without polls, there is also no evidence of serial correlation in the round-by-round data for the game with polls. Another possibility is to directly test the null that these 16 z-values (each divided by the standard error $1/\sqrt{20} = 0.2236$) come from a standard normal distribution. The KS test produces a p value of 0.2009. This null can therefore also not be rejected at the 5% level of significance. See Fig. 12 for a Q-Q plot of the empirical distribution of these z-values against the standard normal distribution.

Fig. 12 Q-Q plot of the 16 z-transformed correlation coefficients between $X^1$, $X^2$, $Y^1$, $Y^2$ and their values at lag one against the standard normal distribution (axes: theoretical quantiles; z-transformed corr. coeff.). This is for the polling behavior of the voting game with polls

C.4 Game with polls: meta-analysis of all rounds
We here perform a meta-analysis of the test of the null that the polling behavior in each of the 24 rounds of the game with polls for A and B types is given by the attainable ESS under risk-neutrality and under risk aversion, given by x* and x, respectively. We perform this test for each of the 24 rounds separately and note down the resulting round-specific p values. Under the null, these p values should be uniformly distributed (see, e.g., Hedges and Olkin 1985, Chapter 3). To test the null of uniformly distributed p values across the 24 rounds, we perform a standard Kolmogorov-Smirnov (KS) test of fit. Figure 13 additionally provides a Q-Q plot plotting the empirical against the theoretical (uniform) quantiles of the p value distribution.
The resulting p value of the KS test for the ESS theory under risk-neutrality, i.e., for x*, is 0.0018. Thus, this null has to be rejected at the 5% level of significance. This is in agreement with what we found for the aggregate data.
The resulting p value of the KS test for the ESS theory under risk aversion, i.e., for x, is 0.5145. This null cannot be rejected at the 5% level of significance. This is in agreement with the test based on the aggregate data.

C.5 Game without polls: splitting data in half
While we thus cannot reject the null of stationary ESS behavior across all rounds for the game without polls, the test provided in Sect. C.2 for this purpose may not be extremely powerful. If we suspect that there is a specific trend, perhaps because of learning, in the behavior across rounds, we may be able to perform a more powerful test by splitting the sample into two halves, the early rounds 1 to 12 and the later rounds 13 to 24. The results of the voting behavior for early rounds are given in Tables 13 and 14 and for the later rounds in Tables 15 and 16.
The weighted average proportion of A and B types who vote for their preferred candidate is then given by 0.5824 for rounds 1 to 12 and 0.5526 for rounds 13 to 24. The test of the null that the true proportion of A and B types who vote for their preferred candidate is the same in early and later rounds yields a chi-squared value of 0.3360, which at one degree of freedom yields a p value of 0.5622. There is therefore still no evidence to suggest that voting behavior in the game without polls changes from earlier to later rounds. 36
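The step from the chi-squared value to the p value can be checked with a short sketch. It relies only on the standard identity that, for one degree of freedom, the upper-tail probability equals `erfc(sqrt(stat / 2))`; the function name `chi2_sf_1df` is ours. Because the statistic is reported to four decimals, the result agrees with the reported p value only up to rounding in the last digit.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Statistic as reported in the text for early versus later rounds.
p_value_no_polls = chi2_sf_1df(0.3360)
print(f"p = {p_value_no_polls:.4f}")
```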

C.6 Game with polls: splitting data in half
We now turn again to the game with opinion polls and the subjects' behavior in these opinion polls. As in the previous subsection, we here split the data in two halves, the early rounds 1 to 12 and the later rounds 13 to 24. The results of the voting behavior for early rounds are given in Table 17 and for the later rounds in Table 18. The average proportions of votes for the most preferred, second most preferred, and least preferred candidate aggregated across the two types A and B are thus 0.6667, 0.2031, and 0.1302 for rounds 1 to 12 and 0.7344, 0.0833, and 0.1823 for rounds 13 to 24.
Having split the data in halves in this way, with the resulting more powerful test, we now do find statistically significantly different behavior between early and later rounds. The chi-squared value is 11.913, which with two degrees of freedom yields a p value of 0.0026.
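The reported chi-squared value can be reproduced from the proportions above with the following sketch of a test of homogeneity. It rests on an assumption that is not stated explicitly in this section: each half contains 192 poll observations of A and B types (8 such players per group, 2 groups, 12 rounds), so that the proportions correspond to integer counts. For two degrees of freedom, the upper-tail probability has the closed form `exp(-stat / 2)`.

```python
import math

# Assumption: 8 A/B types per group x 2 groups x 12 rounds per half.
N_PER_HALF = 192

# Proportions reported in the text (most, second most, least preferred).
early_props = [0.6667, 0.2031, 0.1302]
late_props = [0.7344, 0.0833, 0.1823]

def to_counts(props, n):
    """Recover integer counts from rounded proportions (an inference)."""
    return [round(p * n) for p in props]

def chi2_homogeneity(rows):
    """Pearson chi-squared statistic for a table of counts (rows = samples)."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

early_counts = to_counts(early_props, N_PER_HALF)
late_counts = to_counts(late_props, N_PER_HALF)
stat_polls = chi2_homogeneity([early_counts, late_counts])

# Exact upper-tail probability for 2 degrees of freedom.
p_value_polls = math.exp(-stat_polls / 2.0)
print(f"chi-squared = {stat_polls:.3f}, p = {p_value_polls:.4f}")
```

Under the stated assumption the recovered counts sum to 192 in each half, which is a useful consistency check on the inferred sample size.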
If we test the null of ESS play in the early rounds, we get a p value of 0.1387, whereas for the later rounds we get a p value of 0.0213. Thus, while there is not sufficient evidence to reject this null for the early rounds, there is some evidence against it for the later rounds. Interestingly, the behavior in the later rounds is closer to ESS behavior without risk aversion (or coordination concerns). If we test the null of ESS behavior without risk aversion (i.e., with u(0.9) = 0.9), we get a p value below 0.00005 for the early rounds and a p value of 0.0876 for the later rounds. At, for instance, the 5% level of significance, we in fact cannot reject the null hypothesis of ESS play without risk aversion for the later rounds. All this may be evidence of subjects learning with experience to tighten their behavior toward (evolutionarily stable) Nash equilibrium.
For this treatment, one could also calibrate u(0.9) in such a way that the ESS prediction minimizes the Kullback-Leibler divergence relative to the observed proportions of play. If we do this with the whole data for A and B types, we obtain a calibrated u(0.9) = 1.0288 (recall the calibrated u(0.9) = 1.0722 from the treatment without polls). For rounds 1 to 12, we obtain a calibrated u(0.9) = 1.0877 and for rounds 13 to 24 a calibrated u(0.9) = 0.9293 (almost equal to 0.9, the value for a risk-neutral player). This also confirms the slow learning tendency toward ESS Nash equilibrium without risk aversion.