Strategic voting in the lab: compromise and leader bias behavior

Plurality voting is perhaps the most commonly used way to aggregate the preferences of multiple voters. Yet, there is no consensus on how people vote strategically, even in very simple settings. The purpose of this paper is to provide a comprehensive study of people’s voting behavior in various online settings under the plurality rule. We implemented voting games that replicate two common real-world voting scenarios in controlled experiments. In the first, a single voter votes once after seeing a pre-election poll. In the second, a group of voters play an iterative game and change their votes as the game progresses (as in online voting). The winning candidate in each game (and hence the subject’s payment) is determined using the plurality rule. For each of these settings we generated hundreds of game instances, varying conditions such as the number of voters, subjects’ preferences over candidates and the poll information that was made available to the subjects prior to voting. We show that people can be classified into several groups, one of which is not engaged in any strategic behavior, while the largest group demonstrates both a tendency for strategic compromise and a bias toward voting for the leader in the poll. We provide a detailed analysis of this group’s behavior in both settings, and of how it depends on the poll information. Our study offers insights for multi-agent system designers by uncovering patterns that provide reasonable predictions of voters’ behaviors, which may facilitate the design of agents that support people or act autonomously in voting systems.


Introduction
Voting protocols are among the most widely used tools for group decision-making and preference aggregation [16,53,61], and their properties have been studied formally at least since the eighteenth century [18,19]. More recently, computers have been playing an increasingly active role in voting systems, whether as systems for aggregating preferences [23], or autonomous agents acting as proxies for individual voters [7].
Examples of existing systems abound, especially on the Internet. Wikipedia promotes its editors via an online election system [37], and Doodle, which specializes in scheduling polls, has over 30M monthly users voting over meeting times [22]. Facebook, and more recently Google, allow users to create their own polls and aggregate votes from friends. Apps such as Dvel (www.letsdvel.com) let their users take any decision they face and ask their friends to vote on it in real time. Voting has also been shown to be a useful aggregation tool for crowdsourcing [65] and human-computation applications [39]. New voting rules are being designed for use in large-scale online settings [30].
While there is general consensus in political science, economics, game theory and computational social choice that people do not vote truthfully, it is not clear what voting strategy they actually employ, and what type of environmental factors affect this strategy. Indeed, even under the simple plurality rule there is an active discussion on how voters should vote or would vote given their preferences, and different studies suggest different conclusions [5,29,55]. There are precious few publicly available benchmarks that researchers can use to evaluate the assumptions and predictions of various theories from the social choice literature. One exception is the PrefLib project [40], which contains over 3000 datasets from a variety of sources and locations, and is freely available on the web (www.preflib.org). However, the PrefLib datasets contain either reported preferences (e.g., movie or sushi preferences), or reported votes (e.g., referee ratings in ice-skating championship), but not both.
Our goal is to fill this gap by collecting and analyzing human strategic voting behavior in a variety of online settings. Controlled online experiments have several benefits. First, they reflect the growing use of computerized systems in the aggregation of people's preferences and voting behavior. Second, we can run experiments on a large scale using crowdsourcing. Third, they allow us to create a controlled environment that abstracts away (as much as possible) from the context, so that the only factors affecting people's voting behavior are their preferences and the information that is available to them. There is no interference due to dependency relationships with candidates, sense of duty, expressive voting, coalition formation and other factors that are common, e.g., in political voting.
We base our controlled experiments on two interactive voting games that are easy to explain to subjects. In both settings, voters are automatically assigned private cardinal utilities over a fixed set of three candidates. The payment to subjects depends only on the identity of the winning candidate, regardless of how they have voted.
The first setting consists of a one-shot voting with a single human voter. We completely control the data available to the voter by providing her with a (non-binding) pre-election poll of others' votes, and record her voting behavior under conditions that vary the information in the poll.
The second setting consists of a group of human participants in an iterative voting game. As in the previous setting, the preference profile is dictated to the voters, but they are free to change their votes at will until they reach an agreed outcome (or a timeout). As in the poll game, we recorded the decision of each voter along with the information available to her at that point in time. Both games put voters under uncertainty, but the source of uncertainty varies: in the first game voters only have access to an inaccurate poll. In the second game a voter directly observes the current votes of her peers, but does not know how they will eventually vote in the final round (or when the final round will arrive).
In both games the voter faces a strategic dilemma when her favorite of the three candidates is at a disadvantage (in the poll or according to the other current votes): to remain truthful or to compromise, i.e., to vote for a less preferred candidate that has a better chance to win. This definition of compromise coincides, e.g., with the definition of a strategic vote in [2].
We conducted an extensive empirical study in which over 550 human subjects played over 10,000 game instances in both game settings. We varied the number of voters, subjects' preferences over candidates, and (in the one-shot case) the poll information that was made available to them prior to voting. We analyzed under what conditions subjects choose the strategic compromise (or an unexpected, "irrational," action) over the truthful vote. Our three main findings are as follows:
• In both settings we found large interpersonal differences, identifying several distinct groups: (1) subjects that consistently voted for their most preferred candidate; (2) subjects who tended to compromise when facing a strategic decision; and (3) subjects who sometimes play dominated actions, like voting for the least preferred candidate. We focused our analysis on the second group, which comprised 85% of the subjects (the "strategic subjects").
• Strategic subjects tend to compromise more in situations where this increases their expected payoff. Yet many subjects compromise even when it would be better to vote truthfully.
• When the most preferred candidate is ranked second in the poll (a situation that should not pose a strategic dilemma), a significant fraction of the strategic voters voted for the (less preferred) leader of the poll.
In addition, in the iterative setting we show that:
• The behavior was remarkably similar to the one-shot setting, where the current votes of the other voters are treated like the poll in the first setting.
• Voters demonstrate some level of "stickiness," and are more likely to keep their vote from the previous turn.
Therefore, the contribution of this paper is three-fold. First, the development of a flexible experimental platform that is designed to run both offline and online settings. Second, the collection of thousands of instances of human strategic voting behavior in different interactive online settings. Third, the definition of novel ways to measure and quantify various voting behaviors empirically, and the study of how these measures are affected by the context and information available to a voter with fixed preferences. Our platform provides researchers with an environment that allows them to control the factors affecting people's voting behavior, the information that is available to them and the context. All of the data from this paper is publicly available to the research community at www.votelib.org/. We will also make our open-source code available for public use. The VoteLib database is the first of its kind in that it combines both people's preferences and their strategic voting behavior, over multiple strategic decisions. This allows researchers to test their own theories and train their models on our data without incurring the overhead of collecting the data, and will advance research in MAS and computational social choice.
The paper significantly revises and extends prior work by the authors [68] both on the data collection side and on the analysis side. First, it scales up the data collection to include tens of thousands of games and hundreds of subjects. Second, it provides a completely new analysis of the data collected for both one-shot and iterative settings, and identifies new types of voting behavior in these settings. Lastly, it provides a new and detailed comparison of our study to relevant work in social choice and experimental economics. Together, these contributions led to new insights about people's voting behavior in strategic settings.
The remainder of this paper is organized as follows. Section 2 discusses theoretical and experimental work that inspired this work. Section 3 introduces the formal problem and definitions. Sections 4 and 5 include our results and analysis on the one-shot and iterative settings, respectively. We end the paper with a discussion and ideas for future work in Sect. 6.

Related work
We begin by overviewing several prominent theoretical voting models from the social choice literature that aim to describe strategic voting behavior. Then, we review relevant findings from real election studies. Finally, we position our study within the large literature on voting experiments.

Theoretical work
The most fundamental solution concept in game theory is Nash equilibrium. However, trying to apply Nash equilibrium (either pure or mixed) to strategic voting often results in a trivial unrealistic outcome, since almost all voting profiles are Nash equilibria. Other game theoretic approaches have been developed by imposing various notions of uncertainty and rationality. Predominant examples include the calculus of voting Bayes-Nash equilibrium models [48,52,61], trembling hand equilibrium [46], strong equilibrium [64], and subgame-perfect equilibrium [21,26] models. In the calculus of voting model, which has been the leading model in the economic literature since the 1970s, a voter estimates her probability of being pivotal when all other preferences are sampled from a known distribution, and votes in a way that maximizes her expected utility. The calculus of voting papers, like most of the other papers above, assume that all voters employ the same (rational) reasoning, and in particular that two voters with the same preferences would always vote the same. A newer model that relaxes the assumption that the number of voters is known in advance is Poisson games [49].
A second class of models focuses on heuristics for making strategic decisions by a single voter, regardless of equilibrium considerations. These heuristics have ranged from myopic heuristics based on best-response [15,32] to regret minimization [28], and complex decision diagrams [27,50]. Some papers analyze the best-response strategy specifically for voters that are faced with poll information rather than with the preferences of their peers [14,59]. More recent work considers voters that are faced with both poll information and the votes of their neighbors in a social network [67].
Several papers have combined decision- and game-theoretic modeling of voting behavior. In an iterative setting where voters may change their vote one at a time, voters who follow the simple myopic best-response (MBR) heuristic are guaranteed to converge to a Nash equilibrium under the plurality rule [42]. Consequently, other heuristics have been shown to converge, giving rise to new notions of equilibrium [32,43,60]. For an extensive overview of strategic voting models, see [44].
The bias towards voting for the leader (the "bandwagon effect") has also received theoretical attention: Simon [66] considered the "prediction problem," which states that it is impossible to give a correct political prediction since the prediction itself affects the outcome. He showed that in a particular model of the bandwagon (or the opposite, underdog) effect, there must be an equilibrium point where the prediction is self-justifying.

Empirical work and in-situ experiments
Strategic voting was thoroughly studied in the context of political elections (see Regenwetter and Grofman [57] for an overview), supreme court votes [33] and other real world voting scenarios. These works commonly compare the vote distribution over several years and/or several districts [11].
Real world voting settings are challenging to study due to the lack of information about voters' preferences. Indeed, Regenwetter et al. [56] observe that testing phenomena such as the existence of Condorcet cycles cannot be done reliably when it is not possible to infer individual preferences from ballots alone. In addition, it is not possible to explain people's voting behavior with a single monolithic model. Regenwetter et al. [58] found that people's voting behavior in organizational elections cannot be explained by a single heuristic function applied uniformly by all voters, and that their behavior may fit a mixture of heuristic behaviors based on the theoretical models mentioned above. They argue that "individual choice research finds actors to behave worse than normative theory requires, whereas the sparse empirical research on social choice appears to suggest that electorates may outperform normative expectations" (p. 1011), and call for further research that links decision making with individual voting behavior. Such links, together with the findings that a significant portion of the voters vote strategically [11,72], were part of the motivation for the current work.
Other works focusing on real world voting behavior query a subset of voters and ask them to report their truthful preferences, in addition to how they actually voted [1,6,13,54,70], or how they would have voted under different voting rules [72]. Blais et al. [10] presented voters in the 2015 Canadian elections with different levels of information regarding the party strength, and observed no effect on the tendency to vote strategically.
Collecting such information is valuable to researchers as it allows them to compare a voter's individual preferences with her action. However, it is commonly the case that each voter provides a single data point, so it is not possible to model or predict how the voter would behave in different situations. Another challenge to studying real world voting behavior is that voters are influenced by various social, ideological and emotional factors that are not readily available to the researcher.

Lab experiments
Most experimental studies of people's voting behavior in the lab focus on settings in which the same game is played several times and payments are realized at each game after the winning candidate is determined [5,9,69]. These works vary in the number of candidates, the voting rule and more. Blais et al. [9] studied settings in which there are only two candidates, voting is costly, and the strategic decision to make is whether to vote or not. They show that even in this simple setting subjects exhibit irrational behavior: they overestimate the probability of ties, and fail to vote in a way that maximizes their payoff.
The three-candidate setting we use in our work has also been explored in other studies, where subjects played a complete information multiplayer game with at least three candidates [5,8,29,69,71]. The main treatment in these experiments is to vary the voting rule, and results are studied at the aggregate level (e.g., the distribution of votes, or the likelihood of a specific winner), in comparison to predictions based either on rational theoretical models (plus some model of participants' beliefs) or on heuristic models.
As part of this line of work, Forsythe et al. [29] and Bassi [5] studied people's voting strategies in Borda, approval voting and plurality voting systems. They found that strategic voting was common in all of the voting systems, and that voters behaved strategically more often in the plurality voting condition than in the Borda or approval voting conditions. The candidate that maximizes the social welfare was chosen significantly more often in the plurality voting condition than in the other conditions, and the Condorcet loser candidate was never chosen. They showed that over time people learned to strategize in a way that was consistent with a single equilibrium strategy of the stage game, modeled using a quantal response equilibrium.
Kube and Puppe [35] studied strategic voting behavior under the Borda rule. They found a positive relationship between the amount of information that was provided to subjects about others and their propensity to manipulate: subjects were significantly more likely to engage in strategic behavior when they are informed about others' preferences, and even more so when subjects were provided information about others' actual votes. They conjectured that subjects' reason for the manipulation was to bring about a "satisfiable" outcome, that is, to increase the winning probability of the candidate that would have won under sincere voting. However, the baseline for comparison was "no information", whereas we are interested in the effect of direct information on other voters' actions rather than preferences.
Tyszler and Schram [69] study voter behavior in voting settings in which Condorcet cycles occur. They showed that information (whether voters know each other's preferences) and the value of the voter's second most preferred candidate affected subjects' decision of whether to compromise. Specifically, the probability of strategic voting increases with the value of the intermediate candidate in both conditions, and is also affected by whether the most preferred candidate was trailing in the polls (when information was available) and by the extent to which the poll leader is preferred. They also show that a quantal-response equilibrium model provides a good fit to the aggregate vote distribution of players.
Blais et al. [8] performed similar lab experiments with dictated metric preferences over 5 candidates that are placed along an interval. Each group of 21 participants played 4 consecutive voting rounds with the same preference profile and voting rule, and repeated the experiment with a different profile and voting rule. They found that voters tend to strategically desert candidates with low support in past rounds, and that there was a similar tendency to vote strategically under plurality. In a followup study using the same data, van der Straeten et al. [73] analyzed the different factors that affect the strategic decision in 2-round plurality versus simple plurality.
van der Straeten et al. [71] model players once as purely rational (following the calculus of voting models) and then as heuristic (omitting candidates ranked low in the previous round). They test the theoretical prediction vs. the actual individual vote (assuming all voters follow the same behavior). For the Plurality rule, they show that the rational model provides better predictions than the simple heuristic models, and better than the baseline model of truthful voting. They also show that for more complicated voting rules (e.g., 2-round Plurality) the heuristic models provide a better prediction. A more careful examination of the results for Plurality shows that the rational model is very inaccurate at the first round of the game when only preferences are known (54% accuracy vs. 68% for truthful voting), and reaches 80% accuracy by the fourth round. That is, rational behavior is made possible when playing the same game repeatedly with the same people.
An experiment that combined features of lab and in-situ studies was performed in [36]: subjects arrived in the lab and were asked to use their real preferences over candidates in the French presidential elections. Thus in both studies there was no control over the preferences of the subjects, and only partial control over the information they had.
To summarize most of the work above, it seems that the rational models (which treat all voters uniformly) provide a reasonable explanation of voters' behavior either on the aggregate level, or in settings where voters have the opportunity to adapt their behavior. When voters lack the information and/or opportunity to learn, they resort to heuristics which are not well understood. Our work is the first that aims to identify the individual behavioral strategy of subjects and how it depends on the information they have.
Finally, voting behavior has increasingly been studied in the multi-agent systems community. Bitan et al. [7] focused on designing computer agents that outperform human voters using various best-response methods. They also found that people tend to strategize and deviate from truthful reporting over time, in a very different setting of voting committees. Fairstein et al. [25] tested how well various models from the literature (including calculus of voting, local dominance, and k-pragmatist mentioned above) can predict individual voting behavior in several experimental settings, including the data from the conference version of this paper [68]. In contrast to most experiments (e.g., the one by van der Straeten et al. [71]) that aim to explain all data with a single behavior (rational or heuristic), Fairstein et al. assume that voters may apply different bounded rational behaviors. They fit the parameters of the different models on a training set, then apply the model on held-out test data, and compare the predicted actions of each voter to the ground truth, as well as to a benchmark of a machine learning algorithm. In particular they show that a wide range of parameters is required to get a good prediction (supporting our finding regarding distinct types of voters), and that the heuristics that obtained the highest performance are those that account for leader-bias behavior.

The setting
We denote $[x] = \{1, 2, \ldots, x\}$. Let $M$ be a set of $m$ candidates and let $N$ be a set of $n$ voters. A single-vote social choice correspondence is a function $f : M^N \to 2^M \setminus \{\emptyset\}$ that returns the set of winning candidates given a voting profile. A voting profile is a vector $\mathbf{a} : N \to M$, where $a_i \in M$ is the vote of voter $i$. The score of a candidate $c \in M$ given the voting profile $\mathbf{a}$ is defined as $s_{\mathbf{a}}(c) = |\{i \in N : a_i = c\}|$. The score vector $\mathbf{s}_{\mathbf{a}}$ given voting profile $\mathbf{a}$ contains the scores of all candidates, which summarizes all the relevant information on the outcome of the vote. We use the notation $s(c)$ for the score of candidate $c$ when the voting profile is clear from context.
For the remainder of this paper, we use the Plurality rule, which chooses the winning candidates $W(\mathbf{a})$ with maximal score given the voting profile $\mathbf{a}$, that is, $W(\mathbf{a}) = \arg\max_{c \in M} s_{\mathbf{a}}(c)$.
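As an illustration, the score vector and the Plurality winner set defined above can be computed as in the following minimal Python sketch. The function names `scores` and `plurality_winners` and the string labels for the candidates are illustrative choices and are not taken from the experimental platform.

```python
from collections import Counter

def scores(profile, candidates):
    """Score vector: the number of votes each candidate receives in the profile."""
    counts = Counter(profile)
    return {c: counts.get(c, 0) for c in candidates}

def plurality_winners(profile, candidates):
    """Winner set W(a): all candidates whose score is maximal."""
    s = scores(profile, candidates)
    top = max(s.values())
    return {c for c, v in s.items() if v == top}

candidates = ["red", "grey", "blue"]   # illustrative labels
profile = ["red", "red", "grey", "blue"]  # the vote a_i of each of four voters
print(scores(profile, candidates))
print(plurality_winners(profile, candidates))
```

Returning a set rather than a single candidate keeps ties explicit, which matters for the reward definition that follows.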
Let $\mathcal{L} = \mathcal{L}(M)$ be the set of all strict total orders over $M$. The preference ordering of voter $i$ is a strict total order $L_i \in \mathcal{L}$ over the candidates (which is known only to $i$). Let $L_i(a) \in [m]$ be the rank of candidate $a \in M$.
Voter $i$ prefers candidate $a$ to $b$ iff $L_i(a) < L_i(b)$. In this paper we focus on $m = 3$ candidates, therefore we refer to the most preferred, second, and least preferred candidates for $i$ as $q_i$, $q'_i$, and $q''_i$, respectively. That is, $L_i(q_i) = 1$, $L_i(q'_i) = 2$ and $L_i(q''_i) = 3$. We also omit the subscript $i$ when clear from context.
We say that voter $i$ is voting truthfully in profile $\mathbf{a}$ if $a_i = q_i$; otherwise $i$ is voting strategically.
The reward to voter $i$ when candidate $c$ wins alone is defined as $r_i(c) = r(L_i(c))$, where $r$ is a non-increasing function. We extend this definition to a subset of candidates $C \subseteq M$ as the average reward obtained over all candidates in $C$: $r_i(C) = \frac{1}{|C|} \sum_{c \in C} r_i(c)$. In game-theoretic terms, the utility for voter $i$ in voting profile $\mathbf{a}$ is $u_i(\mathbf{a}) = r_i(W(\mathbf{a}))$.
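The averaging of rewards over a tied winner set can be made concrete with a short Python sketch. The reward values in `R_ASSUMED` (10, 5 and 0 for ranks 1, 2 and 3) are hypothetical numbers chosen only for illustration; they are not the rewards used in the experiments, and the function name `reward` is likewise an illustrative choice.

```python
def reward(ranking, winners, r):
    """Reward of a voter with preference order `ranking` (most preferred first)
    when the set `winners` wins: the mean of r(rank) over the winning candidates."""
    rank = {c: pos + 1 for pos, c in enumerate(ranking)}
    return sum(r[rank[c]] for c in winners) / len(winners)

# Hypothetical non-increasing reward function r(rank); NOT the values from the study.
R_ASSUMED = {1: 10, 2: 5, 3: 0}

ranking = ["red", "grey", "blue"]
print(reward(ranking, {"red"}, R_ASSUMED))          # single winner
print(reward(ranking, {"red", "blue"}, R_ASSUMED))  # two-way tie, averaged
```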
To illustrate our setting we present the following example in which four voters vote over a set of three candidates: Red, Grey and Blue. [Table: preference profile of the four voters; not recovered.] Suppose that each of the voters votes for her most preferred candidate. The winner is then a single candidate $w$ with $L_1(w) = L_2(w) = 1$ and $L_3(w) = L_4(w) = 3$, so voters 1 and 2 receive the reward of their most preferred candidate and voters 3 and 4 receive the reward of their least preferred candidate. Now suppose that voter 4 voted for another candidate rather than her most preferred one. In this case there are multiple winners (a two-way tie), and consequently each voter's reward is the average of her rewards for the two tied candidates.
In all of our settings, a human subject is presented with a poll consisting of a voting profile for all agents, and is subsequently asked to vote for one of the candidates. Voters were automatically assigned a preferred ranking over the candidates, which is private information unknown to the other voters.

Expected utility and pivotal players
In order to analyze the rationality of a vote, we need to compute how much a voter gains by voting for a candidate $c$. To answer this formally we adopt an expected utility framework following the "calculus of voting" literature.¹ In the small $n$ condition, we simply calculate the expected utility by going over all possible outcomes. In the other conditions we use the simplifying assumption that 3-way ties are impossible.²
Let $W(\mathbf{a}_{-i})$ be the outcome without $i$'s vote. In order to calculate the expected utility, we assume that a probability distribution over the voting profile of the other players is known (denoted by $\mathbf{a}_{-i} \sim D$). The expected reward (or expected utility) of the voter when not voting at all is $\mathbb{E}_{\mathbf{a}_{-i} \sim D}[r_i(W(\mathbf{a}_{-i}))]$.
We say two candidates $x, y$ are tied in voting profile $\mathbf{a}$ if $W(\mathbf{a}) = \{x, y\}$, and denote this event by $T^{xy}(\mathbf{a})$. We also denote by $T^{-xy}(\mathbf{a})$ the event that $x$ is missing exactly one vote to be tied with $y$ (i.e., $f(\mathbf{a}) = \{y\}$ and $f(\mathbf{a} \cup \{x\}) = \{x, y\}$). We observe that whenever $T^{xy}(\mathbf{a}_{-i})$ occurs, voter $i$ has the power of making $x$ a single winner, thereby increasing her reward from $r_i(\{x, y\})$ to $r_i(x)$. Similarly, when $T^{-xy}(\mathbf{a}_{-i})$ occurs then $i$ has the power of making $x$ part of the winning set, increasing her reward from $r_i(y)$ to $r_i(\{x, y\})$. Clearly in any other profile, voting for $x$ has no effect on the outcome. We say that $i$ is pivotal for $x$ against $y$ in profile $\mathbf{a}$ if either event occurs, denoted $P^{xy}_i(\mathbf{a}) = T^{-xy}(\mathbf{a}_{-i}) \cup T^{xy}(\mathbf{a}_{-i})$. The expected utility gain (EUG) for voter $i$ by voting for $x$ can be calculated as:
$$EUG_i(x) = \sum_{y \neq x} \Big[ \Pr_D\big(T^{xy}(\mathbf{a}_{-i})\big) \big(r_i(x) - r_i(\{x, y\})\big) + \Pr_D\big(T^{-xy}(\mathbf{a}_{-i})\big) \big(r_i(\{x, y\}) - r_i(y)\big) \Big]$$
Note that under "nice" distributions, $\Pr_D(T^{-xy}(\mathbf{a})) \cong \Pr_D(T^{xy}(\mathbf{a}))$.³ Similarly, $\Pr_D(P^{xy}(\mathbf{a})) \cong \Pr_D(P^{yx}(\mathbf{a}))$. We thus make the following simplifying assumption for theoretical analysis purposes (also taken from [48]), noting that it only applies for high values of $n$: [equation not recovered]. Given a probability distribution $D$, the utility gain depends almost entirely on the probability of a tie, and Eq. (4) can be simplified under the above assumption: [equation not recovered].
In our simple experiment there are only 3 candidates. For a voter $i$ we write $T^{12}_i(\mathbf{a})$ for the event that corresponds to $i$ being pivotal for $q$ against $q'$. We omit $\mathbf{a}$ when clear from the context. We similarly define $T^{13}_i$ and $T^{23}_i$. [Equation (3) not recovered.] When $n$ is large these numbers are almost identical. Finally, we say that $i$ is pivotal if $i$ is pivotal for some pair of candidates.
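For small electorates, the expected utility gain can be obtained by going over all possible outcomes, as described above for the small n condition. The following Python sketch does this under an assumption that is ours for illustration: each of the other voters votes independently according to a fixed probability vector `p`. The reward values and the distribution below are hypothetical, and the function names are not taken from the paper. Unlike the simplified analysis, the enumeration also accounts for 3-way ties.

```python
from math import comb

def average_reward(score, rewards):
    """Reward of the winner set: mean reward over all candidates with maximal score."""
    top = max(score)
    winners = [c for c, v in enumerate(score) if v == top]
    return sum(rewards[c] for c in winners) / len(winners)

def expected_utility_gain(x, p, n_others, rewards):
    """Exact expected gain from voting for candidate index x (0, 1 or 2)
    compared with not voting at all.

    Assumption: the n_others remaining voters vote i.i.d. with probabilities p,
    so their score vector (k0, k1, k2) is multinomially distributed.
    """
    gain = 0.0
    for k0 in range(n_others + 1):
        for k1 in range(n_others - k0 + 1):
            k2 = n_others - k0 - k1
            prob = (comb(n_others, k0) * comb(n_others - k0, k1)
                    * p[0] ** k0 * p[1] ** k1 * p[2] ** k2)
            without_i = [k0, k1, k2]
            with_i = list(without_i)
            with_i[x] += 1  # add voter i's own vote
            gain += prob * (average_reward(with_i, rewards)
                            - average_reward(without_i, rewards))
    return gain

rewards = [10, 5, 0]            # hypothetical rewards for q, q', q''
poll_shares = [0.3, 0.4, 0.3]   # hypothetical distribution of the other votes
for x, name in enumerate(["q", "q'", "q''"]):
    print(name, round(expected_utility_gain(x, poll_shares, 9, rewards), 4))
```

Comparing the printed values for q and q' indicates whether compromising would raise the expected payoff under the assumed distribution.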

One-shot voting
The first type of voting game we studied consisted of a one-shot voting setting in which participants could vote once. A single human subject is presented with a poll, and is subsequently asked to vote for one of the candidates.

Methodology
The game was implemented online using a voting infrastructure that allows configuring the number of computer-simulated voters and the subjects' preferences over the candidates. Figure 1 shows a snapshot of the GUI of the one-shot voting game, configured to include three candidates (red, grey, and blue) and 103 voters. The game interface is shown from the perspective of a human subject playing the game. The candidates are displayed in order of the voter's preferences, from left (the most preferred candidate) to right (the least preferred candidate). The voting profile in the poll is visualized by showing the number of votes for each candidate (in the voting bar to the left of each candidate). In our example, the red candidate has 30 votes. The leading candidate of the poll according to the plurality rule (the grey candidate in the figure, with 38 votes) is marked by a glowing voting bar.

Poll conditions
We control both the voter's preferences and the information presented to the human in the poll. Suppose the three candidates are sorted so that according to the poll we have $s(c_1) \geq s(c_2) \geq s(c_3)$. We use the notation $c >_s c'$ to indicate that the score of candidate $c$ in poll $s$ is larger than the score of $c'$. We omit the subscript $s$ when clear from the context. A game is defined by setting the values of four parameters:
1. The total number of voters $n$, which ranged over the four values "small", "100", "1000" and "10,000".⁴
2. The ordinal alignment of candidates' scores with voters' preferences. Since there are 6 possible permutations of three candidates, this is a categorical parameter with 6 possible values called "scenarios." See Sect. 4.3 for details.
3. The gap between the number of votes for the leader and the runner-up, denoted "Gap-leader" (formally, $s(c_1) - s(c_2)$).
4. The gap between the runner-up and the least popular candidate in the poll, denoted "Gap-last" (formally, $s(c_2) - s(c_3)$).
Fig. 1 Voting game interface for one-shot setting. The voting bar to the left of each candidate displays the number of votes for the candidate in the poll
For $n \geq 100$, we varied the gap values from 1 vote to (almost) $n/2$, and clustered each of them into five discrete conditions (for $n = 100$ some conditions coincide). Figure 1 (top) shows an example of a poll in the $n = 100$ condition (note that the actual number of voters is a bit higher, see Footnote 4): the scenario is $q'' >_s q >_s q'$ (see Sect. 4.3), gap-leader = 4 and gap-last = 4.
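To make the poll parameters concrete, the following Python sketch derives the scenario and the two gaps from a poll given as scores for the subject's most preferred (q), second (q') and least preferred (q'') candidates. The function name `poll_condition` and the specific scores are hypothetical; the scores were chosen only so that they reproduce the scenario and gap values quoted above. The sketch assumes that the three scores are distinct.

```python
def poll_condition(poll):
    """Return (scenario, gap_leader, gap_last) for a three-candidate poll.

    `poll` maps the preference labels "q", "q'", "q''" to poll scores.
    Assumes distinct scores, so that the ordering of candidates is unambiguous.
    """
    ordered = sorted(poll, key=poll.get, reverse=True)  # leader first
    s1, s2, s3 = (poll[c] for c in ordered)
    scenario = " > ".join(ordered)
    return scenario, s1 - s2, s2 - s3

example_poll = {"q": 34, "q'": 30, "q''": 38}  # hypothetical scores
print(poll_condition(example_poll))
```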

Determining the outcome and payoff
The outcome of the voting process was generated by sampling each voter i.i.d using the poll scores as the distribution and then adding the vote of the subject. Thus the poll provided a noisy indication of the results of the voting.
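A minimal Python sketch of this sampling procedure is given below. It assumes that one simulated vote is drawn per vote counted in the poll, with probabilities proportional to the poll scores; the exact number of simulated voters used in the experiments is not specified here, so this is an assumption. The function name `sample_outcome` is illustrative, and the score of the blue candidate in the example poll is an assumed value.

```python
import random
from collections import Counter

def sample_outcome(poll, subject_vote, rng):
    """Sample the final scores: draw each simulated vote i.i.d. from the
    poll shares, then add the human subject's vote."""
    candidates = list(poll)
    weights = [poll[c] for c in candidates]
    n_simulated = sum(weights)  # assumption: one simulated voter per poll vote
    final = Counter(rng.choices(candidates, weights=weights, k=n_simulated))
    final[subject_vote] += 1
    scores = {c: final[c] for c in candidates}
    top = max(scores.values())
    winners = {c for c, v in scores.items() if v == top}
    return scores, winners

poll = {"red": 30, "grey": 38, "blue": 34}  # blue's score is an assumed value
scores, winners = sample_outcome(poll, "red", random.Random(0))
print(scores, winners)
```

Passing an explicit `random.Random` instance keeps a simulated game reproducible, which is convenient when replaying a recorded instance.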
We emphasize several design choices. First, the subjects were not informed about the accuracy of the poll or how votes are sampled (only that the poll was non-binding and that the poll results may not reflect the final score of each candidate), but could see the final scores and the true winner(s) after each game. Experiments in economics typically present the subjects with information that allows them, at least theoretically, to deduce their expected utility. However, voters are unlikely to know the actual types of the other voters or the statistical methods behind polls, and even less likely to perform complicated probabilistic calculations.
Second, the actual probability that the participant would affect the outcome rapidly becomes smaller for large values of n, since the voter is pivotal only in case of a tie or near-tie. Therefore, for large polls (e.g., n ≥ 1000) the strategy of the participant had almost no effect on her actual reward, which is a common situation in wide-scale elections in the real world. In fact, exact calculations show that the action of the voter in any single game with n ≥ 100 cannot affect her expected payoff by more than [amount not recovered] (and much less under most conditions, see Figs. 4 and 17). In contrast, under the "small n" condition the action of the voter may change the payoff by up to [amount not recovered] in each game.

Data collection
In total, 603 subjects participated. Of these, 60 subjects (all for the small n configuration) were first-year engineering students from Ben-Gurion University who played the game in the lab. IRB approval for the study was granted by the Ethics review board of this institution. All other subjects were recruited using Amazon Mechanical Turk (all from the U.S.) and played online. For subjects who participated in the experiment more than once, only the first session was considered. Subjects were given a detailed tutorial of the voting game and their participation in the study was contingent on passing a comprehension quiz about the game (see Footnote 5). All the collected data is available for download from www.votelib.org.
Subjects played up to 20 instances (games) in sequence, and were free to leave at any point. The average number of games per subject was 17.2, where 400 subjects completed all 20 games. Each of the 20 games was independently sampled from a distribution over the 6 scenarios and the (up to) 25 combinations of gap values (all games with the same value of n). The sample was not uniform and scenarios we considered as more interesting were sampled with a higher frequency.
After each game we showed the subject the true outcome of the election and the winning candidate. The subject could choose to play a new game or to stop and collect her earnings on the games played. The average session time per subject (excluding tutorial) was about 2-3 min. All AMT subjects received a show-up fee of $0.4 and a bonus that depended on their total rewards in the game. The reward (utility) of each candidate for a voter in a given game was set based on her preferences, as explained in Sect. 3. The reward was set to , i.e. the maximum bonus was $2. The average payment per subject was $1.92 including the show-up fee.

Hypotheses
We collected more than 10,000 game instances in all poll conditions (see Sect. 4.1). As noted earlier, the sampling was not uniform, but we had at least 5 instances from each configuration. Table 1 summarizes the number of games and participants for each value of n. We can see from the table that the average number of games per subject was more than 17, as most subjects completed all 20 games.
Based on standard game theoretic models of voting equilibria under uncertainty, we hypothesized the following.
1. People never vote for the least-preferred option $q''$. This is since $q''$ is a globally-dominated strategy. It may only lower the reward of the voter.
2. People vote truthfully when their most-preferred candidate $q$ is ranked 1st or 2nd in the poll. While $q'$ is not globally dominated, it is both locally-dominated [43], and has a lower expected utility than $q$ as long as the poll is any indication of the outcome (see Footnote 6).
3. When $q$ is ranked last in the poll, people will tend to compromise for $q'$. Also, people will tend to compromise more often when the expected gain from a compromise is higher.
Based on experimental findings from other studies in decision making [4,12,31,38], we also expected to see behavior that may contradict some of the previous hypotheses, as detailed below.
4. People tend to vote for the leader of the poll.
5. The number of voters n (i.e. size of the poll) has negligible effect on behavior.

Classifying games and actions
Measures of voting behavior
The large number of combinations of poll conditions makes the analysis non-trivial, and thus we aimed to define simple measures for voting behavior. We focused on the following four behaviors:
• TRT A truthful action. That is, voting for the most preferred candidate $q$.
• CMP Compromise. That is, voting for the second preference $q'$ when $q$ is ranked last.
• LB Leader bias. That is, voting for the leader of the poll that is not $q$.
• DOM Dominated moves. That is, there is an action that surely yields a higher expected utility (under very weak assumptions on the vote distribution). In other words, there is no rational motivation to select this action.
There are also two possible combinations, namely DOM+LB and CMP+LB, abbreviated as DLB and CLB, respectively. Thus there are six classes of "interesting" actions: $\mathcal{A} = \{\mathrm{TRT}, \mathrm{LB}, \mathrm{DLB}, \mathrm{CMP}, \mathrm{CLB}, \mathrm{DOM}\}$. For each action $A \in \mathcal{A}$, we can measure its "A-ratio," which is the fraction of instances where action A was played out of all instances where it was available. These A-ratios are the main tool we apply in the paper to analyze voting behavior.
Footnote 6. In order to prefer $q'_i$ under linear reward, a rational voter $i$ must believe that the event $T^{23}_i$ is strictly more likely than $T^{12}_i \cup T^{13}_i$. This does not make sense even without computing the actual probabilities.

Computing A-ratios
We grouped all game instances into six scenarios, based on how candidates' scores are ordered in the poll compared to the voter's preferences. Table 2 shows this classification, where for each of the 3 actions in each of the 6 scenarios we marked which behaviors apply. That is, what kind of behavior would justify this action. Note that in some cases there are multiple possible justifications.
As the behaviors LB (by itself), CLB and CMP may only occur in a single scenario (3, 5 and 6, respectively), we name the scenario after these behaviors.
We note the following. First, for ease of presentation we ignore ties in the poll configurations. We return to this point in Sect. 5, where there are few voters and ties are common. Second, it is natural to extend the definitions of these six action classes $\mathcal{A}$ to games with more than 3 candidates, where the number of scenarios is much larger.
Given a set of instances $S$ (one-shot games) and $k \in \{1, \ldots, 6\}$, $S_k$ is the subset of instances of $S$ in scenario $k$. For any action class $A \in \mathcal{A}$, $K(A) \subseteq \{1, \ldots, 6\}$ are the scenarios where action $A$ is possible (e.g., $K(\mathrm{DLB}) = \{4, 6\}$). We define the A-ratio of voter $i$ within $S$ as follows:
$$\text{A-ratio}_i(S) = \frac{\left|\{\, g \in \bigcup_{k \in K(A)} S_k : i \text{ played } A \text{ in } g \,\}\right|}{\left|\bigcup_{k \in K(A)} S_k\right|},$$
where the argument $S$ is omitted when clear from context. For example, if the CMP-ratio of $i$ is 0.2, this means $i$ played a CMP action (voted $q'$) in 20% of the games where this was an available action (scenario 6). As another example, if we take the group of all subjects, and find that they played $q''$ in 35% of all games in scenarios 4 and 6, then the DLB-ratio of this group is 0.35.
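The A-ratio computation can be sketched in a few lines. The data layout is an assumption made for illustration: each game is a pair of the scenario number and the set of action-class labels that match the vote cast, so the scenario-to-label mapping of Table 2 is supplied by the caller rather than reproduced here. The minimum of three available games follows the rule the paper states for leaving an A-ratio undefined.

```python
def a_ratio(games, action, available_in, min_games=3):
    """Fraction of games in which `action` was played, out of the games
    in which it was available (illustrative sketch).

    games: iterable of (scenario, action_classes) pairs, where
        action_classes is the set of class labels matching the vote cast.
    action: a class label such as "CMP" or "DLB".
    available_in: the set K(A) of scenarios in which `action` is possible.
    Returns None when fewer than `min_games` games offered the action.
    """
    relevant = [classes for scenario, classes in games
                if scenario in available_in]
    if len(relevant) < min_games:
        return None  # the ratio is left undefined
    return sum(action in classes for classes in relevant) / len(relevant)

# Example: five games in scenario 6, with one compromise vote among them.
history = [(6, {"CMP"})] + [(6, {"TRT"})] * 4 + [(1, {"TRT"})] * 3
cmp_ratio = a_ratio(history, "CMP", available_in={6})  # 1 of 5 games
```

Passing the games of a single subject gives that subject's ratio, while passing the pooled games of a group gives the group-level ratio used in the aggregate figures.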
When the denominator of the A-ratio is smaller than 3 (fewer than three games where action A was available), we leave the A-ratio undefined. We define the type of a participant as the collection of A-ratios (along all six action classes $\mathcal{A}$) over all games played by her, CMP-ratio($S_i$), TRT-ratio($S_i$), and so on, where $S_i$ is the set of games played by subject $i$.
Table 3 shows both the number and fraction of times each action was played for each scenario. The fonts and colors code different action classes: TRT in green, CMP in blue, DOM in red. LB is coded as bold, and CLB, DLB as a combination of bold and the relevant color (Fig. 2). The fractions shown in Table 3 are difficult to interpret, as it is not clear what, say, voting $q'$ in scenario 1 means. We thus rearrange the votes according to the action types we defined above. Figure 2 aggregates counts over each of the 6 available actions, showing the A-ratio of each action across the entire population. The counts for each action class correspond to the total number of times that the action was chosen (which is the sum of counts over the cells in Table 3 that match its color). For example, we can see that there are 415 DOM actions by summing all red non-bold cells in Table 3. Similarly, there are 8377 games in which a DOM action was available, which is the sum of the total number of games in the first five lines of Table 3. These six ratios can be seen as a collective "fingerprint" of a group of subjects. Of course, we do not argue that the same ratios are indicative for the entire population; we are mainly interested in qualitative findings, and in how A-ratios are affected by the variables we control.
Figure 2 shows that the DOM-ratio (0.050) is very low, which indicates that Hypothesis 1 holds at large.
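As a quick arithmetic check, the aggregate DOM-ratio can be recovered from the two counts quoted in the text (415 DOM actions out of 8377 games in which a DOM action was available):

```latex
\text{DOM-ratio} \;=\; \frac{415}{8377} \;\approx\; 0.0495 \;\approx\; 0.050
```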
However, Figure 2 ignores any inter-personal differences among participants. To address this, we computed the types of all 603 unique subjects (i.e. all six A-ratios), and present the distribution of each A-ratio in the population in Fig. 3. We can see that there were 100-odd subjects with DOM-ratio ∼ 0.15, whereas over 300 subjects never played a DOM action (DOM-ratio = 0). These patterns are consistent across poll sizes. A notable feature of Fig. 3 is the clear bimodal distribution of the TRT-ratio. A significant fraction of the population has a TRT-ratio close to 1, whereas the rest of the population is centered (but not very concentrated) around 0.5. It thus makes sense to identify the "(almost) always truthful" voters as a separate group. We denote by $N_{\mathrm{TRT}}$ all participants with TRT-ratio strictly above 0.9. The partition into types is shown in Table 4. The type distribution for other poll size conditions was similar. In the next subsections, we focus on the majority of the voters (85%) who are classified as $N_{\mathrm{other}}$ (neither $N_{\mathrm{TRT}}$ nor $N_{\mathrm{DOM}}$), and analyze their behavior in detail.

Compromise behavior
We first show that the aggregate compromise behavior (Scenario 6 in Table 2) is largely consistent with our expectation from rational players. To that end, we calculate the expected utility gain in cents from a CMP action (voting q ′ instead of q) for every possible combination of gaps in the polls. Note that the expected utility gain from compromise is almost the same in scenarios 5 and 6 (very slightly lower in Scenario 6), and is always negative in all other scenarios.
We assume for analysis purposes that $\Pr(T^{xy}_i) = \Pr(T^{yx}_i)$, i.e. that the probability of a voter being pivotal for $x$ against $y$ is equal to that of being pivotal for $y$ against $x$. For a rational voter, the decision whether to vote for $q'$ in the CMP scenario depends on the likelihood of the events that $i$ is pivotal for any pair of candidates, i.e. on $T^{12}_i$, $T^{13}_i$ and $T^{23}_i$.

Compromising increases $i$'s expected utility if and only if $\Pr(T^{23}_i) > 2\Pr(T^{13}_i) + 2\Pr(T^{12}_i)$ (see Footnote 8). To see why, recall that by Eq. (4), , where $s_j$ is the score of candidate $j$ at the poll. For the small n condition the estimation of Eq. (8) is inaccurate, and we thus used sampling to estimate CMP-value directly.
Figure 4 (top) shows the expected gain (the CMP-value) from playing $q'$, and Fig. 4 (bottom) shows the actual distribution of voters' decisions (the CMP-ratio). In the figure, we can see that the rational decision in the $n = 1000$ condition is to compromise (roughly) when Gap-last is in one of the two largest values. The effect of Gap-leader is mainly on the absolute CMP-value, and not on the correct decision (which depends on the sign of CMP-value). As Gap-leader increases, the possible gain (or loss) from a compromise becomes negligible, especially for large n. For other poll sizes (see Figs. 15, 16, and 17 in the appendix) we get similar results, except that the absolute expected gain differs. Thus for $n = 10{,}000$ the effect of a single vote on the expected utility is almost completely negligible.
The CMP-value, as reflected in Fig. 4 (top), provides the theoretical support for our Hypothesis 3: rational voters will be more inclined to compromise as Gap-last increases (since this means CMP-value is positive), and as Gap-leader decreases (since CMP-value becomes more significant). Figure 4 (bottom) shows both of these trends in participants' voting behavior for the CMP scenario (Scenario 6 in Table 2), which confirms Hypothesis 3. When Gap-last is large and Gap-leader is small (i.e., the voter's favorite candidate is trailing behind in the poll, whereas the two other alternatives are nearly tied), voters compromise with probability ∼ 0.8. If either condition is violated, then the CMP-ratio drops significantly to 0.3-0.5.
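To make the notion of CMP-value concrete, the sketch below computes the expected gain from voting $q'$ instead of $q$ for a very small poll. It is not the procedure used in the paper: the paper reports using sampling for the small n condition, whereas this sketch swaps in exact enumeration of all vote counts, which is feasible only for tiny polls. The reward values (2, 1, 0), the uniform tie-breaking, and the convention that candidates are indexed by the voter's preference order (0 for $q$, 1 for $q'$, 2 for $q''$) are all assumptions made for illustration.

```python
from math import comb

def expected_reward(poll, rewards, vote):
    """Exact expected reward of casting `vote` (illustrative sketch).

    poll: (votes for q, votes for q', votes for q'') shown in the poll.
    rewards: reward to the voter if q, q', q'' wins (hypothetical values).
    The other n = sum(poll) voters are drawn i.i.d. from the poll
    distribution; ties are assumed to be broken uniformly at random.
    """
    n = sum(poll)
    p = [s / n for s in poll]
    total = 0.0
    for k0 in range(n + 1):
        for k1 in range(n - k0 + 1):
            k2 = n - k0 - k1
            # Multinomial probability of the counts (k0, k1, k2).
            prob = (comb(n, k0) * comb(n - k0, k1)
                    * p[0] ** k0 * p[1] ** k1 * p[2] ** k2)
            scores = [k0, k1, k2]
            scores[vote] += 1  # add the subject's own vote
            top = max(scores)
            winners = [c for c in range(3) if scores[c] == top]
            total += prob * sum(rewards[c] for c in winners) / len(winners)
    return total

def cmp_value(poll, rewards=(2.0, 1.0, 0.0)):
    """Expected gain from voting q' (index 1) rather than q (index 0)."""
    return expected_reward(poll, rewards, 1) - expected_reward(poll, rewards, 0)

# q has no support and q', q'' are tied: compromising pays off.
gain_when_trailing = cmp_value((0, 5, 5))
# q is tied for the lead with q': compromising is costly.
gain_when_leading = cmp_value((5, 5, 0))
```

Under these assumptions the sign of the result plays the role described in the text: a positive value means that compromising raises the voter's expected reward, and a negative value means that it lowers it.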
Based on Fig. 4 we argue that voters in the CMP scenario follow a rational behavior, at least qualitatively. This can be further demonstrated by calculating the correlation between CMP-value and CMP-ratio. For n = 1000 , there is a medium correlation of 0.47. We get very similar trends for other poll sizes: The correlation of CMP-value with CMP-ratio is 0.69, 0.62 and 0.56 for small n , n = 100 and n = 10,000 , respectively.
Yet, we can also see from Fig. 4 (bottom) that participants fail to adjust their decision threshold correctly: they compromise too much even when it hurts them in expectation (when both gaps are small), and when they are not pivotal (Gap-leader is large). Notably, once we omit the rightmost columns where the voter is almost never pivotal, the correlation of CMP-value and CMP-ratio leaps from 0.47 to 0.76 in the n = 1000 condition, and similarly for the other poll size conditions. This is an indication that subjects tend to follow the rational behavior when stakes are (relatively) higher.
Further evidence that voters compromise too much appears when we analyze subjects' payoffs: in the "small n" condition, subjects who compromised frequently obtained a 4-5% lower payoff than those who never compromised. In the larger n conditions the influence of the subject on the outcome (and thus on her own payoff) is negligible.
Footnote 8 (cont.) $(\Pr(T^{12}_i(\,_{-i}))(10) + \Pr(T^{13}(\,_{-i}))(20)) = 10(\Pr(T^{12}_i) + 2\Pr(T^{13}_i))$,

Leader bias
Studies in public policy and economics have documented a "herding" effect in which voters are influenced by poll and ballot results [3,4,17]. As shown in the distribution of LB-ratio in Fig. 3, a significant number of voters are inclined to vote for the leader of the poll, at least when it is not the candidate they rank last ($q''$). We highlight that such a decision cannot increase the reward of the participant in expectation (or at all, except in extremely unlikely cases). This confirms Hypothesis 4, and shows that Hypothesis 2 applies only to a subset of voters.
To understand leader bias behavior in our setting (Scenario 3 in Table 2), we focus on the gap between the leader and the runner-up. The frequency of an LB action increases monotonically with Gap-leader from around 0.35 to around 0.7 (Fig. 5). We observe a similar increase (from 0.15 to 0.35) in the probability of a DLB action, which is overall less frequent.

Combining compromise and leader-bias
When trying to apply the same analysis as above to CLB actions rather than LB, we get a much noisier picture. One possible explanation is that compromise behavior and leader-bias act in opposite directions, which leads to confounds. Recall that the LB-ratio of a subject was defined based on her behavior in the LB scenario only. We can refine the analysis by studying how voters with different LB-ratios behave differently in the CMP and CLB scenarios. We partitioned the voters with well-defined LB-ratio into three subclasses. Let $N_{\mathrm{LB0}} \subseteq N_{\mathrm{other}}$ denote all voters with LB-ratio of 0, and $N_{\mathrm{LB1}} \subseteq N_{\mathrm{other}}$ denote all voters with LB-ratio of 1. The remaining voters whose LB-ratio is defined are classified as $N_{\mathrm{LBX}}$. The number of voters in each of these subcategories can be seen in the entries outlined with dashed lines in Table 5.
Figure 6 (top) shows the behavior of these groups in the CLB scenario (Scenario 5 in Table 2) for different levels of Gap-last. In the figure we can see a remarkable difference between voters of different groups, where $N_{\mathrm{LB1}}$ voters compromise more than $N_{\mathrm{LBX}}$ voters, who in turn compromise more than $N_{\mathrm{LB0}}$ voters. Within each group, we also see a slight increase in compromise as Gap-last increases (see Footnote 9). In contrast, there is no clear difference between these 3 groups of voters in the CMP scenario (Fig. 6 (bottom)). We can thus conclude that: (a) the partition into LB types is robust, as leader-biased voters apply their bias consistently across different scenarios (LB and CLB); (b) the tendency to compromise (as measured in the CMP scenario) and the leader-bias (as measured in the LB scenario) have an additive effect when both apply in the CLB scenario.
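The three-way partition by LB-ratio is simple enough to state as code. The sketch below assumes that an undefined LB-ratio is represented by `None` and that the caller has already restricted attention to voters in the "other" group; the function and label names are hypothetical.

```python
from collections import defaultdict

def lb_type(lb_ratio):
    """Map an LB-ratio to a subclass label (illustrative sketch).

    Returns None when the LB-ratio is undefined, "LB0" for a ratio of 0,
    "LB1" for a ratio of 1, and "LBX" for anything in between.
    """
    if lb_ratio is None:
        return None
    if lb_ratio == 0:
        return "LB0"
    if lb_ratio == 1:
        return "LB1"
    return "LBX"

def partition_by_lb(lb_ratios):
    """Group subject ids by LB type; lb_ratios maps subject id -> LB-ratio."""
    groups = defaultdict(list)
    for subject, ratio in lb_ratios.items():
        label = lb_type(ratio)
        if label is not None:  # skip subjects with an undefined LB-ratio
            groups[label].append(subject)
    return dict(groups)

# Hypothetical subject ids and ratios, for illustration only.
groups = partition_by_lb({"s1": 0.0, "s2": 1.0, "s3": 0.4, "s4": None})
```

Comparing compromise rates across the resulting groups is then a matter of computing the relevant A-ratio separately for each group's pooled games.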

Additional findings for one-shot scenario
We also did an initial analysis of two other behaviors, namely voting for dominated actions, and learning. Since these findings are secondary to our main results above, they are detailed in Appendix 1. Dominated actions can be divided into two: DLB actions, which we show to be a stronger kind of leader bias; and all other DOM actions. For the latter, we argue that they reflect a random component in the behavior of some voters. As for learning, we did not find any evidence that voters change their behavior after playing several games. This is in contrast to experiments such as in [5,29], where voters repeatedly play the same game with the same group of people.

Iterative voting
In an iterative setting [42], voters start from some initial state $\mathbf{a}^0$, but are then given repeated opportunities to change their vote. In our study, a single voter may change her vote at each step according to some fixed order. The game ends either after a predetermined number of rounds, or if voters converged to an agreed outcome (see details below). It is important to note that voters' preferences do not change over the course of the game.
Formally, we denote the voting profile at step $t$ by $\mathbf{a}^t$, and the score vector and winner set derived from it by $\mathbf{s}^t = \mathbf{s}_{\mathbf{a}^t}$, $W^t = W(\mathbf{a}^t)$. Since only one voter may change her vote at each step, $\mathbf{a}^t$ and $\mathbf{a}^{t+1}$ differ by at most one entry. A round is a sequence of n steps (one step for each voter). Convergence is defined as the case in which all voters do not change their votes in two consecutive rounds. Formally, if $\mathbf{a}^{t-t'} = \mathbf{a}^{t}$ for all $t' = 0, 1, \ldots, 2n - 1$ (see Footnote 10). For example, Table 6 shows a history of votes for the above example for steps 1 through 10, in which convergence occurred. In this example, the game converged because the vote for each voter in steps 3-6 (g, b, r, r) repeated in steps 7-10.
Table 6 Example of iterative voting process of 4 voters, with convergence at step 10, after two and a half rounds
Step t          1   2    3   4   5   6   7   8   9   10
Vote a_i        r   g    g   b   r   r   g   b   r   r
Winner W(a^t)   r   r,g  g   g   g   r   r   r   r   r
Fig. 7 Voting game interface for iterative voting setting. The voting bar to the left of each candidate displayed the number of votes for the candidate at each round, as well as the identity of the voters who voted for the candidate. For example, the red candidate is the current leader, with 3 of the votes, cast by voters p4, p1 and p5
Fig. 8 Two examples of preference profiles in iterative voting study for n = 5. Each column is one preference order, and first row indicates the number of voters with this preference. In the profile on the left there is a Condorcet winner (blue) but it is not a Plurality winner. In the profile on the right, gray is both the Plurality winner and the Condorcet winner
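The convergence check can be sketched using the worked example of Table 6. The sketch follows the reading suggested by that example, namely that the game has converged at the first step where the votes cast in the last n steps repeat the votes cast in the n steps before them. This reading, the function name, and the list representation of the vote history are assumptions made here for illustration.

```python
def convergence_step(cast_votes, n):
    """Return the first step at which the votes of the last round repeat
    those of the previous round, or None (illustrative sketch).

    cast_votes: list where cast_votes[t - 1] is the vote cast at step t,
        with the n voters moving in a fixed cyclic order.
    n: number of voters (one round consists of n steps).
    """
    for t in range(2 * n, len(cast_votes) + 1):
        last_round = cast_votes[t - n:t]
        previous_round = cast_votes[t - 2 * n:t - n]
        if last_round == previous_round:
            return t
    return None

# Vote history of Table 6 (4 voters, steps 1 through 10).
table6_votes = list("rggbrrgbrr")
step = convergence_step(table6_votes, n=4)
```

On the Table 6 history, the votes cast in steps 7-10 are the first to repeat the preceding four, which agrees with the convergence step stated in the table caption.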

Methodology
The iterative voting experiments were performed on groups of several human voters, who use iterative voting to select a winner or winners out of three possible candidates. Figure 7 shows a snapshot of the GUI of the iterative voting game, configured to include three candidates (red, gray, and blue) with 5 voters.

Game configurations
We used voter group sizes of 3, 5, and 7 voters, and for each group size designed 6 preference profiles according to the interplay between two selection criteria, the Plurality and Condorcet winners. 11 Two of the profiles we used for the n = 5 condition appear in Fig. 8. For all other profiles, see the "Appendix".

Determining the outcome and payoff
The game was played according to the protocol of iterative voting described above, starting from the truthful voting profile. Subjects could not see the actual preferences of the other voters, but could see the current voting profile at each step (that is, which voter votes for which candidate). The game GUI is shown in Fig. 7 for an example configuration with 5 voters from the point of view of voter p1.
A game is terminated when the voters converge, as described in Sect. 5, or if the number of rounds reaches a predetermined threshold unknown to the participants (uniformly distributed between 5 and 10). The winner (or winners, in case of a tie) was the candidate with the largest number of votes in the last round, and the reward for each voter in the game was determined separately according to her preferences. The rewards for a single iterative game were set as ( ). Rewards per game are higher than in the one-shot games since an iterative game takes longer on average.

Data collection
Subjects were recruited via Amazon Mechanical Turk from the same pool used for the one-shot experiments (Sect. 4). Subjects could play up to 6 games in a sequence, each time with a different preference profile and with a different group of subjects (matched at random). All the collected data is available for download from www.votelib.org.

Hypotheses
Our general hypothesis was that the behavior in the iterative and the one-shot settings would be similar. In particular, we expected to find a similar partition to types and similar distributions of conditional actions, despite the different context. That is, despite the fact that in the iterative game a voter sees the actual votes of the other voters and knows it may later change.
An alternative hypothesis is that players adopt some notion of rational behavior in their play. For this purpose we will compare their behavior to the Myopic Best Response (MBR) model of Meir et al. [42]. Under this model, in each round a voter plays as if this is the last round. This means she should compromise if and only if both of these conditions apply: (1) $q$ is ranked last in the current profile $\mathbf{a}^t$; and (2) the gap between $q'$ and $q''$ (which, when $q$ is ranked last, is exactly Gap-leader) is either 0 or 1. These are exactly the conditions under which voting for $q'$ changes the outcome in a way that increases the voter's utility.
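The two conditions of the myopic rule translate directly into a small predicate. The sketch below is a literal reading of the conditions as stated in the text, not the model's original formulation: it treats "ranked last" as strictly last (ties are ignored, as elsewhere in the analysis), takes the gap between $q'$ and $q''$ as an absolute difference, and leaves it to the caller to decide whether the scores include the voter's own current vote. All names are hypothetical.

```python
def mbr_compromise(scores, q, q1, q2):
    """Return True when both stated conditions for a myopic compromise hold
    (illustrative sketch).

    scores: dict mapping candidate -> number of votes in the current profile.
    q, q1, q2: the voter's most, second and least preferred candidates.
    """
    # Condition (1): the favorite candidate is strictly last.
    q_ranked_last = scores[q] < min(scores[q1], scores[q2])
    # Condition (2): the other two candidates are at most one vote apart.
    gap = abs(scores[q1] - scores[q2])
    return q_ranked_last and gap <= 1

# The favorite (red) trails while the other two candidates are tied.
should_move = mbr_compromise({"r": 1, "g": 3, "b": 3}, q="r", q1="g", q2="b")
```

A rule of this kind predicts a sharp threshold in behavior, which is what makes it easy to compare against the smoother pattern of observed compromise rates.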
We ruled out more complicated rational models such as subgame perfect equilibrium [21], as their assumptions are incompatible with the conditions of the experiment. In particular, our voters do not know the preferences of others and how many rounds the game will continue.
In order to test these hypotheses, we computed and analyzed A-ratios in the same way as we did for one-shot voting, except that instead of a poll we used the current voting profile $\mathbf{a}^t$. More specifically, we counted each step by player $i$ as a separate decision, classifying it into one of six scenarios as in Table 3 and checking the action classes from $\mathcal{A}$ to which it applies. This way we get multiple data points on each subject (6 games times 2-5 rounds in each game) that allow us to measure the A-ratios.

Iterative voting findings
We report our findings for groups of 7 voters, and compare them to the small n condition in the one-shot setting. Our findings for groups of 5 and 3 voters were similar and exhibited the same patterns. Figure 9 shows the distribution over the TRT-ratio for the iterative setting (left) and the one-shot setting (right). As shown by the figure, both settings display similar bimodal behavior. A significant fraction of the population is centered close to 1, while the rest of the population is centered around 0.5.
One big difference was the partition into types, see Table 7. The fraction of subjects classified as N TRT and especially N DOM was much higher in the iterative setting, compared to the one-shot games. We return to this point later on.
We can see that the action distribution in the iterative setting, shown in Fig. 10, bears a striking resemblance to the one in the one-shot setting, even though this is a completely different game. This confirms the hypothesis that human voters follow a myopic heuristic that is based on poll scores.
It may seem from Fig. 10 that there is more leader-bias in the iterative setting, but recall that in the iterative setting we are unable to control the frequency of each scenario, and in particular the gap size. We thus need to add it as a control variable. Indeed, Fig. 11 (top) shows that once we control for the gap size, the amount of leader-bias in the one-shot and iterative settings is remarkably similar. As shown in the figure, in both settings the LB-ratio increases with the gap size.
Fig. 11 The effect of gap size on LB and CMP ratios for small n in iterative setting on $N_{\mathrm{other}}$

Remark 2
Note that in the iterative setting there is a possible rational motivation for an LB or DLB action at high gaps that does not apply in the one-shot setting: the subject may use it to quickly finish a game where she cannot get her favorite candidate to win.

Compromise behavior
We observe very similar compromise behavior to the one-shot games, where CMP-ratio is increasing with gap-last (Fig. 11 (bottom)).

The effect of Gap-leader on compromise behavior is much weaker. The range of CMP-ratio is between 0.5 and 0.75 for any value of Gap-leader, with weak negative correlation in one-shot games and no correlation in iterative games. In contrast, MBR suggests that behavior should follow a sharp threshold: 1 when the voter is pivotal and 0 otherwise. We can thus reject MBR as a plausible description of voters' behavior. While the match with the behavior from one-shot games is not perfect, it seems like a good baseline to explain and predict the behavior in iterative games.
Fig. 12 A-ratios in iterative voting behavior for the $N_{\mathrm{other}}$ group, split into "move" and "stay." As a baseline we also show the behavior in the one-shot scenario. On the right we can see the effect of stickiness on the DOM-ratio in the $N_{\mathrm{other}} \cup N_{\mathrm{DOM}}$ group

"Stickiness" behavior
Interestingly, we saw that many voters choose to vote for the same candidate as in the last round, even if the scenario changed due to actions by other players. This behavior can be demonstrated by splitting the data into "move" and "stay" conditions. We thus recomputed the A-ratios for different actions on each of the "move" and "stay" subsets. For example, there were 64 steps in scenario 3 (the LB scenario) where the previous vote of the voter was $q'$, and in 48 of them (75%), the voter voted $q'$ again. In contrast, there were 86 steps in scenario 3 where the previous vote was not $q'$, and in 29 of them (34%) the voter changed the vote to $q'$. This gives us an LB-ratio of 0.75 and 0.34 in the "stay" and "move" conditions, respectively. Figure 12 shows the three most important A-ratios (the others exhibit a similar pattern). We can see that in all of them, a voter in the iterative game has a lower tendency for compromise/leader-bias than in the one-shot game, if this requires an active change of vote. However, this tendency becomes higher if it only requires repeating the previous action. The stickiness effect persists when we control for the size of the gap (Fig. 13). A natural explanation is that voters in the iterative game exhibit a strong level of default-bias, where their last vote is used as the default action. We also checked whether the difference in tendency to move could follow from an "ego depletion effect," where subjects become less active in later rounds [62]. We ruled this explanation out since the CMP-ratio remains stable throughout the six games of each subject.
Next, we checked whether default-bias could account for the much higher rate of DOM and TRT voters in the iterative setting. Since TRT voters have a constant behavior, splitting into conditions cannot reveal much on their actions. However, we analyzed again the DOM votes of all voters who are not TRT voters (Fig. 12, on the right). We can see that the DOM-ratio in the "move" condition is essentially the same as in the one-shot setting, meaning that voters in both settings actively choose a dominated action at a similar rate, probably as random exploration (see Sect. 4.4.3). In contrast, the DOM-ratio in the "stay" condition is significantly higher, adding many more instances where voters simply kept their previous vote (that may have not been dominated in the previous round).
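The "move" and "stay" split described above can be sketched as follows. Each decision is represented as a pair of the voter's previous vote and current vote within a single scenario; this representation and the function name are assumptions made for illustration. The usage example rebuilds only the counts quoted in the text for the LB scenario (48 of 64 and 29 of 86); the votes that fill the remaining steps are placeholders, since the text does not say which other candidate was chosen.

```python
def split_ratio(steps, target):
    """Ratio of votes for `target`, split by whether the previous vote was
    already `target` ("stay") or not ("move") (illustrative sketch).

    steps: iterable of (previous_vote, current_vote) pairs from one scenario.
    Returns (stay_ratio, move_ratio); a ratio is None if its subset is empty.
    """
    stay = [cur == target for prev, cur in steps if prev == target]
    move = [cur == target for prev, cur in steps if prev != target]

    def ratio(flags):
        return sum(flags) / len(flags) if flags else None

    return ratio(stay), ratio(move)

# Counts from the LB scenario; "q" stands in for any vote other than q'.
lb_steps = ([("q1", "q1")] * 48 + [("q1", "q")] * 16
            + [("q", "q1")] * 29 + [("q", "q")] * 57)
stay_ratio, move_ratio = split_ratio(lb_steps, target="q1")
```

The same split applies to any action class: only the target vote and the set of steps belonging to the relevant scenario change.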

Effect on welfare
We did not detect any statistically significant differences in the social welfare of groups as the iterative game progresses (meaning the iterative process does not lead to higher or lower social welfare). However, at the individual level there is a negative correlation between CMP-ratio and the payoff for the subject, where the payoff of the most compromising subjects was 10-15% lower than those who consistently voted for their top choice.

Discussion and future work
Our results demonstrate that there are simple heuristics or patterns that provide a reasonable description of voters' behaviors, even in distinct settings such as one-shot and iterative games. We provide an abstraction of the decision process of the voter that is consistent with our findings in Fig. 14:
• First, a voter may decide to explore with a random vote with some (low) probability. This probability is higher for some voters, and higher on the first game. We believe this decision comes first since it seems to be independent of other factors and since random actions demonstrate shorter response times.
• Then, a voter may simply stick to her current vote (only relevant to the iterative setting), where again some voters may be more "lazy" than others.
• In scenarios that pose a strategic dilemma, the voter may choose to compromise, where this decision is affected both by internal factors (tendency to compromise, as measured by the CMP-ratio), and by the poll information. We note that the effect of the numeric information provided in the poll is qualitatively correct (i.e. voters compromise more in situations where a compromise would improve their expected utility), but most voters compromise too often.
• Finally, the voter may decide to vote for the leader, where again this is highly affected by personal differences (some voters have no leader-bias at all), and by the margin of the leader.
This abstraction should facilitate the generation of hypotheses and specific models regarding voters' behavior in more complex situations, e.g. with a larger number of candidates or different voting rules.

Leader-bias and bandwagon effect
The phenomenon of leader-bias is particularly surprising, since a common explanation for the herding/bandwagon effect is an information cascade, where decision makers are unsure about the quality of each choice, and learn from the actions of others [3]. Such an explanation is irrelevant to our results for two reasons (see Footnote 12). First, Dekel and Piccione [20] showed that in voting between two candidates (as is the case in practice in our LB scenario) information cascades do not occur in equilibrium. Second and more importantly, in our setting the qualities of candidates were known and the reward was fixed. Also, our subjects did not have an exogenous incentive to reach consensus, in contrast to other experiments as in [34]. In fact, while in our iterative game voting for the leader can be somewhat rationalized (see Remark 2), in the one-shot game this invariably decreases the voter's expected utility, and hence the "herding" moves cannot be rationalized in purely economic terms.
Two possible alternative explanations are: (1) voting for the leader is perceived by some voters as an alternative "default" option that does not require cognitive effort (just like voting for the most preferred); (2) some voters acquire (non-monetary) utility from the satisfaction of voting for the winner (known as expressive utility [63]), and thus voting for the leader does bear a higher overall utility for them. Further experimentation is required to determine whether these explanations are valid. We note that various possible motivations for herding (or "bandwagon effect") in political voting are discussed in [12], and are more in line with the latter explanation.
The behavior of voters in two very different settings (one-shot voting with simulated noise, and iterative voting game with other human participants) was remarkably similar, where participants treated poll information (in one-shot) and current votes (in iterative) in the same way. This similarity shows the robustness of the patterns above. The main difference between the settings was the higher tendency to play truthful actions and dominated actions, where at least the latter is fully explained by "stickiness": another form of default-bias that is not available in the one-shot game.
We can conclude that our results generally support the "decision-theoretic" models of strategic voting. Indeed, it seems that for the most part, human voters follow relatively simple heuristics that ignore, and sometimes directly contradict, economic or "game-theoretic" reasoning. Moreover, in the context of iterative voting these heuristics are largely myopic, as they depend only on the current state. This finding emphasizes the relevance of theoretical models of myopic strategies [24,32,43,51]. When looking for a theory to explain and predict voters' behavior, it is crucial that the model allows for a wide range of behaviors, as specified above.

Future directions
In the future we intend to perform a deeper analysis of interpersonal differences, whether by identifying finer subgroups of voters, or some individual parameters that affect voters' behavior (such as different levels of risk-aversion or tendency for herding). We believe it is possible to predict the voting behavior of a person based on few observations, and that the accuracy of such predictions can guide us in finding better voting models.
Since in our experiment there were only three candidates, the range of available strategic decisions was very limited. Running experiments with larger sets of candidates will enable us to study which strategic actions voters prefer when there are several plausible alternatives. Note that the number of scenarios quickly explodes as we increase the number of candidates (24 for 4 candidates, 120 for 5, and so on). However, the six behaviors that we define and measure (TRT, CMP and so on) can be extended to any number of candidates, even if the classification of some borderline scenarios may be arguable.
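The growth quoted above (24 for 4 candidates, 120 for 5) matches the number of strict orderings of the candidates. The following minimal sketch enumerates them by brute force; reading a "scenario" as one such ordering is our assumption for illustration, and the function name count_orderings is hypothetical.

```python
from itertools import permutations


def count_orderings(num_candidates):
    """Count strict orderings of the candidates by brute-force enumeration.

    Assumption: each scenario corresponds to one ordering of the candidates.
    """
    labels = range(num_candidates)
    return sum(1 for _ in permutations(labels))


# Print the count for a few candidate-set sizes.
for m in (3, 4, 5, 6):
    print(m, count_orderings(m))
```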
Our results demonstrate the critical role of the information that a voter has in her strategic decisions. As most economic and game-theoretic models assume that available information includes the preferences of other voters (or a distribution thereof), we would like to check the effect of such information on voters' behavior, when given instead of or in addition to current score information (such as polls). We conjecture that the effect in the latter case will be small since, unlike poll information, it may be difficult for people to translate preference profiles into an obvious strategic decision.
Finally, a better understanding of how people behave strategically in online voting settings can guide the design and implementation of better platforms for preference aggregation. Our experimental infrastructure can be used to test such mechanisms in a context-free environment.
Acknowledgements This work was supported in part by Israeli Science Foundation (ISF) Grant No. 773/16.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 18 Every point represents a single subject, except those for which there were not enough samples to determine their type. The X-axis is the LB-ratio, and the Y-axis is the CLB-ratio
Fig. 19 Average A-ratios for early games (games 1-10 of each subject) versus late games (games 11-20 of each subject). Data presented is for n = 1000

Fig. 20 A graphical representation of the A-ratios of the DOM subjects, versus the A-ratios of LB1 voters with 50% probability to perform a random action
One possible explanation is that these participants misunderstood the game (e.g., believed their goal is to guess the winner). We rule out this explanation since: a. all subjects completed a tutorial and a quiz which demonstrate in detail how the reward is decided; and b. although "always vote for the leader" is a very simple strategy to play, no subject followed it consistently, and only a handful of subjects had a DLB-ratio of 1.
Comparing the DLB and LB ratios of each subject, it is evident from Fig. 18 that the DLB ratio of almost all voters is the same as or lower than their LB ratio. We can therefore think of DLB as a stronger and less rational form of leader-bias.

Learning
We now turn to the question of how people's strategic behavior changes over time. On one hand, since voters essentially compromise "too much," we expect them to eventually learn to correct their behavior by compromising less. On the other hand, the signal after every round is very weak (except for n = 7), so in practice voters do not get any valuable feedback on their strategy.
We can see in Fig. 19 that there is almost no change in A-ratio for any of the five interesting actions we identified (TRT-ratio omitted to avoid clutter). We get similar results for all poll size conditions, and also when controlling for the value of gap-leader and gap-last. In fact, for n = 7 we get a slight increase in CMP-ratio, which is the opposite of what we would expect.
Thus, at least at the population level, there does not seem to be any learning process or change in behavior over time (Fig. 19). It is possible, though, that at the individual level people change their behavior in opposite ways, and further experimentation is required to determine that.

Dominated actions
It seems a bit mysterious that subjects play actions that strictly decrease their payoff. One possible explanation that appeared in previous work is that voters use Quantal Response rather than best response [41]. We offer a different explanation, and argue that these actions are due to a random component in the behavior of some subjects. In other words, each subject has a certain likelihood (which varies among subjects) to vote randomly, which is part of her type.
Fig. 21 Average response times (in seconds) for voters classified as DOM, TRT and all others. We control the poll size, as it has a strong effect on the response time
To check this, we partitioned subjects based on their type. We computed the A-ratios $A_{\mathrm{DOM}}$ of subjects classified as DOM, and compared them to the A-ratios of other subgroups of subjects, and in particular to those of the most leader-biased group, whose A-ratios are $A_{\mathrm{LB1}}$. We considered possible noisy actions by mixing the actions with a uniform distribution over all actions, $A_U$.
We can see in Fig. 20 that $A_{\mathrm{DOM}}$ is almost identical to the linear combination $A_{\mathrm{mix}} = \tfrac{1}{2} A_{\mathrm{LB1}} + \tfrac{1}{2} A_U$, even without trying to optimize any parameters. In other words, a DOM voter behaves, on average, like a standard voter who randomizes her action with probability 0.5.
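The comparison with a half-random mixture can be sketched as a convex combination of two A-ratio vectors. This is an illustration only: the function names mix_ratios and max_abs_gap are hypothetical, and the numbers below are placeholders rather than values from our data.

```python
def mix_ratios(a_base, a_uniform, p_random=0.5):
    """Return (1 - p_random) * a_base + p_random * a_uniform, per action.

    Both arguments map an action label to its A-ratio. Treating the
    A-ratios as plain per-action numbers is an assumption of this sketch.
    """
    if a_base.keys() != a_uniform.keys():
        raise ValueError("both A-ratio vectors must cover the same actions")
    return {
        action: (1 - p_random) * a_base[action] + p_random * a_uniform[action]
        for action in a_base
    }


def max_abs_gap(a, b):
    """Largest absolute per-action difference between two A-ratio vectors."""
    return max(abs(a[action] - b[action]) for action in a)


# Placeholder values for illustration only (not experimental results).
a_lb1 = {"CMP": 0.6, "LB": 0.2}
a_u = {"CMP": 0.2, "LB": 0.4}
a_mix = mix_ratios(a_lb1, a_u)
print(a_mix)
```

Given an observed vector for the DOM group, max_abs_gap(observed, a_mix) gives one simple measure of how closely the mixture tracks it; other distance measures would serve equally well.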
Response times provide further corroboration of this view. Intuitively, we would expect a voter who behaves randomly to react faster. Indeed, as we can see in Fig. 21, DOM voters are about 10-15% faster than "other" voters. The TRT voters, who do not need to think about their action at all, are the fastest.

Appendix 2: Profiles used for iterative games
For each of the n = 5 and n = 7 conditions, we constructed 6 profiles, which differ in the consensus among players regarding the best outcome. In two of the profiles there was a Condorcet winner who is also the Plurality winner; in two profiles there was no Condorcet winner; and in two profiles the Condorcet winner was different from all Plurality winners. Each table presents one profile. Each column presents a single voter type, and the top row presents the number of voters of this type in the profile. The Condorcet winner, when one exists, appears in bold. For n = 3 there is only one profile with no Condorcet winner, which is completely symmetric and thus less interesting, so we only used the CsP and CdP profiles.
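The three profile classes described above can be checked mechanically. The sketch below represents a profile as a list of (number of voters, ranking) pairs, mirroring the columns of the tables; this encoding, the function names, and the example profiles are assumptions made for illustration and are not the profiles used in the experiment.

```python
from collections import Counter


def plurality_winners(profile):
    """Candidates with the most first-place votes (ties are all returned)."""
    scores = Counter()
    for count, ranking in profile:
        scores[ranking[0]] += count
    top = max(scores.values())
    return {c for c, s in scores.items() if s == top}


def condorcet_winner(profile):
    """Candidate beating every other by strict pairwise majority, or None."""
    total = sum(count for count, _ in profile)
    candidates = profile[0][1]
    for c in candidates:
        beats_all = all(
            2 * sum(n for n, r in profile if r.index(c) < r.index(d)) > total
            for d in candidates
            if d != c
        )
        if beats_all:
            return c
    return None


def classify(profile):
    """Label a profile by how its Condorcet and Plurality winners relate."""
    cw = condorcet_winner(profile)
    if cw is None:
        return "no Condorcet winner"
    if cw in plurality_winners(profile):
        return "Condorcet winner is a Plurality winner"
    return "Condorcet winner differs from all Plurality winners"


# Hypothetical n = 7 profile: 'a' leads on first places, 'b' wins pairwise.
print(classify([(3, "abc"), (2, "bca"), (2, "cba")]))
# Symmetric n = 3 cycle: no Condorcet winner.
print(classify([(1, "abc"), (1, "bca"), (1, "cab")]))
```

A Condorcet winner that is tied for first place under Plurality is counted here as matching the Plurality winner, since it is not different from all Plurality winners; this tie handling is a design choice of the sketch.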