1 Introduction

Voting methods exist at the very core of our political discourse and government, affecting electoral outcomes and incentives for voters, candidates, and parties, and thus shaping our society. In this paper, we present a modern voting method, STAR Voting, discuss its properties, and give some results from statistical models comparing its utilitarian outcomes and strategic incentives to other prominent voting methods.

All voting methods consist of two parts: a ballot, and a process for the tabulation of those ballots. STAR Voting is intended to optimize both of these components. In this paper, we lay out the rationale for the five star ballot as a voter interface, and the Score Then Automatic Runoff (STAR) algorithm for measuring the strength and breadth of support for each candidate in order to find majority preferred winners who represent the will of the people.

Fig. 1. A STAR Voting ballot

2 Presenting STAR Voting

STAR Voting is an alternative voting method in which voters score candidates from zero up to five stars, as shown in Fig. 1. The name STAR is both a reference to the five star ballot itself, and an acronym for Score Then Automatic Runoff, which describes the two-round tallying process:

  • Scoring Round: All scores for each candidate are totaled and the two highest scoring candidates advance.

  • Automatic Runoff: In the runoff, each ballot is counted as one vote for the finalist who was scored higher on that ballot. The finalist who was preferred by more voters wins.
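As an illustration, the two rounds above can be sketched in Python. This is a minimal sketch rather than an official tabulation procedure: the function name `star_winner`, the dictionary ballot format, and the tie-breaking rules (list order in the scoring round, the higher-scoring finalist in a tied runoff) are assumptions made for the example.

```python
from typing import Dict, List

Ballot = Dict[str, int]  # candidate name -> stars (0 to 5); illustrative format


def star_winner(ballots: List[Ballot], candidates: List[str]) -> str:
    """Score Then Automatic Runoff on a list of ballots (hypothetical helper)."""
    # Scoring round: total the stars given to each candidate.
    totals = {c: sum(b.get(c, 0) for b in ballots) for c in candidates}
    # The two highest-scoring candidates advance. sorted() is stable, so
    # ties keep the order of `candidates` (an assumed tie-break).
    first, second = sorted(candidates, key=lambda c: -totals[c])[:2]
    # Automatic runoff: each ballot is one vote for the finalist it scored
    # higher; equal scores count as "no preference".
    votes_first = sum(1 for b in ballots if b.get(first, 0) > b.get(second, 0))
    votes_second = sum(1 for b in ballots if b.get(second, 0) > b.get(first, 0))
    # Assumed tie-break: a tied runoff goes to the higher-scoring finalist.
    return second if votes_second > votes_first else first


ballots = (
    [{"A": 5, "B": 0, "C": 0}] * 2    # two voters strongly prefer A
    + [{"A": 3, "B": 4, "C": 0}] * 3  # three voters mildly prefer B
)
print(star_winner(ballots, ["A", "B", "C"]))
```

In this example A leads the scoring round (19 stars to 12), but B is scored higher on three of the five ballots, so B wins the runoff.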

STAR Voting was invented in 2014 with the objective of better delivering on the underlying goals of voting reform advocates, while addressing serious issues with Plurality Voting and limitations with leading reform proposals like Top Two Runoff, Score Voting, and Instant Runoff Voting (IRV).

The five star ballot allows voters to express not only preference order, but also equal preference and preference strength. As a result, the star ballot not only conveys more information than a choose-one ballot, it also conveys more than a ranked ballot. On a strict ranked ballot (where equal preferences are not allowed) a voter’s second choice could be as good as the voter’s favorite or almost as bad as their last choice. In contrast, even in STAR elections with a large field,Footnote 1 there’s no need to limit the number of candidates, limit the available levels of voter expression, prohibit equal rankings, or present voters with an unwieldy ballot. Eliminating these rules reduces the probability of invalid (“spoiled”) ballots, which studies suggest may be increased under Instant Runoff Voting (Neely & McDaniel, 2015).Footnote 2

Before the 0–5 scale was selected for STAR Voting, early simulations on Voter Satisfaction Efficiency (Quinn, 2017) were used to confirm the hypothesis that this level of granularity does in fact lead to significant gains in terms of more representative outcomes compared to less expressive scales. The simulations also confirmed that increasing the ballot scale further yielded diminishing returns. Even with large fields of candidates, the 0–5 star scale offers a high degree of resolution and granularity for voter choice, without exceeding the upward bounds of cognitive load.Footnote 3

The five star rating, well-known from online surveys, is very similar to the five star ballot and offers a familiar interface that has become a leading option for collecting detailed public opinion data. When tabulating five star ballots, both scores and runoff votes are totalled using addition, which allows STAR Voting to be run on existing voting machines in most cases. Tabulation by addition also means that STAR Voting ballots are summable and don’t require centralized tabulation, unlike Instant Runoff Voting (Maine Supreme Judicial Court, 2017). STAR results can be totaled at the local level with accurate reporting of preliminary results as ballots come in. This makes the five star ballot itself a compelling option for increasing voter voice in elections.

The second half of the STAR Voting method is the ‘automatic’ Top Two runoff. In the runoff, whether or not your favorite can win, your one full vote goes to the finalist you prefer. Ballots that scored both finalists equally count as a vote of ‘no preference’ in the runoff. The STAR Voting runoff identifies majority-preferred winners whenever possible, while voters who oppose both finalists can still, at the least, help to prevent their worst-case scenario. In STAR Voting, the candidate with the most votes wins (as in Plurality), as required by election laws in states and jurisdictions.Footnote 4

While this paper focuses on single-winner voting methods, good multi-winner voting methods are also important for comprehensive electoral reform. There are two ways of using STAR Voting and the five star ballot for multi-winner elections. One option is to repeat the single-winner Score Then Automatic Runoff process until all positions have been filled. The other option is to use Proportional STAR Voting (STAR-PR), which employs the same five star ballot as single-winner STAR but tallies those ballots using a proportional algorithm to ensure that factions or parties that have a quota worth of support can win a proportional number of seats. Evaluating and choosing between these options requires tools that are beyond the scope of this paper.

3 Pass/fail criteria, vote-splitting, the ability to vote your conscience, and inequality in the vote

Comparisons of voting methods have historically largely focused on evaluating methods according to various pass/fail criteria, even though many desirable electoral goals are mutually exclusive or inversely correlated (Arrow, 1950). Despite the obvious limitations of this lens, many advocates frame the conversation around the criteria which their preferred voting methods pass—to the exclusion of other criteria and common sense considerations like representative accuracy. Predictably, this black and white approach has proven to be more divisive than constructive.

Two criteria which are both highly regarded but are inversely correlated are Favorite Betrayal and Later No Harm. The Favorite Betrayal (FB) criterion is about preventing the need for the all-too-familiar strategy where rather than throwing away their vote on a candidate they know can’t win, a voter will vote for the candidate on their side who seems the most electable. Voting methods which pass Favorite Betrayal solve this problem, requiring that a method will never incentivize giving one’s favorite any less than maximum support.Footnote 5

The Later No Harm (LNH) criterion is effectively the opposite of Favorite Betrayal. Saying nothing about whether it’s safe to vote for one’s favorite, LNH specifies that supporting other candidates in addition to a voter’s favorite cannot hurt their first choice. This allows candidates to encourage supporters to rank others at no risk to themselves.Footnote 6

Both criteria are clearly desirable, but no deterministic voting method proposed to date has been able to satisfy both for elections with more than two candidates. We posit that rather than passing one but then failing the other criterion badly, voting methods should instead seek to maximize both. We believe that violations of LNH and FB, and their impacts on strategic voting incentives, should be evaluated statistically rather than with an axiomatic approach alone.

As we will show, STAR Voting incentivizes both honest and expressive voting by counting all ballot data given. The scoring round incentivizes voters to give their favorite(s) five stars. The runoff incentivizes voters to also give intermediate scores, because showing an honest preference order ensures their full vote will go to the finalist they prefer in the automatic runoff. Compare this with Instant Runoff Voting, which can actually incentivize Favorite Betrayal because, in order to pass Later No Harm, it ignores down-ballot voter preferences which could have been relevant.

Despite the widespread claim that IRV eliminates the Spoiler Effect, Emily Dempsey (2018) demonstrates that in order to pass Later No Harm, a voting method must by definition fail to eliminate the Spoiler Effect and vote-splitting. For these reasons, we believe that the adherence to Later No Harm as a desirable pass/fail criterion is problematic.

In IRV, some voters whose favorites are eliminated will have their next choice counted, but voters whose favorites are eliminated in the final round will not. This biases elections against voters who prefer strong underdog candidates with broad support. Counting the full ballot for some voters while ignoring relevant ballot data for others (as Later No Harm requires) gives voters a false sense of agency, may erode trust in the system and in voting reform in general, and is out of keeping with the spirit of one person, one vote.

Elections spoiled due to vote-splitting not only fail to elect the right winner, they also bias outcomes in predictable ways. “The Spoiler Effect occurs when a third candidate entering a race splits votes with a similar candidate who would otherwise win, thus causing a candidate less-preferred by the electorate to win instead” (Dempsey, 2018).

The center-squeeze Spoiler Effect in particular is pervasive in Plurality but is exhibited by IRV as well. When it happens, it fuels polarization and entrenches two party domination by preventing candidates in the middle of the field from winning. The center-squeeze effect was clearly demonstrated in simulated elections visualized by Ka-Ping Yee (2005) and was analyzed and discussed by Warren Smith and later Mark Frohnmayer (2017). Yee Diagrams such as those shown in Fig. 2 demonstrate that STAR, Score + Top Two Runoff, and Condorcet methods are more accurate than other alternative voting methods and do not exhibit exaggerated center-squeeze or center-expansion biases, two common pathologies which can result in unrepresentative outcomes.

Fig. 2. Yee diagrams

Many systemic problems in electoral politics in general can be traced back to vote-splitting and the Spoiler Effect under Plurality Voting. Fear of wasting one’s vote leads directly to “lesser-evil” voting where voters are unable to safely support their favorite. This, in turn, is the reason for the “electability” bias, and it’s likely a fundamental reason why political leadership in the United States remains starkly out of sync with the makeup of the voting population.

According to research on the demographics of elected officials (Kauzlarich, 2019), white men held 62% of U.S. elected offices in 2019, despite comprising only 30% of the population. In order to achieve gender parity and racial equity in politics, or to overcome two-party domination, we need a level playing field where candidates from underrepresented communities can fairly compete. The reality, however, is that voting methods biased towards those who are deemed most electable (Abramowitz, 1989) are likely to maintain serious disparities in representation, regardless of public opinion. Fear of vote-splitting appears to be a powerful driver of voter behavior; if so, this would lead to a significant additional advantage for the candidates deemed most electable. In practice, those who benefit from the “electability bias” are usually those who raised the most money (OpenSecrets, 2020), candidates with name recognition, and incumbents (OpenSecrets, 2019), who tend to be wealthy, older, white, and male.

We posit that by passing the Equality Criterion, vote-splitting caused by the voting method itself can be eliminated. The Equality Criterion states that for any given vote, there is a possible opposite vote, such that if both were cast, it would not change the outcome of an election.Footnote 7 The Equality Criterion ensures that if one party had the support of 51% of the voters and ran multiple candidates, and another party had the support of 49% of the electorate and ran only one candidate, the majority faction would always have some way to give all of their candidates full support and thus guarantee a win, even if the front-runners were unknown.

In its 1964 decision in Wesberry v. Sanders (Black, 1964), the U.S. Supreme Court declared that equality of voting (one person, one vote) means that “the weight and worth of the citizens’ votes as nearly as is practicable must be the same.” Passing the Equality Criterion ensures that it’s possible for voters who disagree to cast equally weighted and opposite votes, no matter how many candidates are on their side. Approval, Score, Smith/Minimax, and STAR Voting all pass this basic and ‘practicable’ criterion; Plurality and Instant Runoff Voting do not.
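The cancellation property can be checked numerically for STAR. The sketch below assumes that the “opposite” of a five star ballot is its mirror image, with each score \(s\) replaced by \(5 - s\); this reading, the helper names, and the tie-breaking rule are assumptions for illustration and not necessarily the formal definition of the criterion.

```python
from typing import Dict, List

Ballot = Dict[str, int]  # candidate -> stars (0 to 5)


def mirror(ballot: Ballot, max_score: int = 5) -> Ballot:
    """Assumed 'opposite vote': reflect every score across the scale."""
    return {c: max_score - s for c, s in ballot.items()}


def star_winner(ballots: List[Ballot]) -> str:
    """Compact STAR tally; ties favor the higher scorer (an assumption)."""
    candidates = sorted(ballots[0])
    totals = {c: sum(b[c] for b in ballots) for c in candidates}
    first, second = sorted(candidates, key=lambda c: -totals[c])[:2]
    # Runoff margin: +1 for each ballot preferring `first`, -1 for `second`.
    margin = sum((b[first] > b[second]) - (b[first] < b[second]) for b in ballots)
    return first if margin >= 0 else second


election = [{"A": 5, "B": 0, "C": 0}] * 2 + [{"A": 3, "B": 4, "C": 0}] * 3
vote = {"A": 5, "B": 1, "C": 0}
# The mirrored pair adds 5 stars to every total and nets zero in any runoff,
# so the outcome is the same with or without the pair.
assert star_winner(election + [vote, mirror(vote)]) == star_winner(election)
```

Informally, a choose-one ballot in a race with three or more candidates has no such mirror image, since a vote for one rival cancels the original vote only with respect to that rival.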

4 A non-binary statistical approach

In the axiomatic approach to evaluating voting methods, one begins by clearly defining an outcome that is arguably pathological, and then proceeds to check if that pathology is possible. In the statistical approach, the question is not just “can this pathology ever happen?” but “how problematic is this pathology on average?” Thus, we identify the context where the problem occurs, measure its frequency, and then quantify the severity of its impact.

In principle, one could define a context using only real historical outcomes, but in practice that would mean sample sizes would be unacceptably small, and for newly adopted voting methods there would be no samples at all. Historical results almost never include the data required to assess what role, if any, strategic voting played in an election. In practice, measuring the frequency of various pathologies adequately means defining a generative probability model and using it to run simulated elections.

We simulate elections for a given voting method using the following steps, which each may involve some randomness:

  1. Populate election: Generate voters and candidates with the inherent characteristics that will be needed for future steps.

  2. Assign utilities: Decide how each voter feels about each candidate.

  3. Polling: Conduct some kind of “pre-election polls” of the electorate.

  4. Honest-Naive Voting: Cast ballots for each voter using their inherent feelings about the candidates. Determine the winner.

  5. Viability-Aware Voting: Cast ballots for each voter using their inherent feelings about the candidates, available “polling” information about candidate viability, and their understanding of the voting method. Determine the winner.

  6. Targeted Strategic Voting: For each targeted strategy tested, identify which voters are likely to employ the strategy in question, which ballots to change, and how to change those ballots. Determine the new winner.

Creating a model as outlined above requires balancing realism and simplicity. Models should have a relatively high likelihood of reproducing historical elections, and counterfactual examples should be plausible. At the same time, the model should be relatively easy to describe and reproduce, while avoiding unnecessary complexities which researchers could use to bias results.

Although this paper focuses on STAR Voting, we have done our best to avoid creating a model biased in favor of STAR. Jameson Quinn’s early work on the models presented here, first made public in Quinn (2017), preceded (and was the basis for) serious STAR activism. Quinn had hoped and expected to instead find support for Bucklin-style voting methods such as Majority Judgment (Balinski & Laraki, 2010).

4.1 Voter model

Populating the election and assigning utilities to the voters and candidates is done according to a voter model. In order to select our voter model we first considered the “impartial culture.” (See: Black (1958) for introduction of the idea; Klahr (1966) for its extension to weak orderings; Fishburn and Gehrlein (1980); and Smith (2000) for generative versions and utilitarian interpretations.) We then considered “normally-distributed spatial models” (see: Downs (1957) for the introduction of the idea as a descriptive model; Smith (2000), Tideman and Plassmann (2008), and Green-Armytage et al. (2016) for simulations using the model).

Tsetlin et al. (2003) point out that impartial culture has too many Condorcet cycles. Tideman (2020) agrees and argues that for this and other reasons spatial models are better, but normal spatial models go too far in the other direction, with too few Condorcet cycles.Footnote 8 For these reasons we use a clustered spatial model. Because this is non-parametric (that is, because it can include an unbounded number of clusters), it can reproduce real-world election scenarios, including Condorcet cycles, as precisely as desired.

In the clustered spatial model (as in the spatial model), voters and candidates are characterized by their ideal points in a vector space; in the clustered model they are distributed in that space using a common hierarchical Dirichlet structure of Gaussian clusters, similar to a CrossCat model (Mansinghka et al., 2016). Step-by-step we:

  1. Assign relative weights to issue dimensions using a stick-breaking Dirichlet process, adding dimensions until the remaining weight falls below a given threshold. This allows differing distributions of issue weights, while ensuring that, on average, issue weight decays exponentially and thus only a finite number of dimensions need to be modeled.

  2. Cluster the dimensions themselves into “views” using a Chinese Restaurant Dirichlet process.

  3. Within each view, independently cluster the voters using a Chinese Restaurant Dirichlet process. For each voter cluster, assign a mean and variance for each dimension, and draw the voters’ ideal points as normally-distributed using their cluster mean and variance in each view.

This results in a model with good exchangeability properties that make it relatively easy to sample from and analyze. We believe that a nonparametric model of this form will show good realism, and in particular that specific kinds of potentially-pathological scenarios such as Condorcet cycles will occur with a realistic frequency – neither artificially often as in impartial culture, nor artificially rarely as in normally-distributed spatial models.
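The two Dirichlet-process ingredients named in the steps above can be sketched with the Python standard library. This is a generic textbook construction, not the simulation code behind the results: the concentration parameter `alpha`, the stopping `threshold`, and both function names are hypothetical, and this section does not state the values actually used.

```python
import random
from typing import List, Tuple


def stick_breaking_weights(
    alpha: float, threshold: float, rng: random.Random
) -> Tuple[List[float], float]:
    """Break a unit stick until the leftover weight drops below `threshold`."""
    weights: List[float] = []
    remaining = 1.0
    while remaining >= threshold:
        fraction = rng.betavariate(1.0, alpha)  # share of what is left
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


def chinese_restaurant_process(
    n_items: int, alpha: float, rng: random.Random
) -> List[int]:
    """Return a cluster index for each of `n_items` items."""
    assignments: List[int] = []
    counts: List[int] = []
    for i in range(n_items):
        # Join an existing cluster with probability proportional to its size,
        # or open a new one with probability proportional to alpha.
        r = rng.uniform(0.0, i + alpha)
        cumulative = 0.0
        chosen = len(counts)
        for k, count in enumerate(counts):
            cumulative += count
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments


rng = random.Random(0)
issue_weights, leftover = stick_breaking_weights(alpha=2.0, threshold=0.05, rng=rng)
views = chinese_restaurant_process(len(issue_weights), alpha=1.0, rng=rng)
```

Here `issue_weights` plays the role of the issue-dimension weights in step 1 and `views` groups those dimensions as in step 2; the same clustering function could be reused per view for the voters in step 3.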

4.2 Polling model

Next we define the polling and media models used. An initial “poll” is performed using Approval Voting, where the approval threshold for each voter is set at 70% of the way between an average utility candidate (0%) and the voter’s favorite (100%). We use Approval Voting for the poll because it gives the voters a realistic amount of information about the relative strength of the candidates. We simulate real-world polling error with Gaussian noise (standard deviation of 5%).
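A poll of this kind can be sketched as follows. Two readings are assumed here and are not confirmed by the text: the “average utility candidate” is taken to be the mean of a voter’s utilities over all candidates, and the 5% noise is added independently to each candidate’s approval share. The function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional


def approval_ballot(utilities: List[float], fraction: float) -> List[bool]:
    """Approve candidates at or above a cutoff placed `fraction` of the way
    from the voter's average candidate to the voter's favorite."""
    avg, best = mean(utilities), max(utilities)
    cutoff = avg + fraction * (best - avg)
    return [u >= cutoff for u in utilities]


def noisy_approval_poll(
    electorate: List[List[float]],
    fraction: float = 0.7,
    noise_sd: float = 0.05,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Approval share per candidate plus Gaussian polling error."""
    rng = rng or random.Random()
    ballots = [approval_ballot(voter, fraction) for voter in electorate]
    shares = []
    for c in range(len(electorate[0])):
        share = sum(b[c] for b in ballots) / len(ballots)
        # Assumption: noise is applied to each share independently.
        shares.append(share + rng.gauss(0.0, noise_sd))
    return shares
```

Each inner list of `electorate` holds one voter’s utilities for the candidates, in a fixed candidate order.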

4.3 Initial strategy models

We consider both a zero-information “honest/naive” strategy and a “viability-aware” strategy using the Approval polling discussed above.Footnote 9

For Approval methods, the approval threshold for the naive strategy is set at 40% of the way up from the average candidate to the voter’s favorite. For STAR, the “naive” strategy is to vote per the instructions by normalizing one’s utilities to the [0–5] star scale and voting accordingly.
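Normalizing utilities to the star scale can be sketched as follows. The exact mapping is not specified here, so the min-max rescaling, the round-half-up rule, the handling of an indifferent voter, and the function name are all assumptions.

```python
from typing import List


def naive_star_ballot(utilities: List[float], max_score: int = 5) -> List[int]:
    """Map a voter's utilities onto 0..max_score stars (hypothetical helper)."""
    worst, best = min(utilities), max(utilities)
    if best == worst:
        # Assumption: a voter with no preferences scores everyone 0.
        return [0] * len(utilities)
    # The least-liked candidate gets 0, the favorite gets max_score, and the
    # rest are placed proportionally and rounded half up.
    return [int(max_score * (u - worst) / (best - worst) + 0.5) for u in utilities]


print(naive_star_ballot([0.0, 0.5, 1.0]))
```

For utilities of 0.0, 0.5, and 1.0 this gives scores of 0, 3, and 5.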


“Viability-Aware” strategies are as follows:

  • Plurality: For a given voter, determine the expected utility of the election (EV), then, given a candidate with utility \(u_i\) and estimated probability of winning \(p_i\), score candidates equal to \(p_i(u_i - EV)\). Vote for the candidate with the highest score.

  • Plurality Top Two: Same as Plurality except that voters who do not prefer the polling leader replace the estimated probability of winning with an estimate of the probability of being in a two-way tie for second.

  • Approval: Determine the expected utility of the election, and vote for every candidate with utility greater than that.

  • Approval Top Two: Same as Approval, but replace the estimated probability of winning with the estimated probability of being in a two-way tie for second place.

  • Smith/Minimax: Use the same candidate scores determined for Plurality and rank the candidates in decreasing order by score. This can result in both the burial of disliked frontrunners and favorite betrayal akin to what is done in Plurality.

  • IRV: Same as Smith/Minimax, but rank the candidates with negative scores honestly. This can result in favorite betrayal but not burial.

  • STAR: Balance strategic exaggeration (using 5’s and 0’s instead of 3’s and 2’s to maximize influence in the scoring round) with the competing incentive of not wanting to give multiple viable candidates the same score. (This yields more 5’s and 0’s than the naive strategy and is mostly unaffected by the presence of non-viable candidates in the race.)

Note that the above viability-aware strategies exhibit honest preferences for Approval, Approval Top Two and STAR, but may exhibit dishonest preference order for Plurality, Plurality Top Two, IRV, and Smith/Minimax. This is due to the fact that there's no way to change an ordinal ballot that wouldn’t also change the preference order, whereas in cardinal methods there is more flexibility.
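The candidate scores \(p_i(u_i - EV)\) used by several of the strategies above can be computed directly. In this sketch EV is taken to be \(\sum_i p_i u_i\), and the win probabilities are supplied by hand rather than derived from the simulated poll; both choices, like the function names, are assumptions for illustration.

```python
from typing import List


def expected_utility(utilities: List[float], win_probs: List[float]) -> float:
    """Assumed form of EV: probability-weighted utility of the outcome."""
    return sum(p * u for p, u in zip(win_probs, utilities))


def candidate_scores(utilities: List[float], win_probs: List[float]) -> List[float]:
    """Score each candidate as p_i * (u_i - EV)."""
    ev = expected_utility(utilities, win_probs)
    return [p * (u - ev) for p, u in zip(win_probs, utilities)]


def plurality_vote(utilities: List[float], win_probs: List[float]) -> int:
    """Index of the candidate with the highest score."""
    scores = candidate_scores(utilities, win_probs)
    return max(range(len(scores)), key=lambda i: scores[i])


def approval_vote(utilities: List[float], win_probs: List[float]) -> List[bool]:
    """Approve every candidate whose utility exceeds EV."""
    ev = expected_utility(utilities, win_probs)
    return [u > ev for u in utilities]


def minimax_ranking(utilities: List[float], win_probs: List[float]) -> List[int]:
    """Candidate indices ranked in decreasing order of score."""
    scores = candidate_scores(utilities, win_probs)
    return sorted(range(len(scores)), key=lambda i: -scores[i])


utilities = [1.0, 0.6, 0.0]     # the voter's favorite is candidate 0
win_probs = [0.05, 0.50, 0.45]  # but candidate 0 is a long shot
```

With these inputs the voter’s favorite has little chance of winning, so the Plurality vote goes to candidate 1 and the strategic ranking places candidate 1 above the favorite, while the Approval ballot still supports both.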

5 Representative outcomes: Voter Satisfaction Efficiency

An ideal winner represents as many voters as possible as well as possible. One way to measure this is with utility. Voter Satisfaction Efficiency (VSE) is a linear measure of a voting method’s utilitarian outcomes. In VSE a voting method that always elected this ideal candidate would score 100%, while one that elected a candidate completely at random would score 0%.
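One way to compute such a measure is sketched below, assuming the commonly used form \(\mathrm{VSE} = (U_{\text{winner}} - U_{\text{random}}) / (U_{\text{best}} - U_{\text{random}})\), where each \(U\) is an average over simulated elections of a candidate’s mean voter utility. The exact averaging used for the figures is not spelled out in this section, so the formula and the function names should be read as assumptions.

```python
from statistics import mean
from typing import List, Tuple

Electorate = List[List[float]]  # one row of candidate utilities per voter


def social_utilities(electorate: Electorate) -> List[float]:
    """Mean voter utility for each candidate."""
    n_candidates = len(electorate[0])
    return [mean(voter[c] for voter in electorate) for c in range(n_candidates)]


def vse(results: List[Tuple[Electorate, int]]) -> float:
    """VSE over (electorate, winner index) pairs: 1.0 if the best candidate
    always wins, 0.0 if winners are no better than a random pick."""
    chosen, best, rand = [], [], []
    for electorate, winner in results:
        social = social_utilities(electorate)
        chosen.append(social[winner])
        best.append(max(social))
        rand.append(mean(social))  # expected utility of a random winner
    return (mean(chosen) - mean(rand)) / (mean(best) - mean(rand))
```

Under this form a method can score below 0% if it tends to elect candidates who are worse than a random pick.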

Smith/Minimax and STAR Voting delivered the highest levels of Voter Satisfaction of all methods tested (see Fig. 3). Viability-aware voting yielded higher VSE under all voting methods where it was strategically incentivized (see Fig. 4). Approval-based methods also performed notably well under best-case-scenario voter behavior.Footnote 10

Fig. 3. Voter Satisfaction Efficiency of selected voting methods. The model for Fig. 3 is run using 10,000 electorates, each with 6 candidates and 5001 voters. The models for Figs. 4 and 5 are run using 10,000 electorates, each with 6 candidates and 101 voters. Insofar as voters are naturally clustered, the simulated voters stand in for a greater number of actual individuals.

6 Honest and strategic incentives

A voting method’s outcomes can only be as good as the ballot data itself, and ballot data is likely to only be as good as the voter behavior. In order to evaluate the likelihood of specific voter behaviors and their impacts on the quality of electoral outcomes, we define a statistical metric called Pivotal Voter Strategic Incentive (PVSI). This measures strategic incentives by taking a faction who wants to employ a specific strategy and changing their votes, starting with the person who stands to gain the most and continuing down the list, until the outcome changes with one pivotal voter. We then evaluate the change in outcome and resulting strategic incentives from this voter’s perspective.Footnote 11 When a strategy works as intended, its incentive will be greater if it requires relatively little coordination; it will average near zero if the strategy rarely has any effect; and it will be negative if the risk of the strategy backfiring outweighs the potential reward of success.

We begin by using Pivotal Voter Strategic Incentive to look at how much this voter benefits if a random factionFootnote 12 that includes them takes candidate viability into consideration (Viability-Aware).

Fig. 4. A “viability-aware” strategy is incentivized over honest/naive behavior for most methods tested, but the size of these incentives varies considerably. Honest voting outperformed our viability-aware strategy for Smith/Minimax and IRV.

We then consider the impacts of strategy types such as “Favorite Betrayal” (rating the “allied” frontrunner above your true favorite); “Burial” (rating the “enemy” frontrunner below apparently non-viable candidates you actually like less); “Bullet Voting\(^*\)” (voting for only your favorite); “Inclusive” and “Exclusive” strategies in Approval Voting (lowering or raising your approval threshold); “Polarized Inclusive” and “Polarized Exclusive” in STAR Voting (scoring all candidates 5 or 0); and “Honest Inflation” and “Honest Deflation” in STAR Voting (exaggerating candidates’ scores to 5, 4, 1, or 0 in order to boost an allied frontrunner or block an enemy frontrunner).

Fig. 5. Incentives for various targeted strategies compared to the most-incentivized behavior from Fig. 4 (honest/naive or viability-aware). \(^*\)Bullet voting, unlike other strategies pictured, is employed by a random faction. This is because any faction could realistically bullet vote, so targeting would have required potentially unrealistic assumptions.

For targeted strategies (all those in Fig. 5 except for Bullet Voting), we identify the faction who stands to benefit most from strategic voting.Footnote 13

In Fig. 5, we can see that in STAR Voting, the dishonest strategies (Favorite Betrayal, Burial, and Bullet Voting) are all strongly disincentivized. Weakly incentivized honest/semi-honest strategies include Polarized Inclusive, in which voters might give all the candidates on their side 5 stars, and Honest Inflation, in which a voter might give their top candidates 5 stars, their compromise candidates 4, an opposing frontrunner 1, and their last choice 0.

In Instant Runoff Voting, the most-incentivized strategy is Favorite Betrayal, commonly known as “voting for the lesser of two evils”, at 3%. The positive incentive here indicates that ranking your favorite 1st can backfire. This is even more true in Plurality, where Favorite Betrayal is incentivized at 14%. (PVSI would be 100% if a tactical vote was guaranteed to change the winner from an average candidate to the pivotal voter’s favorite.)

In Approval, the most incentivized strategy, topping out at 10%, is to set one’s approval threshold between the two most popular candidates. With Approval Top Two, this goes down to 3%. For STAR there is a weak incentive, at 2%, to exaggerate one’s scores up while maintaining an honest preference order. In STAR, Approval Top Two, and Smith/Minimax, strategies that give less than full support to a voter’s favorite are all disincentivized. Smith/Minimax disincentivized strategic voting across the board.

We can clearly see that Plurality Voting, with its strong and transparent strategic incentives, pressures voters to support “electable” candidates, further entrenching barriers to entry and existing inequities in political representation. While voters trying a new method for the first time may need to unlearn strategic habits, the Pivotal Voter Strategic Incentive voter models illustrate why we believe that over time voter behavior would become more honest under STAR and other top-tier voting methods.

7 Conclusion

STAR Voting’s expressive and user-friendly ballot, simple and transparent tabulation, positive incentives to vote one’s conscience, and its accurate outcomes make it an actionable solution to level the playing field and elect representative winners in single-winner elections.