1 Introduction

Just before and during the French Revolution, three members of the French Academy of Sciences, Borda (1784), Condorcet (1785), and Daunou (1803), initiated the mathematical study of voting mechanisms, known today as social choice theory. Each of them proposed a voting method. All three methods allow voters to express their preferences more fully by ranking all the candidates from best to worst, in contrast to the usual ballot, which restricts voters to choosing a single candidate without expressing any opinion about the others.

Borda (1784) proposed that each candidate receive one point for each candidate ranked below him on any ballot. Condorcet (1785) proposed electing the candidate, when there is one, who beats all others in paired comparisons. What he proposed when no such candidate exists is unclear. Daunou (1803), after trying Borda’s method in the French Academy of Sciences, rejected it because of its manipulability. He proposed electing the Condorcet winner if one exists; otherwise, Condorcet losers are eliminated recursively and the plurality winner among the survivors is elected.

All of these methods aim to improve on plurality by allowing voters to better express themselves. Unfortunately, they are all subject to a disqualifying paradox. In his famous impossibility theorem, Arrow (1951/63) proved that with any unanimous (Footnote 1) method that treats voters equally and uses a ballot based on rankings, removing a loser can change the winner (Footnote 2).

Kurrild-Klitgaard (1999) claims that Condorcet’s paradox occurred in a Danish election. Arrow’s paradox occurs more frequently. It almost certainly occurred in the 2000 US presidential election. Had Nader not been a candidate in Florida, most of his votes, according to polls, would have gone to Gore, who would have won Florida’s 25 electoral votes and so the US presidency. Arrow’s paradox also occurred in the French presidential election of 2002. There were 16 candidates, including J. Chirac (the outgoing rightist president), L. Jospin (the outgoing leftist prime minister), and J.-M. Le Pen (the extreme right leader). Polls predicted Jospin over Chirac in the second round. But in the first round, Chirac received 19.88% of the votes, Le Pen placed second with 16.86%, and Jospin was eliminated with 16.18%. If any one of the eleven minor leftist candidates had been absent, Jospin would probably have won. A recent econometric study of French parliamentary and local elections between 1978 and 2012 by Pons and Tricaud (2018) estimates that in 19.2% of races, the presence of a third minor candidate changed the winner. They conclude that “a large fraction of voters are ‘expressive’ and prefer to express themselves by voting for their favorite candidate at the cost of causing the defeat of their second-best choice.”

These theoretical paradoxes occur in practice and put democracy in danger (Footnote 3). Voting becomes a strategic game, increasing frustration and abstention. For example, although a large majority of French voters reject extreme candidates, a score of 20% in the first round is often sufficient to qualify for the second round. Thus, French voters are forced to vote strategically, which induced the historical left and right parties to organize open primaries to choose their presidential candidates in 2017. However, the primaries selected their most extreme candidates, opening the door to the election of an outsider never elected to office before: Emmanuel Macron. His legitimacy has been strongly contested since his election, in particular by the yellow vest movement.

My late colleague Michel Balinski and I learned from studying figure skating, oenologists, sportsmen, pianists and others that the fundamental question of whom to select can be posed differently. Instead of aggregating individual rankings, one can specify a common language to measure merit (such as Excellent, Very Good, Good, Acceptable, Poor or Inadequate), and voters can assign grades to candidates in that language. This change helps overcome Arrow’s and Condorcet’s paradoxes and allows us to address another important question: What mechanisms are the most robust against strategic manipulations? The method that emerges for electing and ranking is simple and practical. We named it Majority Judgment.

This article first explains how a majority judgment (MJ) ranking is computed, then why MJ is a good reform: It lets voters express themselves better, it avoids the Arrow, Condorcet and domination paradoxes, it best resists strategic manipulations, it is not biased in favor or disfavor of moderate candidates, and, importantly, it works well in practice and arouses great enthusiasm.

2 Calculating a majority-judgment ranking in a large electorate

Under majority judgment, each voter assigns each candidate a grade on a scale like the one above. To determine the winner, the grades for each candidate are arrayed from best to worst. A candidate’s majority grade is the highest grade that is approved by an absolute majority. When the majority grades of two candidates differ, the candidate with the higher grade is ranked ahead of the other. When two candidates have the same majority grade, four sets of voters disagree with the majority grades: For each candidate, one set favors a higher grade and another favors a lower grade. The ranking of the pair of tied candidates is determined by which of these four sets of disagreeing voters is largest. This can be described as finding the candidate and the direction for which one runs out of tied voters soonest when moving away from the median. If that largest disagreeing set favors a higher grade for its candidate, then that candidate leads the other; if it favors a lower grade, then that candidate trails the other.
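The comparison just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the six-level scale, and the grade counts for the two candidates are hypothetical, and returning None when the largest disagreeing sets are of equal size and point in opposite directions is a choice of this sketch, since the text above does not cover that case.

```python
from fractions import Fraction

# Grades ordered from worst to best (illustrative six-level scale).
GRADES = ["Inadequate", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]


def majority_gauge(counts):
    """Return (p, grade, q) for one candidate.

    counts maps a grade name to the number of voters who gave it.
    grade is the highest grade approved by an absolute majority,
    p is the share of voters above it and q the share below it.
    """
    total = sum(counts.get(g, 0) for g in GRADES)
    at_least = 0
    # Walk down from the best grade until more than half of the voters
    # have given this grade or a better one.
    for idx in range(len(GRADES) - 1, -1, -1):
        above = at_least
        at_least += counts.get(GRADES[idx], 0)
        if 2 * at_least > total:
            p = Fraction(above, total)
            q = Fraction(total - at_least, total)
            return p, GRADES[idx], q
    raise ValueError("no grades were cast")


def leads(gauge_a, gauge_b):
    """True if A ranks ahead of B, False if behind, None if undecided."""
    p_a, grade_a, q_a = gauge_a
    p_b, grade_b, q_b = gauge_b
    rank_a, rank_b = GRADES.index(grade_a), GRADES.index(grade_b)
    if rank_a != rank_b:
        return rank_a > rank_b
    # Same majority grade: the largest of the four disagreeing sets decides.
    # Each entry is (size, whether that set being largest puts A ahead).
    sets = [(p_a, True), (q_a, False), (p_b, False), (q_b, True)]
    largest = max(size for size, _ in sets)
    verdicts = {a_ahead for size, a_ahead in sets if size == largest}
    return verdicts.pop() if len(verdicts) == 1 else None


# Hypothetical grade counts for two candidates, 100 voters each.
a = majority_gauge({"Excellent": 20, "Good": 35, "Acceptable": 25, "Poor": 20})
b = majority_gauge({"Very Good": 30, "Good": 30, "Acceptable": 10, "Inadequate": 30})
print(a, b, leads(a, b))
```

In this made-up example both candidates have the majority grade Good, and the largest of the four disagreeing sets is the 45% who would grade the first candidate lower, so the first candidate trails the second.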

To compare MJ with other voting methods, OpinionWay conducted a national presidential poll on April 12-16, 2012, just before the first round of the election on April 22. The full merit profile is given in Table 1. The sizes of the sets of agreeing and disagreeing voters are given in Table 2, together with the plurality ranking, to show the differences between the two.

Table 1 Merit profile, 737 ballots, 2012 French presidential poll (Balinski & Laraki, 2014a)

In Table 2, “+” on a grade means that more voters prefer a higher grade than a lower one, while “−” means the opposite. The candidates are ranked first by grade, then plusses before minuses within each grade. The plusses within a grade are ranked by their “above” scores, highest score first, while the minuses within a grade are ranked by their “below” scores, lowest score first.
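The ordering rule for Table 2 amounts to a sort key. The sketch below is illustrative only: the candidate names and percentages are invented rather than taken from the poll, the helper name ranking_key is hypothetical, and treating an exact balance between "above" and "below" as a minus is an assumption of this sketch.

```python
# Grades ordered from worst to best (illustrative six-level scale).
GRADES = ["Inadequate", "Poor", "Acceptable", "Good", "Very Good", "Excellent"]


def ranking_key(entry):
    """Sort key for (name, above, grade, below); smaller keys rank first."""
    name, above, grade, below = entry
    grade_rank = GRADES.index(grade)
    if above > below:
        # "+" candidates: higher grade first, then highest "above" score first.
        return (-grade_rank, 0, -above)
    # "-" candidates: after the plusses of the same grade,
    # then lowest "below" score first.
    return (-grade_rank, 1, below)


# Hypothetical entries: (name, % above majority grade, majority grade, % below).
entries = [
    ("X", 30.0, "Good", 43.0),
    ("Y", 40.0, "Acceptable", 35.0),
    ("W", 45.0, "Good", 40.0),
    ("V", 35.0, "Good", 41.0),
    ("Z", 44.0, "Good", 38.0),
]

ranked = sorted(entries, key=ranking_key)
for name, above, grade, below in ranked:
    sign = "+" if above > below else "-"
    print(f"{name}: {grade}{sign} (above {above}%, below {below}%)")
```

With these invented numbers the order is W, Z, V, X, Y: the two Good+ candidates come first, then the two Good− candidates, then the Acceptable+ candidate.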

Table 2 MJ and plurality rankings, 2012 French poll (Balinski & Laraki, 2014a)

3 Why MJ is a good election reform

3.1 MJ permits voters to express themselves fully

When there are two candidates, most voting rules are equivalent to plurality: Voters can choose one of the two candidates or abstain, and the candidate with more votes wins. This is too restrictive, as the following examples show. In the second round of the French presidential election of 2017, voters were asked to choose between Emmanuel Macron, a centrist, and Marine Le Pen, an extreme rightist. Compared to the first round, there were 5.5 million fewer valid votes. Why? Many voters refused to support either candidate. Still, they may have wished to express the difference in their feelings about these two radically different candidates. This is not isolated to France. In the 2016 US presidential election, many voters disliked both Donald Trump and Hillary Clinton. Their best response was to vote for a minor candidate, abstain, or protest against the system by voting for Trump. With MJ, voters who dislike both candidates can judge one Poor and the other Inadequate. This is not possible with most other voting rules. Such systems ignore millions of opinions and count them as invalid votes.

When there are many candidates, grading them is much simpler and more natural than ranking them, as several experiments show (Balinski & Laraki [hereafter B&L], 2011a, 2020). A rough calculation suggests that ranking n candidates takes time proportional to n(n + 1)/2, versus n for grading them. Practice suggests that ranking is difficult. In Australia, a voter must rank all candidates for her ballot to be valid. Since races often have many candidates, parties provide “how to vote cards,” and relatively few voters provide thoughtful rankings. In Australian Senatorial elections there may be as many as sixty candidates; the preponderance of voters (some 90%) simply tick one box that indicates a party’s list (Australian Electoral Commission, 2011; AustralianPolitics.com, 1995-2021).

3.2 MJ avoids the domination paradox

In addition to restricting the expression of voters’ opinions, plurality and all ranking methods allow the domination paradox. Consider the following merit profile with two candidates:

 

Grade         Excellent   Acceptable   Inadequate
Candidate A   40%         36%          24%
Candidate B   36%         34%          30%

A has more grades Excellent than B, more Acceptable, and fewer Inadequate. Any good voting method must elect A. Suppose that the underlying opinion profile is the following, which is compatible with the merit profile above:

 

Share of voters   10%          30%          36%          24%
A                 Excellent    Excellent    Acceptable   Inadequate
B                 Acceptable   Inadequate   Excellent    Acceptable

The 60% of voters in the last two columns all prefer B to A, so if plurality or any ranking-based method is used, the winner is B instead of the clearly better candidate A.
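The arithmetic of this example can be checked mechanically. The following Python sketch uses the opinion profile given above; the function names are hypothetical, and reading "domination" as having at least as large a share of voters at or above every grade (strictly larger for some grade) is an assumption of this sketch.

```python
from collections import defaultdict

# Grades of the example, ordered from worst to best.
ORDER = {"Inadequate": 0, "Acceptable": 1, "Excellent": 2}

# Opinion profile from the text: (share of voters in %, grade for A, grade for B).
PROFILE = [
    (10, "Excellent", "Acceptable"),
    (30, "Excellent", "Inadequate"),
    (36, "Acceptable", "Excellent"),
    (24, "Inadequate", "Acceptable"),
]


def merit_profile(profile):
    """Aggregate the opinion profile into each candidate's grade shares."""
    merit = {"A": defaultdict(int), "B": defaultdict(int)}
    for share, grade_a, grade_b in profile:
        merit["A"][grade_a] += share
        merit["B"][grade_b] += share
    return {name: dict(shares) for name, shares in merit.items()}


def pairwise_support(profile):
    """Shares of voters who grade A above B, and B above A."""
    prefer_a = sum(s for s, ga, gb in profile if ORDER[ga] > ORDER[gb])
    prefer_b = sum(s for s, ga, gb in profile if ORDER[gb] > ORDER[ga])
    return prefer_a, prefer_b


def share_at_or_above(merit, grade):
    return sum(s for g, s in merit.items() if ORDER[g] >= ORDER[grade])


def dominates(merit_x, merit_y):
    """Assumed reading of domination: X is never behind Y at any threshold."""
    pairs = [(share_at_or_above(merit_x, g), share_at_or_above(merit_y, g))
             for g in ORDER]
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


merit = merit_profile(PROFILE)
print(merit["A"], merit["B"])
print(pairwise_support(PROFILE))
print(dominates(merit["A"], merit["B"]))
```

The aggregated shares reproduce the merit profile of the first table, the pairwise count gives 40% for A against 60% for B, and A's merit profile dominates B's under the assumed definition.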

R. A. Dahl challenged the use of plurality (or any ranking-based method) for two candidates, on grounds of the domination paradox. He wrote, “By making ‘most preferred’ equivalent to ‘preferred by most’ we deliberately bypassed a crucial problem: What if the minority prefers its candidate much more passionately than the majority prefers a contrary candidate? Does the majority principle still make sense?” concluding, “If there is any case that might be considered the modern analogue to Madison’s implicit concept of tyranny, I suppose it is this one” (Dahl, 1956, p. 99). To deal with this issue, Dahl (1956, p. 101) proposed using an ordinal “intensity scale” obtained “simply by reference to some observable response, such as a statement of one’s feelings ...” and argued that it is meaningful to do so. This is precisely the innovation behind MJ.

3.3 MJ avoids the Condorcet and Arrow paradoxes

MJ avoids Arrow’s paradox because, with a scale of absolute grades, removing a candidate from the list or adding one to it does not change the voters’ evaluations of the others on the list. The MJ-ranking is transitive, so MJ avoids the Condorcet paradox as well.

The Condorcet paradox is not often observed because few voting methods ask voters to rank candidates, and when voters do rank them, the details needed to check for the paradox are generally unavailable. Some evidence has recently emerged. Song (this volume) found no Condorcet paradox in 115 American elections by Instant Runoff Voting and just one paradox in 1,022 Politics barometer surveys. This suggests that the Condorcet paradox is rare in practice. Still, MJ has the virtue of never needing to cut through this paradox to produce a single winner. In any case, as the domination paradox shows, the Condorcet winner (e.g. the majority rule or plurality winner) is not necessarily the best candidate.

Under plurality, the Arrow paradox occurs frequently. Its occurrence under plurality can only be inferred, but its impact is enormous. Recall the election of George W. Bush instead of Al Gore in 2000 due to the candidacy of Ralph Nader; the election of Jacques Chirac in 2002 instead of Lionel Jospin due to the presence of two other socialist candidates; the election of Nicolas Sarkozy instead of François Bayrou (eliminated in the first round) in 2007; the election of Bill Clinton in 1992 due to the candidacy of H. Ross Perot; the election of Woodrow Wilson in 1912 instead of Teddy Roosevelt or William H. Taft.

3.4 MJ resists strategic manipulations

In sports competitions, to decrease the impact of strategic manipulations, scores are often computed by using a “trimmed average,” that is, by eliminating the k highest and k lowest grades, for some k < n/2, before summing the remaining scores. But this results in many ties, which are no more acceptable in sports than in elections. The MJ ranking procedure can be described as the optimal extension of the trimmed average concept, if one wants to (a) eliminate as many grades as possible from the top and the bottom while still reaching a result using the dominance principle and (b) produce a tie only if the candidates have precisely the same sets of grades.
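A trimmed average is easy to illustrate. The sketch below is a minimal Python version with invented judges' scores on a numeric scale; the function name and the data are hypothetical, and it shows only the trimming idea, not the MJ ranking procedure itself.

```python
def trimmed_average(scores, k):
    """Average of the scores left after dropping the k highest and k lowest."""
    n = len(scores)
    if not 0 <= k < n / 2:
        raise ValueError("k must satisfy 0 <= k < n/2")
    kept = sorted(scores)[k:n - k]
    return sum(kept) / len(kept)


def plain_average(scores):
    return sum(scores) / len(scores)


# Hypothetical panel of five judges scoring one competitor.
honest = [7, 8, 8, 8, 9]
# The same panel, except that one judge strategically lowers a 7 to 0.
manipulated = [0, 8, 8, 8, 9]
# A different competitor with a different set of scores.
rival = [6, 8, 8, 8, 10]

print(plain_average(honest), plain_average(manipulated))
print(trimmed_average(honest, 1), trimmed_average(manipulated, 1))
print(trimmed_average(rival, 1))
```

In this made-up case the exaggerated score moves the plain average from 8.0 to 6.6 but leaves the trimmed average at 8.0. The rival, whose scores are not the same, also obtains 8.0, which illustrates how trimming produces ties.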

3.5 MJ is not biased for or against centrist candidates

MJ and plurality give different rankings in Table 2: the centrist candidate, François Bayrou, is only 5th by plurality (with a score of 9.1%), while he is placed second by MJ, with the majority grade “Good.” The extreme rightist candidate Marine Le Pen is ranked 3rd by plurality (with 17.9%) but is placed only 8th by MJ, with the majority grade “Poor.” This happens generally: Compared with plurality, MJ increases the chances of centrist candidates and diminishes the chances of the extremes.

In discussions with commentators and ordinary people, most reject election mechanisms that systematically elect the centrist candidate (such as Bayrou in 2007 or Biden in 2020) and eliminate polarizing but majoritarian candidates (such as Sarkozy in 2007 or Trump in 2020). As the science popularizer William Poundstone (2008, p. 211) wrote, “We want a system that doesn’t automatically exclude [moderate] candidates from winning. We also want a system that doesn’t make it easy for any goof who calls himself a moderate to win.” Hence, while it is good that a voting method gives centrist candidates a better chance than plurality does (which is what MJ does), to be acceptable the method should not give all the chances to the centrists but should remain open to diversity in elected candidates.

Empirical analyses (Balinski and Laraki 2011a, chapters 6 & 19, 2011b, 2014a) compare several voting methods and show that MJ is the least biased for or against centrist candidates. The most biased against centrists is plurality, the most in favor of centrists are Borda and point-summing (also called range voting) methods. Condorcet-consistent methods are less in favor of centrists than Borda’s method but more in favor of centrists than MJ. The empirics suggest that MJ passes the Poundstone test, but Borda, Condorcet, plurality, AV, Range and Rank voting methods do not.

3.6 Answers to some critics

Critics of MJ have focused primarily on two points: Condorcet-consistency and the “no-show paradox” (Bogomolny, 2011; Brams, 2011; Edelman, 2012; Felsenthal & Machover, 2008; Fleurbaey, 2014). However, most of the widely advocated voting methods are not Condorcet consistent. This applies to the judgment version of approval voting (Balinski and Laraki, 2011a, chapter 18), point-summing, including range voting (Balinski and Laraki, 2011a, chapter 17), Borda, plurality, and instant runoff voting. MJ, however, is based solidly on the idea of majority: not on the majority’s preference between pairs of candidates, but rather on the majority’s evaluation of each candidate’s merit (Footnote 4).

What are the theoretical arguments in favor of Majority Rule/the Condorcet winner? (a) May’s (1952) axioms and (b) the Condorcet Jury theorem. In response to (a), Balinski and I (2020b) prove that MJ not only satisfies May’s axioms, but also avoids the Condorcet, Arrow and domination paradoxes. In fact, it is the only method with those properties that is Condorcet consistent in the domain where majority rule (and plurality with two candidates) does not admit the domination paradox. In response to (b), a recent article by philosopher of science Michael Morreau (2021) shows that a majority-grade method is more likely to identify the correct decision than plurality. Moreover, plurality, along with all Condorcet consistent methods, admits the domination paradox. (It can elect a candidate whose merit profile is dominated by another, as shown by the example above.) All this implies that, contrary to what most researchers in the field believe, electing the Condorcet winner whenever one exists is not necessarily an attractive property of a voting rule.

The no-show paradox occurs when it is better for a voter not to vote than to express his opinion sincerely, because his honest vote can tip the scales against his favorite candidate. However, every Condorcet-consistent method admits the no-show paradox (Moulin, 1988, pp. 238-239), so it is unfair to criticize MJ for not satisfying two incompatible criteria. Moreover, we proved that the only methods that satisfy May’s axioms, avoid the Condorcet and Arrow paradoxes, and exclude the no-show paradox are the point-summing methods (including range voting, Balinski and Laraki 2011a, chap. 17), but they should be discarded because they are the most manipulable of all voting methods and are extremely biased in favor of centrist candidates. Moreover, for averaging to make sense, the scale must be an interval scale. This is too demanding in a voting rule (Balinski and Laraki, 2011a, chap. 17). MJ requires only an ordinal scale.

In any case, the no-show paradox is less likely to occur with MJ than a tie is to occur with plurality, and it is of little importance in practice (as argued in Balinski and Laraki 2014b, 2020a). It is insignificant compared with the serious problems of election methods: the need for voters to express themselves, the Arrow, Condorcet and domination paradoxes, and resistance to strategic manipulations.

Some have said that voters do not have a common understanding of the meanings of grades. But what about other methods? Do voters mean the same thing when casting plurality or approval votes? Clearly not; otherwise, why did 5.5 million French voters in 2017 (and in 2022) abstain or vote blank even though they preferred Macron to Le Pen? Do voters mean the same thing when they rank candidates? Again, clearly not, for two voters who put a candidate in first (or any other) place may have vastly different opinions about that candidate, ranging from excellent down to mediocre. MJ’s grades can only improve the commonality of meaning: two Excellents have much more in common than two first-ranks; two Inadequates have much more in common than two last-ranks.

3.7 MJ works well in practice

Despite its novelty, MJ is already used in practice and is well accepted as a reform. In January 2022, about 400,000 French citizens participated in a leftist primary election, “La Primaire Populaire” (Footnote 5), using MJ. The method worked extremely well: There was a clear final ranking of the candidates, all the nuances of the scale were used, and the winner was well above the others with the majority grade Very Good, followed by two candidates with the majority grade Good, three with the majority grade Acceptable, and one with Inadequate. The political movement LaREM (Emmanuel Macron’s party) adopted MJ in 2020; 3000 local representatives were elected, and an internal survey showed great satisfaction with the candidates. A computer program (Footnote 6) makes it easy to create an MJ ballot, and about 100,000 users have already used it.

The British Academy, the UK’s academy for humanities and social sciences, uses MJ for electing new fellows. MJ has also been tested in French presidential election surveys: in an experiment carried out in parallel with the first round of the 2007 election in Orsay (Balinski and Laraki 2011a, 2011b); in several national polls preceding the 2012 and 2022 presidential elections, conducted by OpinionWay (Balinski and Laraki 2014a, 2014b); and in primary experiments (Gonzalez-Suitt et al., 2014; Balinski and Laraki 2014b). In these instances, voters displayed no difficulty in assigning grades to candidates. In contrast, voters faced with many candidates under plurality have a considerably more difficult task, since their “favorite” may have no chance of being elected. Le “vote utile” (tactical voting) was widely debated in France in 2022, and millions of voters refused to participate because of it.

Recent national political surveys (Pew Research Center, 2016a, pp. 11-12) provide a striking example of MJ in use and show that polling professionals believe that asking registered voters to give grades to candidates is a natural way of eliciting opinions. Pew conducted four such surveys in mid-January, late March, mid-August, and late October 2016 (just before the election on November 8). One of the questions asked in each survey was:

“Regardless of who you currently support, I’d like to know what kind of president you think each of the following would be if elected in November 2016. ... [D]o you think (he/she) would be a great, good, average, poor or terrible president?”

These words provide a reasonable scale of evaluations.

Pew Research did not know that this information could be used to deduce a rank-order of the candidates. In late March there were three remaining Republican candidates—Cruz, Kasich, and Trump—and two remaining Democratic candidates—Clinton and Sanders. The results are given in Table 3.

Table 3 (Pew Research Center, 2016a), poll results, March 17–27, 2016

Plurality, used in the primaries of both parties, led to the nominations of Clinton, evaluated Poor or worse by 47%, and Trump, evaluated Poor or worse by a majority of 62%, the least respected major candidates in more than half a century according to many independent assessments. The distributions of Clinton’s and Trump’s grades in answer to four 2016 Pew polls between January and October are given in Table 4. In contrast with many plurality polling results, the evaluations of Clinton and Trump in these polls remain remarkably stable throughout the entire year, despite the ups and downs of the campaign (the many bombastic revelations and accusations).

Table 4 (Pew Research Center, 2016b) poll results

4 In conclusion

“Market forces” are increasingly pushing practical people to reform voting methods by replacing rankings as inputs with grades and ignoring equal numbers of highest and lowest grades to combat manipulation. MJ simply goes all the way, eliminating as many equal numbers of highest and lowest grades as possible while still distinguishing any two competitors using the dominance criterion. The negative impact on election outcomes from the Arrow paradox and strategic manipulation, the refusal of millions of electors to vote for candidates they dislike, and the huge desire of voters to better express themselves have all contributed to the growing acceptance of majority judgment.