1 Introduction

Elections measure. Voters express themselves, a rule amalgamates their expressions, and the candidates’ scores (measures of support) determine their order of finish and the winner.

Traditional methods ask voters to express themselves by comparing the candidates:

  • ticking at most one candidate (majority rule, first-past-the-post) or several candidates (approval voting), the candidates’ total ticks ranking them;

  • rank-ordering the candidates (Borda rule, alternative vote or preferential voting, Llull’s rule, Dasgupta–Maskin method, etc.), differently derived numerical scores ranking them.

However, no traditional method measures the support of one candidate.Footnote 1 Instead, the theory of voting (or of social choice) elevates to a basic distinguishing axiom the faith that in an election between two candidates majority rule is the only proper rule. Every traditional method of voting tries to generalize majority rule to more candidates and reduces to it when there are only two candidates. Yet majority rule decides unambiguously only when there are two candidates: it says nothing when there is only one and its generalizations to three or more are incoherent.

Using majority rule to choose one of two candidates is widely accepted as infallible: who in this world has not, since infancy, raised a hand to reach a collective decision on two alternatives? de Tocqueville (1831, p. 379) believed “It is the very essence of democratic governments that the dominance of the majority be absolute; for other than the majority, in democracies, there is nothing that resists”.Footnote 2 Judging from Sadurski’s (2008, p. 39) assertion that “The legitimating force of the majority rule is so pervasive that we often do not notice it and rarely do we question it: we usually take it for granted”, that conviction seems ever firmer today. Students of social choice well-nigh universally accept majority rule for choosing between two alternatives, and much of the literature takes Condorcet consistency (a candidate who defeats each of the others separately in majority votes, the Condorcet-winner, must be the winner) to be either axiomatic or at least a most desirable property.

Why this universal acceptance of majority rule for two candidates? First, the habit of centuries; second, May’s (1952) axiomatic characterization; third, the fact that it is strategy-proof or incentive compatible; fourth, the Condorcet jury theorem (Condorcet 1785).

Regrettably, as will be shown, majority rule can easily go wrong when voting on only two candidates. Moreover, asking voters to compare candidates when there are three or more inevitably invites the Condorcet and Arrow paradoxes. The first shows that it is possible for a method to yield a non-transitive order of the candidates, and hence no winner (Condorcet 1785). It occurs, e.g., in elections (Kurrild-Klitgaard 1999), in figure skating (Balinski and Laraki 2011a, pp. 139–146, 2014a), and in wine-tasting (Balinski and Laraki 2013). The second shows that the presence or absence of an (often minor) candidate can change the final outcome among the others. It occurs frequently, sometimes with dramatic global consequences: e.g., the election of George W. Bush in 2000 because of the candidacy of Ralph Nader in Florida, and the election of Nicolas Sarkozy in 2007 although all the evidence shows that François Bayrou, eliminated in the first round, was the Condorcet-winner.
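The Condorcet paradox already appears on the smallest possible example. The sketch below uses a hypothetical three-voter profile (the classic cyclic example, not data from any of the studies cited) and computes every pairwise majority; the helper names are ours. The resulting majority relation is a cycle, so comparing candidates by majority yields no winner.

```python
from itertools import combinations

# Hypothetical profile: each ballot lists candidates from most to least preferred.
BALLOTS = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x, y, ballots):
    """True when a strict majority of ballots rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return 2 * wins > len(ballots)


def pairwise_majorities(ballots):
    """Return the (winner, loser) pairs of the pairwise majority relation."""
    result = []
    for x, y in combinations(sorted(ballots[0]), 2):
        if majority_prefers(x, y, ballots):
            result.append((x, y))
        elif majority_prefers(y, x, ballots):
            result.append((y, x))
    return result


print(pairwise_majorities(BALLOTS))
# [('A', 'B'), ('C', 'A'), ('B', 'C')]: A beats B, B beats C, C beats A
```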

A recent econometric study by Pons and Tricaud (2017) of French parliamentary (1978–2012) and local (2011, 2015) elections shows that in 19.2% of the 577 elections the presence of a third minor candidate in run-offs changes the winner, and concludes “that a large fraction of voters are ...‘expressive’ and vote for their favorite candidate at the cost of causing the defeat of their second-best choice” (Pons and Tricaud 2017, p. 42).

How are these paradoxes to be avoided? Some implicitly accept one or the other (e.g., Borda’s method, Condorcet’s method). Others believe voters’ preferences are governed by some inherent restrictive property. For example, Dasgupta and Maskin (2004, 2008) appeal to the idea that voters’ preferences exhibit regularities—they are “single-peaked”, meaning candidates may be ordered on (say) a left/right political spectrum so that any voter’s preference peaks on some candidate and declines in both directions from her favorite; or they satisfy “limited agreement”, meaning that for every three candidates there is one that no voter ranks in the middle. Another possibility is the “single crossing” restriction (e.g., Barberà and Moreno 2011; Moulin 1988b; Puppe and Slinko 2015): it posits that both candidates and voters may be aligned on a (say) left/right spectrum and the more a voter is to the right the more she will prefer a candidate to the right.

Such restrictions could in theory reflect sincere patterns of preference in some situations; nevertheless in all such cases voters could well cast strategic ballots of a very different stripe. Moreover, all the sets of ballots that we have studied show that actual ballots are in no sense restricted. There are many examples. (1) An approval voting experiment was conducted in parallel with the first-round of the 2002 French presidential election that had 16 candidates. 2587 voters participated and cast 813 different ballots: had the preferences been single-peaked there could have been at most 137 sincere ballots (Balinski and Laraki 2011a, p. 117). (2) A Social Choice and Welfare Society presidential election included an experiment that asked voters to give their preferences among the three candidates: they failed to satisfy any of the above restrictions (Brams and Fishburn 2001; Saari 2001). (3) A recent study of voting in Switzerland uses principal component analysis to show that at least three dimensions are necessary to map voters’ preferences, so the one-dimensional single-peaked condition cannot be met (Etter et al. 2014). Real ballots—true opinions or strategic choices—eschew ideological divides.
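The bound of 137 in example (1) can be recovered under assumptions that are ours, not stated in the text: the 16 candidates lie on one left/right axis, preferences are single-peaked on it, and a sincere voter approves exactly the candidates she likes at least as much as some threshold. Each sincere approved set is then a contiguous block of the axis, so there are 16·17/2 = 136 nonempty blocks plus the empty ballot. The sketch below simply counts such ballots by brute force.

```python
from itertools import product


def is_interval(approved):
    """True if the approved positions form one contiguous block (or none)."""
    positions = [i for i, a in enumerate(approved) if a]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def count_single_peaked_ballots(m):
    """Count approval ballots over m candidates ordered on one axis whose
    approved set is contiguous; the empty ballot is included."""
    return sum(1 for ballot in product((False, True), repeat=m) if is_interval(ballot))


print(count_single_peaked_ballots(16))  # 137
```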

All of this, we conclude, shows that the domain of voters’ preferences is in real life unrestricted. An entirely different approach is needed. This has motivated the development of majority judgment (Balinski and Laraki 2007, 2011a, 2014a), based on a different paradigm: instead of comparing candidates, voters are explicitly charged with the solemn task of expressing their opinions precisely by evaluating the merit of every candidate on an ordinal scale of measurement, or language of grades.

Thus, for example, the ballot in a presidential election could ask:

Having taken into account all relevant considerations, I judge, in conscience, that as President of France, each of the following candidates would be:

The language of grades constituting the possible answers may contain any number of grades,Footnote 3 though in elections with many voters five to seven grades have proven to be good choices.

A scale that has been used is:

Outstanding, Excellent, Very Good, Good, Fair, Poor, To Reject.

The method then specifies that majorities determine the electorate’s evaluation of each candidate and the ranking between every pair of candidates—necessarily transitive—with the first-placed among them the winner.

Majority judgment has been criticized by proponents of the traditional paradigm in articles, reviews, and talks (Bogomolny 2011; Brams 2011; Edelman 2012; Felsenthal and Machover 2008; Fleurbaey 2014). Most objections have been answered in chapter 16 of Balinski and Laraki (2011a); see also Balinski (2019). New arguments answering the no-show paradox charge (Moulin 1988a) are detailed in a companion paper (Balinski and Laraki 2019). The present article gives a completely new argument for majority judgment that responds to the objection that it does not agree with majority rule on two candidates, i.e., that it is not Condorcet-consistent.

This paper first gives a new description of majority judgment that shows it emerges naturally from a majority principle when candidates are assigned grades. It then shows that majority rule on two candidates is not incontestable: it is open to serious error because it admits the “domination paradox”, that is, it may elect a candidate whose grades are dominated by another candidate’s grades.

The infallibility of majority rule on two candidates has been challenged before. Dahl charged: “By making ‘most preferred’ equivalent to ‘preferred by most’ we deliberately bypassed a crucial problem: What if the minority prefers its alternative much more passionately than the majority prefers a contrary alternative? Does the majority principle still make sense? This is the problem of intensity, ...[W]ould it be possible to construct rules so that an apathetic majority only slightly preferring its alternative could not override a minority strongly preferring its alternative?” (Dahl 1956, pp. 90–92).

Finally, starting from the basic principles that justify majority rule (May’s axioms), majority judgment is characterized as the unique method based on evaluations that satisfies those same principles, avoids the Condorcet and Arrow paradoxes, and coincides with majority rule on two candidates when the electorate is “polarized”, that is, when the higher (the lower) a voter evaluates one candidate the lower (the higher) she evaluates the other, so there can be no consensus. On polarized candidates majority rule does not violate domination and is incentive compatible; hence majority judgment is also incentive compatible on that domain, which is precisely when voters are most tempted to manipulate.

2 Evaluating vs comparing

Except for political elections, the practice in virtually every instance that ranks entities is to evaluate each of them (see Balinski and Laraki 2011a, chapters 7 and 8). The Guide Michelin uses stars to rate restaurants and hotels. Competitive diving, figure skating, and gymnastics use carefully defined number scales. Wine competitions use words: Excellent, Very Good, Good, Passable, Inadequate, Mediocre, Bad. Students are graded by letters, numbers, or phrases. Pain is assessed on a scale numbered from 0 (“Pain free”) to 10 (“Unconscious. Pain makes you pass out.”) whose every level is described by sentences; a 7, for instance, is defined by “Makes it difficult to concentrate, interferes with sleep. You can still function with effort. Strong painkillers are only partially effective”.

In the political sphere, polls seeking more probing information about voter opinion also ask for evaluations. Thus a Harris poll asked: “...[H]ow would you rate the overall job that President Barack Obama is doing on the economy?” Among the answers, spanning 2009–2014, were those given in Table 1.

Table 1 Measures evaluating the performance of Obama on the economy (Harris 2014)

Recent Pew Research Center national political surveys (Pew Research Center 2016a, pp. 11–12) show that polling professionals believe asking registered voters to evaluate candidates is a natural way of eliciting opinions. Pew conducted four such surveys in mid-January, late March, mid-August, and late October 2016 (just before the election on November 8). One of the questions asked in each survey was:

Regardless of who you currently support, I’d like to know what kind of president you think each of the following would be if elected in November 2016. ...[D]o you think (he/she) would be a great, good, average, poor or terrible president?

The results are given in Table 2. Majority rule, used in the primaries of both parties, led to the nominations of Clinton, evaluated Poor or worse by 47%, and Trump, evaluated Poor or worse by 62%: according to many assessments, the least respected major candidates in more than half a century.

Table 2 Pew Research Center poll results, March 17–27, 2016 (Pew Research Center 2016a)

The distributions of Clinton’s and Trump’s grades in answer to the four 2016 Pew polls are given in Table 3. In contrast with many majority rule polling results, the evaluations of Clinton and Trump are stable throughout the entire year despite the ups and downs of the campaign, with its many revelations and accusations. In this election majority rule failed because many voters were forced by the system to choose between two candidates they disliked: some did not participate, some decided for whom to vote depending on the latest news, and others wished to reject politics, and majority rule left them only one way to do so, namely to vote for Trump.

Table 3 Pew Research Center poll results, 2016 (Pew Research Center 2016b)

A similar situation was observed in the second round of the French presidential election of 2017, when majority rule asked voters to choose between Emmanuel Macron (En Marche, a newly formed centrist party) and Marine Le Pen (Front National, an extreme right party). Compared to participation in the first round (see Table 4), we observe 1.5 million fewer voters, 5 times as many blank ballots, 4 times as many invalid ballots, and so almost 5 million fewer valid votes. Why? Many voters refused to be counted as supporting either candidate, yet they may have wished to express a difference between Macron and Le Pen (two radically opposed candidates).Footnote 4

Table 4 2nd vs 1st round results in the 2017 French presidential election (Pew Research Center 2016b)

One of the reasons things go so wrong in these real examples is that majority rule on two candidates measures badly. Voters are charged with nothing other than ticking the name of at most one candidate or abstaining. They are not given the means to express their opinions.

All this implies, we believe, that using grades to evaluate one, two, or more alternatives is not only natural but necessary. To be able to measure the support a candidate enjoys, a voter must be given the means to express her opinions or “feelings”. To assure that voters are treated equally, voters must be confined to a set of expressions that is shared by all. To allow for meaningful gradations (different shades ranging from very positive, through mediocre, to very negative) the grades must faithfully represent the possible likes and dislikes. Such finite, ordered sets of evaluations are common and accepted in everyday life. Call such a set a scale or common language of grades \(\Lambda \), linearly ordered by \(\succ \).

An electorate’s opinion profile on a candidate is the set of grades \({\alpha } =(\alpha _1,\ldots ,\alpha _n)\) the candidate receives, where \(\alpha _j\in \Lambda \) is voter j’s evaluation of the candidate.

Since voters must have equal voices, only the grades can count: which voter gave what grade should have no impact on the electorate’s global measure of a candidate. The number of times each grade occurs, or their percentages (as in Tables 1, 2 and 3), is called the candidate’s merit profile (to distinguish it from an opinion profile, which specifies the grade given the candidate by each of the judges). A candidate’s merit profile will always be written from the highest grades on the left down to the lowest on the right.

2.1 A new description of majority judgment

In this section, we show why majority judgment naturally emerges from the majority principle applied to grades instead of preference orders (leading to majority rule).

What is the electorate’s majority opinion of a candidate with grades \(\alpha =(\alpha _1,\ldots ,\alpha _n)\)? An example best conveys the basic idea. In the spring of 2015 majority judgment was used by a jury of six (\(J_1\) to \(J_6\)) at LAMSADE, Université Paris–Dauphine, to rank six students (A to F) seeking fellowships to prepare Ph.D. dissertations. The jury agreed that its solemn task was to evaluate the students and chose the language of grades

Excellent, Very Good, Good, Passable, Insufficient.

The opinion profile of candidate C was

      \(J_1\)     \(J_2\)     \(J_3\)   \(J_4\)    \(J_5\)    \(J_6\)
C:    Passable    Excellent   Good      V. Good    V. Good    Excellent

As before, only the grades count, not which judge gave what grade. C’s merit profile, written from the highest grade on the left down to the lowest on the right, was

C:    Excellent   Excellent   V. Good   \(\updownarrow \)   V. Good   Good   Passable

The middle of C’s grades in the merit profile is indicated by a two-sided arrow.

There is a majority of \(\frac{6}{6}\), unanimity, for C’s grade to be at most Excellent and at least Passable, or for [Excellent, Passable]; a majority of \(\frac{5}{6}\) for C’s grade to be at most Excellent and at least Good, or for [Excellent, Good]; and a majority of \(\frac{4}{6}\) for C’s grade to be at most Very Good and at least Very Good, or for [Very Good, Very Good]. The two grades of each majority are equally distant from the middle; the closer they are to the middle, the closer their values, and so the more accurate the majority decision. When the two are equal the common grade is the “majority-grade” and it suffices to specify one grade. If n is odd there is certain to be an absolute majority for a single grade; if n is even and the two middlemost grades differ (very rare in a large electorate) there is a majority consensus for two grades.

In general, letting \({\alpha } =(\alpha _1,\alpha _2,\ldots ,\alpha _n)\) be a candidate A’s set of n grades written from highest to lowest, \(\alpha _i\succeq \alpha _{i+1}\) for all i, there is a majority of (at least) \(\frac{n-k+1}{n}\) for A’s grade to be at most \(\alpha _k\) and at least \(\alpha _{n-k+1}\), for all \(1\le k\le (n+1)/2\). Call this the \((\frac{n-k+1}{n})\)-majority for \([\alpha _k,\alpha _{n-k+1}]\). When \(k>h\) the two grades of the \((\frac{n-k+1}{n})\)-majority are closer together than (or the same as) those of the \((\frac{n-h+1}{n})\)-majority: they are more accurate.
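These majorities can be computed mechanically. The Python sketch below (function names are ours, not the authors') sorts a candidate’s grades from highest to lowest, returns the merit profile as counts, and lists every \((\frac{n-k+1}{n})\)-majority for \([\alpha_k,\alpha_{n-k+1}]\), from the least to the most accurate, using C’s grades from the LAMSADE example.

```python
from collections import Counter

# The LAMSADE language of grades, from best to worst.
SCALE = ["Excellent", "Very Good", "Good", "Passable", "Insufficient"]
RANK = {grade: len(SCALE) - i for i, grade in enumerate(SCALE)}  # higher = better


def merit_profile(grades):
    """Number of times each grade occurs, listed from the best grade down."""
    counts = Counter(grades)
    return [(grade, counts[grade]) for grade in SCALE if counts[grade]]


def majorities(grades):
    """List the ((n-k+1)/n)-majorities for [alpha_k, alpha_{n-k+1}],
    k = 1, ..., floor((n+1)/2), from least to most accurate."""
    alpha = sorted(grades, key=RANK.get, reverse=True)  # highest grade first
    n = len(alpha)
    return [
        (f"{n - k + 1}/{n}", (alpha[k - 1], alpha[n - k]))
        for k in range(1, (n + 1) // 2 + 1)
    ]


# Candidate C's grades from judges J1, ..., J6.
C = ["Passable", "Excellent", "Good", "Very Good", "Very Good", "Excellent"]
print(merit_profile(C))
# [('Excellent', 2), ('Very Good', 2), ('Good', 1), ('Passable', 1)]
print(majorities(C))
# [('6/6', ('Excellent', 'Passable')), ('5/6', ('Excellent', 'Good')),
#  ('4/6', ('Very Good', 'Very Good'))]
```

The last entry is the most accurate majority; here both grades coincide, so it is C’s majority-grade.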

A measure of A’s global merit is the most accurate possible majority decision on A’s grades. C’s majority-grade is Very Good; Obama’s majority-grade in each of the evaluations of his performance on the economy (Table 1) is Only fair.

How does a majority of an electorate rank candidates having sets of grades? For example, how does it rank two distributions of grades of Obama’s performance at different dates (Table 1)? Had the March 2011 or 2013 distributions been identical to that of March 2009, the electorate would have judged the performances to be the same. In fact, Obama’s March 2009 evaluations “dominate” those of the same month in 2011 and 2013, so the electorate clearly ranks March 2009 highest; but it is not clear how to compare the evaluations of 2011 and 2013.

In general, a candidate A’s merit profile \({\alpha } =(\alpha _1,\alpha _2,\ldots ,\alpha _n)\) dominates B’s merit profile \({\beta } =(\beta _1,\beta _2,\ldots ,\beta _n)\) (both written from highest to lowest) when \(\alpha _i\succeq \beta _i\) for all i and \(\alpha _k\succ \beta _k\) for at least one k (equivalently, when A has at least as many of the highest grade as B, at least as many of the two highest grades, ..., at least as many of the k highest grades for all k, and at least one “at least” is “more”).Footnote 5 Any reasonable method of ranking should respect domination: namely, evaluate one candidate above another when that candidate’s grades dominate the other’s. Surprisingly, some methods do not (as will be seen).
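Both forms of the definition translate directly into code. In the sketch below (our own helpers, with grades coded as integers, higher meaning better, and made-up profiles) `dominates` compares sorted grades position by position, while `dominates_by_counts` compares the cumulative counts of the k highest grades; for candidates graded by the same n voters the two tests agree.

```python
from itertools import accumulate


def dominates(alpha, beta):
    """True when alpha's grades, sorted from highest to lowest, are at least
    beta's position by position and strictly higher somewhere.
    Grades are integers, higher = better; both lists have the same length n."""
    a = sorted(alpha, reverse=True)
    b = sorted(beta, reverse=True)
    if len(a) != len(b):
        raise ValueError("both candidates must be graded by the same n voters")
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominates_by_counts(alpha, beta, num_grades):
    """Equivalent test via counts, for grades 0 (worst) .. num_grades - 1 (best):
    for every k, alpha has at least as many of the k highest grades as beta,
    with strictly more for some k."""
    def cumulative(grades):
        counts = [list(grades).count(g) for g in range(num_grades - 1, -1, -1)]
        return list(accumulate(counts))
    ca, cb = cumulative(alpha), cumulative(beta)
    return all(x >= y for x, y in zip(ca, cb)) and any(x > y for x, y in zip(ca, cb))


# Hypothetical grades on a 0..4 scale (4 = best).
A = [4, 4, 3, 3, 2, 1]
B = [4, 3, 3, 3, 2, 1]
print(dominates(A, B), dominates_by_counts(A, B, 5))  # True True
print(dominates(B, A), dominates_by_counts(B, A, 5))  # False False
```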

With m candidates the basic input is an electorate’s opinion profile: it gives the grades assigned to every candidate by each voter and may be represented as a matrix \({\alpha }=(\alpha _{ij})\) of m rows (one for each candidate) and n columns (one for each voter), \(\alpha _{ij}\) the grade assigned to candidate i by voter j. Table 5 gives the LAMSADE Jury’s opinion profile. The preference profile of the traditional theory—voters’ rank-orderings of the candidates—may be deduced from the opinion profile whenever the language of grades is sufficient for a voter to distinguish between any two candidates when he evaluates their merit differently. Thus, for example, \(J_1\)’s preferences are \(A\approx B\succ D\approx F\succ E\succ C\). Note that no judge used all five grades even though there were six candidates.

Table 5 Opinion profile, LAMSADE Jury

To see how majority judgment (MJ) ranks the candidates of the LAMSADE Jury consider the corresponding merit profile given in two equivalent forms: extensively (Table 6) and by counts of grades (Table 7).

Table 6 Merit profile (extensive), LAMSADE Jury

The \(\frac{4}{6}\)-majorities are indicated in bold in Tables 6 and 7. A’s \(\frac{4}{6}\)-majority dominates all the others, so A is the MJ-winner; B’s and C’s dominate the remaining candidates, so are next in the MJ-ranking; and D and E—tied in the MJ-ranking since their sets of grades are identical—follow in the MJ-ranking since their \(\frac{4}{6}\)-majorities dominate F’s. How are B and C to be compared (Table 8)?

Table 7 Merit profile (counts), LAMSADE Jury

Their \(\frac{4}{6}\)-majorities are identical but their \(\frac{5}{6}\)-majorities (indicated by square brackets in Table 8) differ: B’s is for [Very Good, Very Good] and C’s for [Excellent, Good]. Since neither pair of grades dominates the other and there is more consensusFootnote 6 for B’s grades than for C’s, MJ ranks B above C. Thus the MJ ranking is \(A\succ _{MJ}B\succ _{MJ}C\succ _{MJ}D\approx _{MJ}E\succ _{MJ}F\).

Table 8 Merit profile, B and C, LAMSADE Jury

In general, if A’s grades are \(\alpha =(\alpha _1,\alpha _2,\ldots ,\alpha _n)\) and B’s \(\beta =(\beta _1,\beta _2,\ldots ,\beta _n)\), both written from highest to lowest, suppose the most accurate majority where the candidates differ is the \((\frac{n-k+1}{n})\)-majority for \([\overline{\alpha },\underline{\alpha }]\ne [\overline{\beta },\underline{\beta }]\). Thus for A, \(\overline{\alpha }=\alpha _{k}\) and \(\underline{\alpha }=\alpha _{n-k+1}\), similarly for B, and \(\alpha _j=\beta _j\) for \(k<j<n-k+1\). Comparing them, A’s middlemost block of grades is \([\alpha _k,\alpha _{k+1},\ldots ,\alpha _{n-k},\alpha _{n-k+1}]\) and B’s is \([\beta _k,\beta _{k+1},\ldots ,\beta _{n-k},\beta _{n-k+1}]\).

The majority-ranking \(\succeq _{MJ}\) ranks A above B when (a) A’s middlemost block dominates B’s or (b) A’s middlemost block is more consensual than B’s:

$$\begin{aligned} A\succ _{MJ}B \hbox { when }\left\{ \begin{array}{l}\hbox {(a) }\overline{\alpha }\succeq \overline{\beta }\hbox { and }\underline{\alpha }\succeq \underline{\beta },\hbox { with at least one } \succeq \hbox { strict, or}\\ \hbox {(b) }\overline{\beta }\succ \overline{\alpha }\succeq \underline{\alpha }\succ \underline{\beta };\end{array}\right. \end{aligned}$$
(1)

otherwise, their sets of grades are identical and \(A\approx _{MJ}B\). The most accurate majority is either for a single grade called the majority-grade, or, when n is even, it may be for two grades (very rare in a large electorate).
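Rule (1) can be sketched as follows; this is our own transcription, not the authors’ code. Starting from the most accurate majority and moving outward, the first pair of grades where the candidates differ decides: by domination, case (a), or, when neither pair dominates, in favor of the narrower and so more consensual pair, case (b). Table 8 is not reproduced here, so B’s profile below is hypothetical: its middle four grades are Very Good, as the text implies, and its two outer grades are made up.

```python
def mj_compare(alpha, beta):
    """Rank candidates A and B, graded by the same n voters, following (1).
    Grades are numbers, higher = better. Returns 1 if A ranks above B,
    -1 if B ranks above A, and 0 if their sets of grades are identical."""
    a = sorted(alpha, reverse=True)
    b = sorted(beta, reverse=True)
    n = len(a)
    if n != len(b):
        raise ValueError("both candidates must be graded by the same n voters")
    # Start at the most accurate majority, k = floor((n+1)/2), and move outward.
    for k in range((n + 1) // 2, 0, -1):
        a_hi, a_lo = a[k - 1], a[n - k]
        b_hi, b_lo = b[k - 1], b[n - k]
        if (a_hi, a_lo) == (b_hi, b_lo):
            continue
        if a_hi >= b_hi and a_lo >= b_lo:  # (a): A's block dominates B's
            return 1
        if b_hi >= a_hi and b_lo >= a_lo:  # (a) with the roles exchanged
            return -1
        # Neither block dominates: the more consensual (narrower) one wins, case (b).
        return 1 if b_hi > a_hi else -1
    return 0


# LAMSADE grades coded Excellent = 4, Very Good = 3, Good = 2, Passable = 1, Insufficient = 0.
C = [1, 4, 2, 3, 3, 4]
B_HYPOTHETICAL = [4, 3, 3, 3, 3, 1]  # middle four grades Very Good; outer two made up
print(mj_compare(B_HYPOTHETICAL, C))  # 1: B above C, decided by the 5/6-majorities
```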

2.2 Majority judgment with many voters

When there are many voters simpler arithmetic is almost always sufficient to determine the MJ ranking. This is due to two facts: the most accurate majority decision concerning a candidate is generically a single grade—the majority-grade—and (1b) almost never occurs, so it suffices to detect when a difference in grades first occurs according to (1a). An example shows how it works.

Terra Nova, a Parisian think-tank, sponsored a national presidential poll carried out by OpinionWay April 12–16, 2012 (just before the first round of the election on April 22) to compare MJ with other methods. 993 participants voted with MJ and also according to usual practice: first-past-the-post (FPP) among all ten candidates, followed by a majority rule (MR) run-off between every pair of the five leaders expected in the first round. Since the results of FPP differed slightly from the actual national percentages on election day (by up to 5%), a subset of 737 ballots was found whose FPP tallies closely match those percentages; it is the one presented here.

To begin, consider one candidate’s merit profile (“H” for Hollande). 50% of the grades are to the left of the middle, 50% are to the right:

        Outstanding   Excellent   Very Good   Good   \(\updownarrow \)   Good   Fair    Poor    To Reject
H (%)   12.48         16.15       16.42       4.95   \(\updownarrow \)   6.72   14.79   14.25   14.24

Hollande’s majority-grade is Good because there is a \((50+\epsilon )\)%-majorityFootnote 7 for [Good, Good] for every \(\epsilon \), \(0<\epsilon \le 4.95\). Similarly, there is a \((54.95+\epsilon )\)%-majority for [Very Good, Good] for every \(\epsilon \), \(0<\epsilon \le 1.77\), and a \((56.72+\epsilon )\)%-majority for [Very Good, Fair] for every \(\epsilon \), \(0<\epsilon \le 14.65\) (the upper grade remains Very Good as long as \(56.72+\epsilon \le 100-12.48-16.15=71.37\)). The \(x\%\)-majority decision, in general a pair of grades, may be found for any \(x\%>50\%\).
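With percentages instead of counts the same construction applies. The sketch below is our own helper, assuming grades are listed best first and percentages sum to about 100. It returns the pair [upper, lower] of the x%-majority: the lowest grade that at least x% of the grades are at most, and the highest grade that at least x% of the grades are at least. Run on Hollande’s profile, it reproduces the pairs discussed above.

```python
from itertools import accumulate

GRADES = ["Outstanding", "Excellent", "Very Good", "Good",
          "Fair", "Poor", "To Reject"]
# Hollande's percentages from the text, best grade first; both halves of Good summed.
HOLLANDE = [12.48, 16.15, 16.42, 4.95 + 6.72, 14.79, 14.25, 14.24]


def x_majority(percentages, x):
    """Return (upper, lower) for the x%-majority, 50 < x < 100: at least x% of
    the grades are at most `upper` and at least x% are at least `lower`."""
    cum = list(accumulate(percentages))  # % of grades at or above each grade
    # upper: lowest grade with at least x% at or below it, i.e. the first
    # grade from the top whose cumulative share exceeds 100 - x
    upper = next(g for g, c in zip(GRADES, cum) if c > 100 - x)
    # lower: first grade from the top with at least x% at or above it
    lower = next(g for g, c in zip(GRADES, cum) if c >= x)
    return upper, lower


for x in (52, 55.5, 60):
    print(x, x_majority(HOLLANDE, x))
# 52 ('Good', 'Good')
# 55.5 ('Very Good', 'Good')
# 60 ('Very Good', 'Fair')
```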

The MJ-ranking with many voters is determined in exactly the same manner as when there are few: the most accurate majority where two candidates differ decides. Compare, for example, Hollande (H) and Bayrou (B):

        Outstanding   Excellent   Very Good   Good    \(\updownarrow \)   Good   Fair    Poor    To Reject
H (%)   12.48         16.15       16.42       4.95    \(\updownarrow \)   6.72   14.79   14.25   14.24
B (%)   2.58          9.77        21.71       15.94   \(\updownarrow \)   9.30   20.08   11.94   8.69

Both have \((50+\epsilon )\)%-majorities for [Good, Good] for every \(\epsilon \), \(0<\epsilon \le 4.95\), so both have the majority-grade Good. But for \(0<\epsilon \le 1.77\) Hollande has a \((54.95+\epsilon )\)%-majority for [Very Good, Good] whereas Bayrou has a \((54.95+\epsilon )\)%-majority for [Good, Good]. Since Hollande’s middlemost block dominates Bayrou’s, MJ ranks Hollande above Bayrou. This happens because \(4.95<\min \{6.72, 15.94,9.30\}\). Had the smallest of these four numbers, 4.95, been Hollande’s share of Good to the right of the middle rather than to the left, his \((54.95+\epsilon )\)%-majority would have been for [Good, Fair] whereas Bayrou’s \((54.95+\epsilon )\)%-majority would have remained [Good, Good], putting Bayrou ahead of Hollande. Finding the smallest of these four numbers is the same as finding the highest of the percentages of each candidate’s grades strictly above and strictly below their majority-grades: if that highest is a percentage above a majority-grade it puts its candidate ahead; if it is a percentage below, it puts its candidate behind. The rule has another natural interpretation: of the four sets of voters who disagree with the majority-grades, the largest tips the scales.

The general rule for ranking two candidates with many voters makes the generic assumption that there is a \((50+\epsilon )\%\)-majority, \(\epsilon >0\), for each candidate’s majority-grade. Let \(p_A\) be the percentage of A’s grades strictly above her majority-grade \(\alpha _A\) and \(q_A\) the percentage of A’s grades strictly below \(\alpha _A\). A’s majority-gauge (MG) is \((p_A,\alpha _A,q_A)\). The majority-gauge rule \(\succ _{MG}\) ranks A above B when

$$\begin{aligned} A\succ _{MG}B \hbox { when }\left\{ \begin{array}{l}\alpha _A\succ \alpha _B\hbox { or},\\ \alpha _A= \alpha _B\hbox { and } p_A>\max \{q_A,p_B,q_B\} \hbox { or,}\\ \alpha _A= \alpha _B\hbox { and }q_B>\max \{p_A,q_A,p_B\}.\end{array}\right. \end{aligned}$$
(2)

The MG-rule is generically decisive, i.e., there are (unique) majority-grades \(\alpha _A\) and \(\alpha _B\), and a unique maximum among the four numbers \(p_A, q_A, p_B\) and \(q_B\); this is as certain as the decisiveness of majority rule. The majority-grade may be endowed with a sign: \(\alpha _A+\) when \(p_A>q_A\), otherwise \(\alpha _A-\). When the MG-rule is decisive its ranking is, by construction, identical to that of the MJ-rule specified in (1).
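Rule (2) needs only each candidate’s majority-gauge. The sketch below (our own helpers, assuming the generic case and grades listed best first) computes \((p,\alpha,q)\) from percentages, attaches the sign, and applies (2) to Hollande and Bayrou using the figures above; the two halves of Good are summed for each candidate.

```python
GRADES = ["Outstanding", "Excellent", "Very Good", "Good",
          "Fair", "Poor", "To Reject"]
# Percentages from the text, best grade first; the two halves of Good are summed.
HOLLANDE = [12.48, 16.15, 16.42, 4.95 + 6.72, 14.79, 14.25, 14.24]
BAYROU = [2.58, 9.77, 21.71, 15.94 + 9.30, 20.08, 11.94, 8.69]


def majority_gauge(percentages):
    """Return (p, grade, q): p = % strictly above the majority-grade,
    q = % strictly below it. Assumes the generic case of a (50+eps)%-majority."""
    above = 0.0
    for grade, share in zip(GRADES, percentages):
        if above + share > 50:
            return above, grade, sum(percentages) - above - share
        above += share
    raise ValueError("no majority-grade: percentages must sum to about 100")


def signed_grade(percentages):
    """Majority-grade with '+' when p > q, otherwise '-'."""
    p, grade, q = majority_gauge(percentages)
    return grade + ("+" if p > q else "-")


def mg_compare(pct_a, pct_b):
    """Rule (2): 1 if A ranks above B, -1 if B ranks above A, 0 if not decisive."""
    p_a, grade_a, q_a = majority_gauge(pct_a)
    p_b, grade_b, q_b = majority_gauge(pct_b)
    if grade_a != grade_b:
        return 1 if GRADES.index(grade_a) < GRADES.index(grade_b) else -1
    four = [p_a, q_a, p_b, q_b]
    largest = max(four)
    if four.count(largest) > 1:
        return 0  # non-generic: no unique maximum
    # A is ahead when its own p or B's q is the largest of the four.
    return 1 if largest in (p_a, q_b) else -1


print(majority_gauge(HOLLANDE))  # p about 45.05, 'Good', q about 43.28
print(signed_grade(HOLLANDE), signed_grade(BAYROU))  # Good+ Good-
print(mg_compare(HOLLANDE, BAYROU))  # 1: Hollande ranks above Bayrou
```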

The full merit profile of the French 2012 presidential poll is given in Table 9. The MJ(= MG)-ranking is given in Table 10 together with the first-past-the-post (FPP) ranking to show the marked differences between them.

Table 9 Merit profile, 2012 French presidential poll (737 ballots) (Balinski and Laraki 2014b)
Table 10 MJ and first-past-the-post rankings, 2012 French presidential poll (737 ballots) (Balinski and Laraki 2014b)

2.3 Point-summing methods

A point-summing methodFootnote 8 chooses (ideally) an ordinal scale—words or descriptive phrases (but often undefined numbers)—and assigns to each a numerical grade, the better the evaluation the higher the number. There are, of course, infinitely many ways to assign such numbers to ordinal grades. Every voter evaluates each candidate in that scale and the candidates are ranked according to the sums or averages of their grades. A point-summing method clearly respects domination, but it harbors two major drawbacks.

The Danish educational system uses six grades with numbers attached to each: Outstanding 12, Excellent 10, Good 7, Fair 4, Adequate 2 and Inadequate 0 (Wikipedia 2014). Its numbers address a key issue of measurement theory (Krantz et al. 1971): the scale of grades must constitute an interval scale for sums or averages to be meaningful. An interval scale is one in which equal intervals have the same meaning; equivalently, one in which an additional point anywhere in the scale (going from 3 to 4 or from 10 to 11) has the same significance. In practice, in grading divers, students, figure skaters, wines or pianists, when (say) the scale is multiples of \(\frac{1}{2}\) from a high of 10 to a low of 0, it is much more difficult and much rarer to go from 9 to 9\(\frac{1}{2}\) than from 4\(\frac{1}{2}\) to 5, so adding or averaging such scores is, in the language of measurement theory, meaningless. The Danes specified a scale that they believed constitutes an interval scale [for an extended discussion of these points see Balinski and Laraki (2011a, pp. 171–174), or Balinski and Laraki (2014a)]. “Range voting” is a point-summing method advocated on the web in which voters assign a number between 0 and 100 to each candidate, but the numbers are given no meaning other than that they contribute to a candidate’s total number of points, and they do not constitute an interval scale.

A second major drawback of point-summing methods is their manipulability. Any voter who has not given the highest (respectively, the lowest) grade to a candidate can increase (respectively, decrease) the candidate’s average grade, so it pays voters or judges to exaggerate up and down. A detailed analysis of an actual figure skating competition (Balinski and Laraki 2014a) shows that with point-summing every one of the nine judges could, alone, manipulate to achieve precisely the order of finish he prefers by changing his scores. A companion analysis of the same competition shows that with majority judgment the possibilities for manipulation are drastically curtailed.
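A toy illustration of the difference, with entirely hypothetical scores from five judges: judge 5 prefers X and exaggerates, giving X the top score and Y the bottom one. The average flips the order, while the median (the majority-grade, since n is odd) does not move, because judge 5’s honest scores already lay above X’s median and below Y’s. This is only a sketch: when two majority-grades tie, MJ looks further out, where a single judge can sometimes still matter.

```python
from statistics import mean, median

# Hypothetical 0-10 scores from five judges for competitors X and Y.
HONEST = {"X": [7, 7, 7, 7, 8], "Y": [8, 8, 8, 7, 6]}
# Judge 5 prefers X and exaggerates: top score to X, bottom score to Y.
MANIPULATED = {"X": [7, 7, 7, 7, 10], "Y": [8, 8, 8, 7, 0]}


def winners(scores):
    """Winner by average (point-summing) and by median (majority-grade, n odd)."""
    by_mean = max(scores, key=lambda c: mean(scores[c]))
    by_median = max(scores, key=lambda c: median(scores[c]))
    return by_mean, by_median


print(winners(HONEST))       # ('Y', 'Y')
print(winners(MANIPULATED))  # ('X', 'Y')
```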

Long-time users of point-summing methods such as the Fédération Internationale de Natation (FINA) have recently tried to combat strategic manipulation. Divers must specify the dives they will perform, each of which has a known degree of difficulty expressed as a number. Judges assign each dive a number grade from 10 to 0 in multiples of \(\frac{1}{2}\): Excellent 10, Very Good 8.5–9.5, Good 7.0–8.0, Satisfactory 5.0–6.5, Deficient 2.5–4.5, Unsatisfactory 0.5–2.0, Completely failed 0 (the meaning of each is further elaborated). There are five or seven judges. The highest and lowest scores are eliminated when there are five judges, and the two highest and two lowest when there are seven. The sum of the remaining three scores is multiplied by the degree of difficulty to obtain the score of the dive. Competitions in skating and gymnastics have chosen similar methods. Had any of them gone a little further, eliminating all scores except those needed to distinguish a difference, they would have used MJ: increasingly, practical people choose methods approaching MJ.
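The diving procedure described above can be sketched as follows; it follows the text’s description only, not the official rulebook, and the scores and degrees of difficulty in the example are made up. Keeping only the middle three scores is what brings the method close to the middlemost grades used by MJ.

```python
def dive_score(scores, degree_of_difficulty):
    """Score a dive as described in the text: drop the highest and lowest
    score (five judges) or the two highest and two lowest (seven judges),
    sum the remaining three, and multiply by the degree of difficulty."""
    if len(scores) not in (5, 7):
        raise ValueError("the text describes panels of five or seven judges")
    for s in scores:
        if not 0 <= s <= 10 or 2 * s != int(2 * s):
            raise ValueError("scores run from 0 to 10 in multiples of 1/2")
    drop = (len(scores) - 3) // 2
    kept = sorted(scores)[drop:len(scores) - drop]  # the middle three scores
    return sum(kept) * degree_of_difficulty


# Hypothetical five-judge panel: 6.5 and 8.0 are dropped, 7.0 + 7.5 + 7.5 = 22.0.
print(dive_score([7.0, 7.5, 8.0, 6.5, 7.5], 2.0))  # 44.0
```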

2.4 Approval voting

Analyses, experiments, and uses of approval voting (Brams and Fishburn 1983) have deliberately eschewed ascribing any meaning to Approve and Disapprove—except that Approve means giving one vote to a candidate and Disapprove means giving none—leaving it entirely to voters to decide how to try to express their opinions (Weber 1977; Brams and Fishburn 1983). Thus, for example, the Social Choice and Welfare Society’s ballot for electing its president had small boxes next to candidates’ names with the instructions: “You can vote for any number of candidates by ticking the appropriate boxes”, the number of ticks determining the candidates’ order of finish. In this description AV may be seen as a point-summing method where voters assign a 0 or 1 to each candidate and the electorate’s rank-order is determined by the candidates’ total sums of points.

Recently, however, that view has changed: “the idea of judging each and every candidate as acceptable or not is fundamentally different” from either voting for one candidate or ranking them (Brams 2010, pp. vii–viii). This implies a belief that voters are able to judge candidates on an ordinal scale of merit with two grades. With this paradigm approval voting becomes MJ with a language of two grades: “approval judgment”. For example, if Approve meant Good or better, the AV results of the LAMSADE Jury would be those given in Table 11.
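Reading Approve as a threshold on an ordinal scale turns grades into approvals. A minimal sketch, assuming the seven-grade scale of the 2012 poll mentioned later in this article (To Reject up to Outstanding) and hypothetical ballots; the LAMSADE Jury’s own scale and data are not reproduced here.

```python
# Grades from lowest to highest (assumed scale; see the lead-in).
SCALE = ["To Reject", "Poor", "Fair", "Good", "Very Good", "Excellent", "Outstanding"]


def approval_scores(ballots, threshold="Good", scale=SCALE):
    """Count, per candidate, the ballots grading it at `threshold` or better."""
    cut = scale.index(threshold)
    totals = {}
    for ballot in ballots:
        for candidate, grade in ballot.items():
            approved = scale.index(grade) >= cut
            totals[candidate] = totals.get(candidate, 0) + int(approved)
    return totals


ballots = [  # hypothetical voters
    {"X": "Excellent", "Y": "Fair"},
    {"X": "Good", "Y": "Good"},
    {"X": "Poor", "Y": "Fair"},
]
scores = approval_scores(ballots)
print(sorted(scores, key=scores.get, reverse=True), scores)  # ['X', 'Y'] {'X': 2, 'Y': 1}
```

Note that the two-grade reading erases the difference between X’s Excellent and X’s Good.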

Table 11 AV-scores and -ranking, Approve means Good or better, LAMSADE Jury

When there are few voters AV’s two grades are not sufficient to distinguish the competitors. Further evidence shows two grades are too few even when there are many voters (Balinski and Laraki 2011a, 2014a, 2019).

3 Majority rule characterized for two candidates

Majority rule (MR) in a field of two elects the candidate preferred to the other by a majority of the electorate. May (1952) proved that majority rule is the only rule that satisfies the following six simple properties in an election with two candidates. This theorem is considered a major argument in its favor (Elster and Novak 2014).

Axiom 1

(Based on comparing) A voter’s opinion is a preference for one candidate or indifference between them.Footnote 9

Thus the input is a preference profile that specifies the preference or indifference of each voter.

Axiom 2

(Unrestricted domain) Voters’ opinions are unrestricted.

Axiom 3

(Anonymity) Interchanging the names of voters does not change the outcome.

Axiom 4

(Neutrality) Interchanging the names of candidates does not change the outcome.

Anonymity stipulates equity among voters, neutrality the equitable treatment of candidates.

Axiom 5

(Monotonicity) If candidate A wins or is tied with his opponent and one or more voters change their preferences in favor of A then A wins.

A voter’s change in favor of A means changing from a preference for B to either indifference or a preference for A, or from indifference to a preference for A.

Axiom 6

(Completeness) The rule guarantees an outcome: one of the two candidates wins or they are tied.

Theorem 1

(May 1952) For \(n=2\) candidates majority rule \(\succeq _{MR}\) is the unique method that satisfies Axioms 1 through 6.

Proof

The argument is simple. That MR satisfies the axioms is obvious.

So suppose the method \(\succeq _M\) satisfies the axioms. Anonymity implies that only the numbers count: the number of voters \(n_A\) who prefer A to B, the number \(n_B\) who prefer B to A, and the number \(n_{AB}\) who are indifferent between A and B. Completeness guarantees there must be an outcome. (1) Suppose \(n_A=n_B\) and \(A\succ _MB\). By neutrality, switching the names results in \(B\succ _MA\); but since \(n_A=n_B\) the new profile has the same numbers as the original, so by anonymity it must give the same outcome, a contradiction that shows \(A\approx _MB\) when \(n_A=n_B\). (2) If \(n_A>n_B\), change the preferences of \(n_A-n_B\) voters who prefer A to B to indifferences to obtain a valid profile (by Axiom 2). With this profile \(A\approx _MB\) by (1). Changing them back one at a time to the original profile proves \(A\succ _MB\) by monotonicity. (3) The case \(n_B>n_A\) is symmetric.\(\square \)
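The easy half of the theorem, that MR satisfies the axioms, can be checked by brute force on small electorates. A minimal sketch, assuming each voter’s opinion is coded as +1 (prefers A), -1 (prefers B) or 0 (indifferent); it enumerates every profile (Axiom 2) and tests completeness, anonymity, neutrality and monotonicity. It does not check uniqueness, which is what the proof above establishes.

```python
from itertools import permutations, product


def majority_rule(profile):
    """+1 if A wins, -1 if B wins, 0 if tied (opinions coded +1, -1, 0)."""
    n_a, n_b = profile.count(1), profile.count(-1)
    return (n_a > n_b) - (n_a < n_b)


def check_may_axioms(n_voters):
    for profile in product((1, 0, -1), repeat=n_voters):   # unrestricted domain
        out = majority_rule(list(profile))
        assert out in (1, 0, -1)                            # completeness
        # Anonymity: permuting the voters never changes the outcome.
        assert {majority_rule(list(p)) for p in permutations(profile)} == {out}
        # Neutrality: swapping the candidates' names reverses the outcome.
        assert majority_rule([-v for v in profile]) == -out
        # Monotonicity: if A wins or ties, moving any one voter toward A makes A win.
        if out >= 0:
            for i, v in enumerate(profile):
                if v < 1:
                    better = list(profile)
                    better[i] = v + 1
                    assert majority_rule(better) == 1
    return True


print(check_may_axioms(4))  # True
```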

There are three further arguments in favor of MR for two candidates. First, its simplicity and familiarity. Second, its incentive compatibility: the optimal strategy of a voter who prefers one of the two candidates is to vote for that candidate (Barberà 2010). Third, the Condorcet jury theorem.

In its simplest form, the jury theorem supposes that one of the two outcomes is correct and that each voter independently votes for it with probability \(p>50\%\); it concludes that the greater the number of voters, the more likely majority rule is to make the correct choice, and that in the limit it is certain to do so. In most votes between two alternatives, however, there is no “correct” choice or “correct” candidate: all opinions are valid judgments, disagreement is inherent to any democracy and must be accepted. What is needed is a mechanism that chooses the consensus.
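The simplest form of the jury theorem can be checked numerically. A minimal sketch, assuming an odd number n of voters, each independently correct with probability p; the probability that a strict majority is correct is then a binomial tail sum.

```python
from math import comb


def prob_majority_correct(n, p):
    """Probability that a strict majority of n independent voters (n odd),
    each correct with probability p, chooses the correct outcome."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 11, 51, 201):
    print(n, round(prob_majority_correct(n, 0.6), 4))
```

With p = 0.6 the probability rises with n toward 1, as the theorem states; the computation presupposes a “correct” outcome and so says nothing about votes where there is none.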

4 The domination paradox

The dangers of a “tyranny of the majority” have been discussed for centuries: the phrase was used by John Adams in 1788, Alexis de Tocqueville in 1835 and John Stuart Mill in 1859. Dahl’s now classic treatise (Dahl 1956) considers it at length in the context of what he calls “Madisonian Democracy” (see Madison 1787) and poses the problem of intensity: “[J]ust as Madison believed that government should be constructed so as to prevent majorities from invading the natural rights of minorities, so a modern Madison might argue that government should be designed to inhibit a relatively apathetic majority from cramming its policy down the throats of a relatively intense minority” (Dahl 1956, p. 90).

Table 12 Merit profile, Hollande–Sarkozy, 2012 French presidential election poll
Table 13 Possible opinion profile, Hollande–Sarkozy (giving the merit profile of Table 12), national poll, 2012 French presidential election

Given voters’ evaluations of candidates, MR for two candidates harbors a major drawback heretofore unrecognized. It admits the domination paradox: a candidate A’s evaluations dominate B’s evaluations but MR elects B not A.

Contrast the merit profiles of Hollande and Sarkozy in the national poll of the 2012 French presidential election (Table 12). Hollande’s grades very generously dominate Sarkozy’s. But this merit profile could come from the opinion profile of Table 13 where Sarkozy is the MR-winner with 59.57% of the votes to Hollande’s 26.19%, 14.24% rejecting both.

Those voters who rate Sarkozy above Hollande do so mildly (with small differences in grades, top of the profile), whereas those who rate Hollande above Sarkozy do so intensely (with large differences in grades, bottom of the profile): this is precisely the situation Dahl described in questioning the validity of MR. In the actual national vote (and in the poll) Hollande won by a bare majority of 51.6% to 48.4%, suggesting that the possibility for MR to err with two candidates is important and real. To elect a candidate whose evaluations are dominated by another’s is clearly unacceptable.
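The paradox needs only three voters. A minimal sketch with a hypothetical opinion profile on a 0–5 scale (not the poll data of Tables 12 and 13): A’s grades dominate B’s, yet two voters grade B mildly above A (by one grade) while one grades A intensely above B (by four), so MR elects B.

```python
def dominates(a, b):
    """True if grades a dominate grades b: with both sorted from highest
    to lowest, a_j >= b_j for every j, and the two lists differ."""
    sa, sb = sorted(a, reverse=True), sorted(b, reverse=True)
    return len(sa) == len(sb) and all(x >= y for x, y in zip(sa, sb)) and sa != sb


def majority_rule(a, b):
    """+1 if more voters grade A above B than B above A, -1 if fewer, 0 if equal."""
    n_a = sum(x > y for x, y in zip(a, b))
    n_b = sum(y > x for x, y in zip(a, b))
    return (n_a > n_b) - (n_a < n_b)


# Hypothetical grades; position i is voter i in both lists.
A = [2, 3, 5]
B = [3, 4, 1]
print(dominates(A, B), majority_rule(A, B))  # True -1
```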

5 May’s axioms for more than two candidates

Given a fixed scale of linearly ordered grades \(\Lambda \), a ranking problem is defined by an electorate’s opinion profile \(\Phi \), an m by n matrix of grades when there are m candidates and n voters. A method of ranking \(\succeq _M\) is a binary relation defined on all pairs of candidates.

With the grading model voters’ inputs are grades. With the traditional model voters’ inputs are comparisons. Individual rationality implies that a voter’s preference is a rank order over all the candidates. It may be deduced from a voter’s grades when the scale is sufficient to distinguish between any two candidates whenever the voter believes their merit to be different.

The following are May’s axioms extended to any number of candidates.Footnote 10

Axiom 1

(Based on comparing) A voter’s input is a rank-order of the candidates.

Axiom 2

(Unrestricted domain) Voters’ opinions are unrestricted.

Axiom 3

(Anonymity) Interchanging the names of voters does not change the outcome.

Axiom 4

(Neutrality) Interchanging the names of candidates does not change the outcome.

In the traditional model a voter’s input becomes better for a candidate A if A rises in his rank-order. In the grading model a voter’s input becomes better for A if A is given a higher grade.

Axiom 5

(Monotonicity) If \(A\succeq _MB\) and one or more voters’ inputs become better for A then \(A\succ _MB\).

Axiom 6

(Completeness) For any two candidates either \(A\succeq _MB\) or \(A\preceq _MB\) (or both, implying \(A\approx _MB\)).

With more than two candidates the Condorcet and Arrow paradoxes must be excluded.

Axiom 7

(Transitivity) If \(A\succeq _MB\) and \(B\succeq _MC\) then \(A\succeq _MC\).

Axiom 8

(Independence of irrelevant alternatives (IIA)) If \(A\succeq _MB\) then whatever other candidates are either dropped or adjoined \(A\succeq _MB\).

This is the formulation of IIA due to Nash (1950) and Chernoff (1954), defined for a variable number of candidates, not Arrow’s definition for a fixed number of candidates; it implies Arrow’s. It is this conception that is often violated in practice (e.g., in elections, figure skating, and wine competitions).

Theorem 2

(Arrow’s impossibility theorem, Arrow 1951) For \(n\ge 3\) candidates there is no method of ranking \(\succeq _M\) that satisfies Axioms 1 through 8.

This is a much watered-down form of Arrow’s theorem, based on more axioms—though all necessary in a democracy—and is very easily proven. It is stated to contrast the two models: comparing candidates versus evaluating them.

Proof

Take any two candidates A and B. By IIA it suffices to deal with them alone to decide who leads. By Theorem 1, Axioms 1 through 6 imply that the method \(\succeq _M\) must be MR. Since the domain is unrestricted, Condorcet’s paradox now shows that MR violates transitivity. So there can be no method satisfying all the axioms. \(\square \)
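Condorcet’s paradox, invoked in the last step, can be reproduced with three voters. A minimal sketch using the textbook cyclic profile (an illustration, not data from this article): each candidate loses one pairwise majority vote, so the majority relation puts A over B, B over C and C over A, which is not transitive.

```python
def majority_prefers(profile, x, y):
    """True if a strict majority of the rank-orders (best first) put x above y."""
    wins = sum(order.index(x) < order.index(y) for order in profile)
    return wins > len(profile) / 2


def pairwise_majorities(profile):
    """All ordered pairs (x, y) such that a strict majority ranks x above y."""
    candidates = profile[0]
    return {(x, y) for x in candidates for y in candidates
            if x != y and majority_prefers(profile, x, y)}


profile = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(sorted(pairwise_majorities(profile)))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```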

Now replace Axiom 1 above by:

Axiom 1*

(Based on measuring) A voter’s input consists of the grades given to the candidates.

Dahl, when he challenged the validity of MR on two candidates, proposed as a solution to use an ordinal “intensity scale” obtained “simply by reference to some observable response, such as a statement of one’s feelings ...” (Dahl 1956, p. 101), and argued that it is meaningful to do so: “I think that the core of meaning is to be found in the assumption that the uniformities we observe in human beings must carry over, in part, to the unobservables like feeling and sensation” (Dahl 1956, p. 100). This is precisely the role of our new Axiom 1*, which also relates to a key problem raised by measurement theorists, the faithful representation problem: when measuring some attribute of a class of objects or events, how to associate a scale “in such a way that the properties of the attribute are faithfully represented ...” (Krantz et al. 1971, p. 1). Practice (in figure skating, wine tasting, diving, gymnastics, assessing pain, etc.) has spontaneously and naturally resolved it.

However, some social choice theorists continue to express the opinion that a scale of intensities or grades is inappropriate in elections. Why should intensities be valid (indeed, be necessary) and practical in judging competitions but not in elections? The validity of using intensities as inputs in voting, as opposed to using rankings, is not a matter of opinion or of mathematics: it is a practical, experimental issue. Repeated experiments and real uses of MJ (Balinski and Laraki 2011a, pp. 9–16 and chapter 15; Gonzalez-Suitt et al. 2014; Balinski and Laraki 2014a, pp. 496–497 and 504–509; and others referred to in this article) show that nuances in evaluations are as valid for candidates in elections as they are for figure skaters, divers or wines in competitions, though the criteria and scales of measures must be crafted for each individually. Participants have uniformly found it easy to assign grades and have done so quickly. Some practical-minded people are also persuaded that MJ should be used in elections. Terra Nova, “an independent progressive think tank whose goal is to produce and diffuse innovative political solutions in France and Europe”, has included majority judgment in its recommendations for reforming the presidential election system of France (Nova 2011). A non-profit associationFootnote 11 has been created in France to promote majority judgment. Finally, a real use of MJ was initiated in 2016 by several concerned French citizens: an engineer, a lawyer and a mathematician. Called LaPrimaire.org, the initiative aimed to let citizens, independently of any political party, freely nominate a candidate for the 2017 French presidential election. The entire effort was carried out on the web, including campaigning and voting. The organizers considered the various known voting methods and chose to use MJ. An initial slate of some 200 candidates was whittled down to 12.
Then, in a first vote, each voter was asked to evaluate five prescribed candidates of the 12 on the scale Excellent, Very Good, Good, Passable, Insufficient. The assignment of five (of the 12) candidates to voters was done randomly, but in such a manner that each candidate was evaluated by approximately the same number of voters. The 12 were then ranked by MJ and the five leaders designated to participate in the final campaign and vote (LaPrimaire.org 2016a). The reason for not directly voting on all 12 was to incite voters to invest the time to make a careful comparative study of the proposals and approaches of each candidate: evaluating 12 candidates was seen as asking for too great an investment of time and effort. The final election among the five was conducted using MJ with the same scale; 32,685 persons voted; each candidate’s grades dominated those of the candidates lower in the MJ-ranking. The winner’s majority-grade was Excellent, the runner-up’s was Very Good, and the majority-grade of each of the last three was Good (see LaPrimaire.org 2016b for a complete description). The results were well accepted, and MJ was included in the electoral reforms proposed by the winner and others.

Theorem 3

For \(n\ge 1\) candidates there are infinitely many methods of ranking \(\succeq _M\) that satisfy Axioms 1* and 2 through 8. Their rankings depend only on the candidates’ merit profiles, and they respect domination.

Proof

Majority judgment and any point-summing method clearly satisfy all axioms, so there are as many methods as one wishes that satisfy the axioms.

All methods satisfying the axioms depend only on the candidates’ merit profiles. To see this, compare two competitors. By Axiom 8 (IIA) it suffices to compare them alone. If two candidates A and B have the same set of grades, so that B’s list of n grades is a permutation \(\sigma \) of A’s list, it is shown that they must be tied. Consider, first, the opinion profile \(\phi ^1\) of A and a candidate \(A'\) who may be added to the set of candidates by IIA,

$$\begin{aligned} \phi ^1:\begin{array}{rccccc}&{}v_1&{}\cdots &{} v_{\sigma 1} &{}\cdots &{}v_n\\ \hline A:&{}\alpha _1&{}\cdots &{}\alpha _{\sigma 1}&{}\cdots &{}\alpha _n\\ A':&{} \alpha _{\sigma 1}&{}\cdots &{}\alpha _1&{}\cdots &{}\alpha _n\\ \hline \end{array} \end{aligned}$$

where \(A'\)’s list is the same as A’s except that the grades given by voters \(v_1\) and \(v_{\sigma 1}\) have been interchanged. \(\phi ^1\) is possible since the domain is unrestricted. Suppose \(A\succeq _MA'\). Interchanging the votes of the voters \(v_1\) and \(v_{\sigma 1}\) yields the profile \(\phi ^2\)

$$\begin{aligned} \phi ^2:\begin{array}{rccccc}&{}v_{\sigma 1}&{}\cdots &{} v_1 &{}\cdots &{}v_n\\ \hline A:&{}\alpha _{\sigma 1}&{}\cdots &{}\alpha _1&{}\cdots &{}\alpha _n\\ A':&{} \alpha _1&{}\cdots &{}\alpha _{\sigma 1}&{}\cdots &{}\alpha _n\\ \hline \end{array} \end{aligned}$$

Nothing has changed by Axiom 3 (anonymity), so the first row of \(\phi ^2\) ranks at least as high as the second. But, read position by position, \(\phi ^2\) is \(\phi ^1\) with the names A and \(A'\) interchanged, so by Axiom 4 (neutrality) \(A'\succeq _MA\), implying \(A\approx _MA'\), where \(A'\)’s first grade agrees with B’s first grade.

Compare, now, \(A'\) with another added candidate \(A''\), with profile \(\phi ^3\)

$$\begin{aligned} \phi ^3:\begin{array}{rccccccc}&{}v_{\sigma 1}&{}v_2&{}\cdots &{} v_{\sigma 2} &{}\cdots &{}v_n\\ \hline A':&{}\alpha _{\sigma 1}&{}\alpha _2&{}\cdots &{}\alpha _{\sigma 2}&{}\cdots &{}\alpha _n\\ A'':&{} \alpha _{\sigma 1}&{} \alpha _{\sigma 2}&{}\cdots &{}\alpha _2&{}\cdots &{}\alpha _n\\ \hline \end{array} \end{aligned}$$

where \(A''\)’s list is the same as \(A'\)’s except that the grades of voters \(v_2\) and \(v_{\sigma 2}\) have been interchanged. Suppose \(A'\succeq _MA''\). Interchanging the votes of the voters \(v_2\) and \(v_{\sigma 2}\) yields the profile \(\phi ^4\)

$$\begin{aligned} \phi ^4: \begin{array}{rccccccc}&{}v_{\sigma 1}&{}v_{\sigma 2}&{}\cdots &{} v_2 &{}\cdots &{}v_n\\ \hline A':&{}\alpha _{\sigma 1}&{}\alpha _{\sigma 2}&{}\cdots &{}\alpha _2&{}\cdots &{}\alpha _n\\ A'':&{} \alpha _{\sigma 1}&{} \alpha _2&{}\cdots &{}\alpha _{\sigma 2}&{}\cdots &{}\alpha _n\\ \hline \end{array} \end{aligned}$$

so as before conclude that \(A'\approx _MA''\). Axiom 7 (transitivity) now implies \(A\approx _MA''\) where \(A''\)’s first two grades agree with B’s first two grades.

Repeating this reasoning shows \(A\approx _MB\), so which voter gave which grade has no significance. Therefore a candidate’s distribution of grades, his merit profile, is what determines his place in the ranking with any method that satisfies the axioms. It has a unique representation when the grades are listed from the highest to the lowest.
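The canonical representation just mentioned is easy to compute. A minimal sketch (the function names `merit_profile` and `grade_shares` and the integer grades are hypothetical): listing a candidate’s grades from highest to lowest gives the same result however the voters are relabelled, which is why a method satisfying the axioms can see only this list.

```python
import random
from collections import Counter


def merit_profile(grades):
    """Canonical merit profile: the grades listed from highest to lowest."""
    return sorted(grades, reverse=True)


def grade_shares(grades):
    """The same information as a distribution: grade -> share of voters."""
    n = len(grades)
    return {g: c / n for g, c in sorted(Counter(grades).items(), reverse=True)}


grades = [3, 5, 2, 5, 4]                                  # voter i's grade at position i
shuffled = random.Random(0).sample(grades, len(grades))   # voters relabelled
print(merit_profile(grades), merit_profile(shuffled) == merit_profile(grades))
print(grade_shares(grades))  # {5: 0.4, 4: 0.2, 3: 0.2, 2: 0.2}
```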

Suppose A’s grades \(\alpha \) dominate B’s grades \(\beta \), both given in order of decreasing grades. Domination means \(\alpha _j\succeq \beta _j\) for all j, with at least one inequality strict. Replace, one at a time, each \(\beta _k\) with \(\alpha _k\succ \beta _k\) by \(\alpha _k\). This gives a sequence \(\beta =\beta ^0,\beta ^1,\ldots ,\beta ^r=\alpha \) with \(r\ge 1\) in which each step raises one grade, so \(\beta ^{i-1}\prec _M\beta ^i\) for \(i=1,\ldots ,r\) by monotonicity (Axiom 5). Transitivity (Axiom 7) then implies \(\beta \prec _M\alpha \), that is, \(\alpha \succ _M\beta \).

Monotonicity implies domination is respected.\(\square \)

A reasonable method should certainly avoid the domination paradox. This theorem shows that any method based on evaluations (as Dahl suggested) that satisfies May’s axioms and avoids the Condorcet and Arrow paradoxes does so. When majority rule fails to respect domination it differs from all such methods: why then the persistent insistence on agreeing with the majority rule?

6 Polarization

Are there reasons to choose one method among all that meet the demands of Theorem 3? All ranking methods that satisfy IIA (Axiom 8) are determined by how they rank pairs of candidates. So what makes sense when two candidates are to be ranked? In particular, are there circumstances when majority rule for two candidates is acceptable?

One instance leaps to mind: jury decisions. The goal is to arrive at the truth, the correct decision, either the defendant is guilty or is not guilty. A juror may be more or less confident in his judgment: the higher his belief that one decision is correct the lower his belief that the opposite decision is correct. In this context Condorcet’s jury theorem strongly supports MR. But this context is very different from that of most elections between two candidates where gradations of opinion are inherent and an excellent opinion of one does not necessarily imply a low opinion of the other.

“Political polarization” is a phenomenon that is more and more observed and discussed in the United States and other countries (see e.g., Barber and McCarty 2013; Brennan Center 2010). It refers to a partisan cleavage in political attitudes in support of ideological extremes (pro-abortion vs. anti-abortion, pro-evolution vs. anti-evolution, left vs. right) and applies to voters, elites, candidates, and parties. The concept necessarily concerns an opposition between two sides. The word is used when (say) large majorities of Democratic and Republican voters are vehemently on opposite sides in their evaluations of issues or candidates. The notion evokes the idea that most voters are at once intensely for one side and intensely against the other, so the situation approaches that of a jury decision, where there is no question of (in Dahl’s words) pitting a passionate minority against an apathetic majority.

Consider the two major opponents of the 2012 French presidential election poll (Table 9), Hollande (moderate left) and Sarkozy (traditional right). Table 14 gives the electorate’s opinion profile concerning them (where, e.g., 1.63% in the first column give Sarkozy Fair and Hollande Outstanding). Tables 15 and 16 contain respectively the distributions and cumulative distributions of the grades given Hollande for each of the grades given Sarkozy (Table 15 is obtained from Table 14 by normalizing each line to sum to 100%).

Table 14 Opinion profile, Hollande-Sarkozy, 2012 French presidential poll
Table 15 Distributions of Hollande’s grades for each of Sarkozy’s grades, 2012 French presidential poll
Table 16 Cumulative distributions of Hollande’s grades for each of Sarkozy’s grades, 2012 French presidential poll

Comparing any two lines of Table 16, the lower (almost) always dominates the upper. Thus the lower the grade given Sarkozy the higher the distribution of grades given Hollande: statistically there is diametrical opposition in their grades. The same occurs with the distributions of Sarkozy’s grades for each of Hollande’s grades. Moreover, as may be seen in Table 17, 47.36% of the voters are polarized on the two candidates.

Table 17 Polarized part (47.36%) of true opinion profile, Hollande-Sarkozy, 2012 French presidential election poll (for merit profile of Table 12)

On the other hand, analyzing in the same manner the grades given to two extreme left candidates, Mélenchon and Poutou, the higher the grade given one the higher the distribution of grades given the other: statistically there is a concordance of grades. All of this confirms common sense.

Given an electorate’s merit profile, take any two candidates. Different opinion profiles on the two candidates have this same merit profile. An opinion profile on a pair of candidates will be said to be “polarized” when the higher a voter evaluates one candidate the lower the voter evaluates the other. Specifically:

Two candidates A and B are polarized if, for any two voters \(v_i\) and \(v_j\), whenever \(v_i\) evaluates A higher (respectively, lower) than \(v_j\) does, \(v_i\) evaluates B no higher (respectively, no lower) than \(v_j\) does.
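The definition translates directly into a check over all pairs of voters. A minimal sketch, assuming grades coded as integers (higher is better), with `a[i]` and `b[i]` the grades voter i gives A and B; the function name is hypothetical.

```python
from itertools import combinations


def is_polarized(a, b):
    """True if, whenever voter i grades A strictly higher (lower) than voter j
    does, voter i grades B no higher (no lower) than voter j does."""
    for i, j in combinations(range(len(a)), 2):
        if a[i] > a[j] and b[i] > b[j]:
            return False
        if a[i] < a[j] and b[i] < b[j]:
            return False
    return True


print(is_polarized([5, 3, 1], [0, 2, 4]))  # True: opposed evaluations
print(is_polarized([5, 3, 1], [4, 2, 0]))  # False: concordant evaluations
```

Checking each unordered pair once suffices, since the two conditions inside the loop cover both orders of the pair.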

This definition includes and extends the usual (loosely defined) notion of a polarized electorate and formulates a “pure” form of a type of distribution of grades found in practice. It brings to mind the single crossing opinion profile of the traditional theory (Barberà and Moreno 2011; Puppe and Slinko 2015).

The merit profile of Hollande and Sarkozy (Table 12 or 14) showed that Hollande’s grades largely dominate Sarkozy’s, and yet there are opinion profiles (e.g., Table 13) that make Sarkozy the overwhelming MR-winner. The polarized opinion profile that has the same merit profile is given in Table 18.
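Given two merit profiles, the polarized opinion profile is obtained by pairing A’s grades from highest to lowest with B’s grades from lowest to highest (the arrangement also used in the proof of Theorem 4 below). A minimal sketch with hypothetical integer grades; the function names are illustrative.

```python
def polarized_profile(a_grades, b_grades):
    """Pair A's grades (highest first) with B's grades (lowest first):
    the polarized opinion profile with these merit profiles."""
    if len(a_grades) != len(b_grades):
        raise ValueError("both candidates must be graded by every voter")
    return list(zip(sorted(a_grades, reverse=True), sorted(b_grades)))


def majority_counts(pairs):
    """(voters grading A above B, voters grading B above A)."""
    n_a = sum(a > b for a, b in pairs)
    n_b = sum(b > a for a, b in pairs)
    return n_a, n_b


pairs = polarized_profile([2, 4, 5], [3, 1, 3])
print(pairs, majority_counts(pairs))  # [(5, 1), (4, 3), (2, 3)] (2, 1)
```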

Table 18 Polarized opinion profile, Hollande-Sarkozy, 2012 French presidential election poll (for merit profile of Table 12)

The polarized opinion profile makes Hollande the MR-winner with a score of 50.75% to Sarkozy’s 43.28%, 5.97% giving both the grade Good, and so agrees with Hollande’s domination in grades. With this profile MR makes the right decision.

It would seem that it is precisely when an electorate is polarized (or when a jury seeks the correct answer between two opposites) and there can be no consensus that the “strongly for or strongly against” characteristic of MR should render the acceptable result. The electorate is not polarized on Hollande vs. Sarkozy (Table 14): it is only statistically polarized. Completely polarized pairs are rare; however, significant parts of an electorate may be polarized (Tables 17 and 21). Consider, for example, the breakdown of the grades given to Poutou (extreme left) and Le Pen (extreme right) in the 2012 French presidential poll (see Table 19).

Table 19 Opinion profile, Le Pen-Poutou, 2012 French presidential poll

The polarized opinion profile that agrees with the merit profile is given in Table 20 (MR places Poutou with 47.63% ahead of Le Pen with 46.14%). It does not agree with the true opinion profile.

Table 20 Polarized opinion profile, Le Pen-Poutou, 2012 French presidential election poll (for merit profile of Table 9)
Table 21 Polarized part (72.60%) of true opinion profile, Le Pen-Poutou, 2012 French presidential election poll (for merit profile of Table 9)

However, a large part of the true opinion profile—almost three-quarters—is polarized on Le Pen and Poutou as may be seen in Table 21, and every opinion profile will have parts that are polarized.

Polarization, as defined here, is a significant phenomenon in practice. It is also of significance in theory. It is proven in the next section (and was observed in the example of Table 18) that when two candidates are polarized and one’s grades dominate the other’s, MR (when decisive) makes the correct decision: it is acceptable.

7 A new characterization of majority judgment

MR on two candidates has a particularly important property: it resists manipulation, i.e., it is incentive compatible (see next section). Any method that agrees with it inherits the property. MR may err, but it does not when a pair of candidates is polarized (see corollary below), and that is precisely the situation in which voters are most tempted to manipulate. Thus agreement with MR on polarized pairs is a very attractive property for a method to have.

Definition

A method of ranking \(\succeq _M\) is consistent with the majority rule on polarized pairs of candidates if both give the identical ranking between every pair of polarized candidates whenever both are decisive (meaning no ties). Generically, ties have a zero probability of occurring when the number of voters is large.

Not every method satisfying Axioms 1* and 2–8 is consistent with MR on polarized pairs. Point-summing methods, for example, are not. Take the polarized opinion profile of Poutou and Le Pen (Table 20). MR makes Poutou the winner with 47.63% of the votes to Le Pen’s 46.14%, 6.24% evaluating both to be Poor. But if Outstanding is worth 6 points, Excellent 5, ..., down to To Reject 0, Le Pen’s 172.75 points easily defeat Poutou’s 101.80.
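The same kind of disagreement appears in a tiny hypothetical polarized profile on the 0–6 point scale just described (three voters; not the poll data). Two voters grade A one point above B and one voter grades B six points above A: MR elects A, while point-summing elects B.

```python
def majority_winner(pairs):
    """MR on (grade for A, grade for B) pairs, one pair per voter."""
    n_a = sum(a > b for a, b in pairs)
    n_b = sum(b > a for a, b in pairs)
    return "A" if n_a > n_b else "B" if n_b > n_a else "tie"


def point_summing_winner(pairs):
    """Winner by total points, grades read as 0..6 points."""
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    return "A" if total_a > total_b else "B" if total_b > total_a else "tie"


# Polarized: A's grades non-increasing, B's non-decreasing across voters.
pairs = [(1, 0), (1, 0), (0, 6)]
print(majority_winner(pairs), point_summing_winner(pairs))  # A B
```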

Theorem 4

A method of ranking \(\succeq _M\) that satisfies Axioms 1* and 2–8 and is consistent with the majority rule on polarized pairs of candidates must coincide with the majority-gauge rule \(\succ _{MG}\) when the scale of grades \(\Lambda \) is sufficient.Footnote 12

Proof

Given an opinion profile with any number of candidates, IIA (Axiom 8) implies that the order between any two must be determined between the two alone. By Theorem 3, the method \(\succeq _M\) depends only on the merit profile (the distributions of the candidates’ grades), not on which voters gave which grades. Thus the order the method determines between the two candidates is the same as the one it determines in the polarized opinion profile having the same merit profiles.

Table 22 Polarized opinion profile

So consider the polarized opinion profile of two candidates, A’s grades going from highest on the left to lowest on the right and B’s from lowest to highest, as displayed in Table 22. A’s grades are non-increasing and B’s non-decreasing and for any j either \(\lambda _A^j\succ \lambda _A^{j+1}\) or \(\lambda _B^j\prec \lambda _B^{j+1}\), so the corresponding grades can be equal at most once (as indicated in the middle line of the profile).Footnote 13

Suppose A is the MR-winner. Then \(x_A=\sum _1^{k-1}x_i>\sum _{k+1}^sx_i=x_B\). By assumption \(x_A=50\%\) is excluded (since \(\succ _{MG}\) must be decisive).

If \(x_A>50\%\) then A’s majority-grade is at least \(\lambda ^{k-1}_A\) and B’s at most \(\lambda ^{k-1}_B\), so A is ahead of B by the MG rule.

Otherwise \(x_B<x_A<50\%\) implying \(x_A+x_k>50\%\) and \(x_B+x_k>50\%\), so the candidates’ majority-grades are the same, \(\lambda ^k_A=\lambda ^k_B\). Notice that \(p_A\le x_A\) and \(q_B\le x_A\) but one of the two must be an equality; similarly \(q_A\le x_B\) and \(p_B\le x_B\) but one of the two must be an equality as well. Thus \(x_A=\max \{p_A,q_B\}\) and \(x_B=\max \{p_B,q_A\}<x_A\). If \(p_A=x_A\) then \(p_A\) is the largest of the p’s and q’s, and MG puts A above B; if \(q_B=x_A\) then \(q_B\) is the largest of the p’s and q’s, and MG puts B below A, as was to be shown.

Now assume the MG rule places A above B, \((p_A,\lambda _A,q_A)\succ _{MG}(p_B,\lambda _B,q_B)\). Either A’s majority-grade is above B’s, \(\lambda _A\succ \lambda _B\); or they are equal and either \(p_A>\max \{q_A,p_B,q_B\}\) or \(q_B>\max \{p_A,q_A,p_B\}\).

In the first case, at least \((50+\epsilon )\%\) (for some \(\epsilon >0\)) of the voters gave the grade \(\lambda _A\) or better to A and \(\lambda _B\) or worse to B. They constitute a majority of at least \((50+\epsilon )\%\) that makes A the MR-winner.

In the second case, they are equal and are the kth column of the polarized opinion profile: \(\lambda _A=\lambda _B=\lambda _A^k=\lambda _B^k\). As before, \(x_A=\sum _1^{k-1}x_i=\max \{p_A,q_B\}\) and \(x_B=\sum _{k+1}^sx_i=\max \{p_B,q_A\}\). Thus, \(p_A>\max \{q_A,p_B,q_B\}\) implies \(x_A=p_A>\max \{p_B,q_A\}=x_B\) and \(q_B>\max \{p_A,q_A,p_B\}\) implies \(x_A=q_B>\max \{p_B,q_A\}=x_B\), so in both instances A is the MR-winner.\(\square \)
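The majority-gauge comparison used in this proof, and the statement of Theorem 4 itself, can be probed numerically. A minimal sketch, assuming integer grades (higher is better), the lower middlemost grade as majority-grade, and an odd number of voters in the random check so that no median convention matters. The ordering coded in `mg_key` (at the same majority-grade, a gauge with \(p>q\) is above one with \(p\le q\); among the former larger p is better, among the latter smaller q is better) is one standard way of stating the rule and agrees with the “largest of \(p_A,q_A,p_B,q_B\)” description above. The function names are hypothetical, and passing the random check is evidence, not a proof.

```python
import random
from fractions import Fraction


def gauge(grades):
    """Majority gauge (p, alpha, q): alpha is the majority-grade (lower
    middlemost grade), p and q the shares of grades above and below it."""
    s = sorted(grades, reverse=True)
    n = len(s)
    alpha = s[n // 2]
    p = Fraction(sum(g > alpha for g in s), n)
    q = Fraction(sum(g < alpha for g in s), n)
    return p, alpha, q


def mg_key(grades):
    """Larger key = ranked higher by the majority-gauge rule."""
    p, alpha, q = gauge(grades)
    return (alpha, 1, p) if p > q else (alpha, 0, -q)


def mg_compare(a, b):
    """+1 if A is ahead of B by the majority-gauge rule, -1 if behind, 0 if tied."""
    ka, kb = mg_key(a), mg_key(b)
    return (ka > kb) - (ka < kb)


def mr_on_polarized(a, b):
    """MR outcome on the polarized opinion profile with these merit profiles."""
    pairs = list(zip(sorted(a, reverse=True), sorted(b)))
    n_a = sum(x > y for x, y in pairs)
    n_b = sum(y > x for x, y in pairs)
    return (n_a > n_b) - (n_a < n_b)


def check_theorem4(trials=2000, n_voters=7, n_grades=6, seed=0):
    """Random merit profiles: whenever MR and MG are both decisive, they agree."""
    if n_voters % 2 == 0:
        raise ValueError("this check assumes an odd number of voters")
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(n_grades) for _ in range(n_voters)]
        b = [rng.randrange(n_grades) for _ in range(n_voters)]
        mr, mg = mr_on_polarized(a, b), mg_compare(a, b)
        if mr != 0 and mg != 0 and mr != mg:
            return False
    return True


print(gauge([5, 4, 4, 2, 1]), check_theorem4())
```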

Corollary

Majority rule on a pair of polarized candidates elects (when decisive) a candidate whose grades dominate the other’s.

Proof

The majority-gauge rule elects the candidate whose grades dominate the other’s. But as just shown the majority rule coincides with it when the candidates are polarized, proving the corollary.\(\square \)

If the scale of grades is not sufficient (i.e., a voter is forced to give the same grade to two candidates whereas she has a preference between them), the theorem does not apply, since it is impossible to deduce the ranking from the grades: the same grade for two candidates could bear three meanings, a preference for one, a preference for the other, or indifference. For example, with two grades MJ becomes approval voting: a voter may Approve both candidates or Disapprove both without actually being indifferent between them, so agreement with majority rule calculated on the basis of preferences cannot be guaranteed even when the electorate is polarized, and AV may disagree with MR. Thus, if Approve means Good or better in the polarized profile of Poutou versus Le Pen (Table 20), approval voting elects Le Pen with 32.16% rather than Poutou with 13.71%, in disagreement with MR (and so with MJ using the full complement of grades). To guarantee agreement with the majority rule on polarized pairs, the scale of grades must therefore be rich enough to faithfully represent the preferences and indifferences. The optimal number of grades to use depends on the application.

In the most cited paper of the first hundred years of the Psychological Review, Miller (1956) concluded: “There is a clear and definite limit to the accuracy with which we can identify absolutely the magnitude of a unidimensional stimulus variable. I would propose to call this limit the span of absolute judgment, and I maintain that for unidimensional judgments this span is usually somewhere in the neighbourhood of seven”.

The evidence with MJ in voting accords with Miller’s finding (Table 23). In the 2007 experiment, where voters were offered a scale of six grades, only 14% used all six although there were 12 candidates. In the 2012 poll voters had a scale of seven grades, yet the number of grades they used is remarkably similar to that of the participants in the 2007 experiment, only 1% using all seven grades to evaluate ten candidates. This suggests that at least five grades are necessary, that six is a good number, and that seven is unnecessary.
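Percentages of the kind reported in Table 23 come from counting, on each ballot, how many distinct grades the voter used. A minimal sketch with hypothetical ballots (dicts from candidate to grade); the function name is illustrative.

```python
from collections import Counter


def share_by_number_of_grades(ballots):
    """Percentage of ballots using exactly k distinct grades, for each k."""
    counts = Counter(len(set(ballot.values())) for ballot in ballots)
    return {k: 100 * c / len(ballots) for k, c in sorted(counts.items())}


ballots = [
    {"X": "Good", "Y": "Good", "Z": "Poor"},
    {"X": "Excellent", "Y": "Fair", "Z": "Poor"},
    {"X": "Good", "Y": "Good", "Z": "Good"},
    {"X": "Poor", "Y": "Fair", "Z": "Poor"},
]
print(share_by_number_of_grades(ballots))  # {1: 25.0, 2: 50.0, 3: 25.0}
```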

Table 23 Percentages of ballots with k grades, 2007 French presidential election experiment and 2012 French presidential poll

Experts in judging figure skaters, divers, gymnasts, or wines may, however, have the finesse to discern refinements that demand finer scales; in figure skating, for example, judges use a scale of 13 grades (for each part of a performance) and in diving a scale of 21 (for each dive).

8 Resisting strategic manipulation

The Gibbard–Satterthwaite impossibility theorem (Gibbard 1973; Satterthwaite 1973) proves that no method of ranking satisfying Axioms 1–7 (and so based on comparisons) is strategy-proof when there are at least three candidates. Yet, “It is important to ask under what circumstances it would be possible to design non-trivial strategy-proof decision rules, because strategy-proofness, when attainable, is an extremely robust and attractive property” (Barberà 2010, Sect. 5).

Suppose a method M satisfying Axioms 1* and 2–8 ranks candidate A above B, \(A\succ _MB\). Because of monotonicity, a voter who prefers B to A can try to manipulate by lowering A’s grade and raising B’s. The method resists strategic manipulation if A remains above B after the manipulation.

Definition

A method M is strategy-proof in ranking if, for every two candidates A and B s.t. \(A\succ _MB\), no voter who grades B above A can raise B’s global measureFootnote 14 or lower A’s.

As will be shown, there is no such method, so we formulate a less demanding property.

Definition

A method M is partially strategy-proof in ranking if, for every two candidates A and B s.t. \(A\succ _MB\) and every voter who grades B above A, if the voter can raise B’s global measure then he cannot lower A’s, and if he can lower A’s then he cannot raise B’s.

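Point-summing methods are clearly not partially strategy-proof in ranking: unless a voter who grades B above A already gives B the top grade or A the bottom grade, she can simultaneously raise B’s total and lower A’s.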
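A hypothetical three-voter example makes this concrete. In the Python sketch below, grades are read as points (0 for To Reject up to 6 for Outstanding, an assumed encoding), and a single voter who grades B above A both raises B’s total and lowers A’s, reversing the ranking. The profile and function name are illustrative only.

```python
def point_totals(profile):
    """Sum each candidate's points (grade index = points) over all voters."""
    return {cand: sum(grades) for cand, grades in profile.items()}

# Hypothetical honest profile; the third voter (index 2) grades B (4) above A (2).
honest = {"A": [5, 4, 2], "B": [3, 3, 4]}
# That voter lowers A to the bottom grade and raises B to the top grade at the same time.
manipulated = {"A": [5, 4, 0], "B": [3, 3, 6]}

print(point_totals(honest))       # {'A': 11, 'B': 10}: A ahead
print(point_totals(manipulated))  # {'A': 9, 'B': 12}: B ahead
```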

Theorem 5

(1) No method of ranking satisfying Axioms 1* and 2–8 is strategy-proof in ranking on the entire domain of possible opinions. (2) The majority-gauge rule \(\succ _{MG}\) is partially strategy-proof in ranking on the entire domain when the language of grades is sufficient.

Proof

(1) Suppose a method \(\succeq _M\) satisfying the axioms is strategy-proof in ranking on the entire domain. Axiom 8 (IIA) implies it must be strategy-proof on any two candidates A and B with profile \(\Phi \). If voter j gave A the higher grade, change both of j’s grades, giving A the highest and B the lowest; if voter j gave B the higher grade, change both, giving B the highest and A the lowest; and if voter j gave both the same grade, change nothing (she is indifferent). Strategy-proofness or monotonicity implies that \(\succeq _M\) must give the same outcome for this new profile on A and B. Doing the same for every voter in turn results in an opinion profile \(\Phi '\) for which the method \(\succeq _M\) necessarily gives the same outcome between A and B as for \(\Phi \).

In \(\Phi '\) let \(m_A\) be the number of voters who give A the highest and B the lowest grade, \(m_B\) the number who give B the highest and A the lowest, and \(m_{AB}\) the number who give both the same grade (\(m_{AB}\) is also the number who gave both the same grade in \(\Phi \)). If \(m_A>m_B\) then A’s grades dominate B’s and monotonicity implies \(A\succ _MB\) (symmetrically, \(m_B>m_A\) implies \(B\succ _MA\)); and if \(m_A=m_B\) their sets of grades are identical, so by Theorem 3, \(A\approx _MB\). But \(m_A\) and \(m_B\) are exactly the numbers of voters who grade A above B and B above A in \(\Phi \), so \(\succeq _M\) coincides with the majority rule \(\succeq _{MR}\). Since the domain is unrestricted, Condorcet’s paradox shows that transitivity is violated, a contradiction.
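The last step relies on Condorcet’s paradox. As an illustration, the Python sketch below uses a classic three-voter profile (hypothetical grades on a 0 to 6 scale) for which pairwise majority rule yields a cycle, so no transitive ranking can agree with it. The function name is hypothetical.

```python
from itertools import permutations

def majority_prefers(profile, x, y):
    """True if strictly more voters grade x above y than y above x."""
    x_over_y = sum(g[x] > g[y] for g in profile)
    y_over_x = sum(g[y] > g[x] for g in profile)
    return x_over_y > y_over_x

# Classic Condorcet profile, expressed with hypothetical grades (0..6).
profile = [
    {"A": 6, "B": 3, "C": 0},  # A > B > C
    {"A": 0, "B": 6, "C": 3},  # B > C > A
    {"A": 3, "B": 0, "C": 6},  # C > A > B
]

# Every ordered pair (x, y) such that a majority grades x above y.
cycle = [(x, y) for x, y in permutations("ABC", 2) if majority_prefers(profile, x, y)]
print(cycle)  # [('A', 'B'), ('B', 'C'), ('C', 'A')]: A beats B, B beats C, C beats A
```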

(2) A voter who gives the same grade to two candidates has no preference between them (when the scale is sufficient) and so has no incentive to see one ranked above the other.Footnote 15

Suppose \(A\succ _{MG}B\), A’s majority-grade is \(\alpha ^A\) and B’s is \(\alpha ^B\) (so \(\alpha ^A\succeq \alpha ^B\)), and voter j grades B above A, \(\alpha _j^B\succ \alpha _j^A\). If j can raise \((p_B,\alpha ^B,q_B)\) then \(\alpha _j^B\preceq \alpha ^B\) (raising a grade that is above the majority-grade leaves the gauge unchanged), implying \(\alpha _j^A\prec \alpha _j^B\preceq \alpha ^B\preceq \alpha ^A\), so j cannot lower \((p_A,\alpha ^A,q_A)\); whereas if j can lower \((p_A,\alpha ^A,q_A)\) then \( \alpha _j^A\succeq \alpha ^A\), implying \(\alpha _j^B\succ \alpha _j^A\succeq \alpha ^A\succeq \alpha ^B\), so j cannot raise \((p_B,\alpha ^B,q_B)\). \(\square \)
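Part (2) can also be checked numerically on small random profiles. The Python sketch below is a randomized search, not a proof, and rests on assumptions: grades are integers 0 to 6, the majority-grade is the lower median, the gauge is labelled \(\alpha^+\) when \(p>q\) and \(\alpha^-\) otherwise, and the majority-gauge rule compares majority-grades first, then ranks \(\alpha^+\) above \(\alpha^-\), then prefers the larger p among \(\alpha^+\) and the smaller q among \(\alpha^-\). Sufficiency of the scale is not modelled, and all function names are hypothetical.

```python
import random

def majority_gauge(grades):
    """Return (p, alpha, q): alpha is the majority-grade (lower median, assumed);
    p and q are the shares of grades strictly above and strictly below alpha."""
    n = len(grades)
    alpha = sorted(grades)[(n - 1) // 2]
    p = sum(g > alpha for g in grades) / n
    q = sum(g < alpha for g in grades) / n
    return p, alpha, q

def mg_key(grades):
    """Sort key for the majority-gauge rule (assumed tie-breaking): larger ranks higher."""
    p, alpha, q = majority_gauge(grades)
    if p > q:
        return (alpha, 1, p)  # alpha+: larger p is better
    return (alpha, 0, -q)     # alpha-: smaller q is better

def reachable_keys(grades, voter, scale):
    """Keys obtainable when `voter` replaces her grade by any grade on the scale."""
    return [mg_key(grades[:voter] + [g] + grades[voter + 1:]) for g in scale]

def is_partially_strategy_proof(trials=2000, n_voters=5, scale=range(7), seed=0):
    """Search random profiles for a voter who grades B above A (A ranked above B)
    and can both raise B's majority-gauge and lower A's."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.choice(scale) for _ in range(n_voters)]
        b = [rng.choice(scale) for _ in range(n_voters)]
        if mg_key(a) == mg_key(b):
            continue
        if mg_key(a) < mg_key(b):
            a, b = b, a  # relabel so that A is ranked above B
        for j in range(n_voters):
            if b[j] > a[j]:
                can_raise_b = max(reachable_keys(b, j, scale)) > mg_key(b)
                can_lower_a = min(reachable_keys(a, j, scale)) < mg_key(a)
                if can_raise_b and can_lower_a:
                    return False  # counterexample found
    return True

print(majority_gauge([0, 2, 3, 3, 6]))  # (0.2, 3, 0.4)
print(is_partially_strategy_proof())    # True
```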

Lemma

A method of ranking \(\succeq _M\) that satisfies Axioms 1* and 2–8 and is strategy-proof on the limited domain of polarized pairs of candidates must be consistent with the majority rule \(\succeq _{MR}\) on polarized pairs, when the language of grades is sufficient.

Proof

The proof is very similar to that of Theorem 5. Suppose \(\succeq _M\) satisfies Axioms 1* and 2–8 and is strategy-proof on pairs of polarized candidates. Take any such pair with profile \(\Phi \). If voter j gave A the higher grade, change both grades, giving A the highest and B the lowest; if voter j gave B the higher grade, change both, giving B the highest and A the lowest; and if voter j gave both the same grade, change nothing (he is indifferent). Since the new opinion profile on the two candidates is again polarized, strategy-proofness or monotonicity implies that \(\succeq _M\) gives the same outcome. Doing the same for all voters gives an opinion profile \(\Phi ^*\) for which the outcome is the same as for \(\Phi \). The argument at the end of part (1) of the proof of Theorem 5 now shows that \(\succeq _M\) coincides with \(\succeq _{MR}\) on any pair of candidates with a polarized opinion profile.\(\square \)
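The profile transformation used in both proofs can be sketched in Python as follows, assuming a scale from 0 (lowest) to 6 (highest); the profile \(\Phi \) below and the function names are hypothetical. The transformation leaves unchanged the numbers \(m_A\), \(m_B\), \(m_{AB}\) of voters who grade A higher, B higher, or both the same, which is what allows the proofs to reduce the outcome to majority rule.

```python
def polarize(pairs, top=6, bottom=0):
    """Map each voter's (grade of A, grade of B) to the extreme pair used in the proofs:
    the preferred candidate gets the top grade, the other the bottom grade;
    voters who gave both candidates the same grade are left unchanged."""
    out = []
    for ga, gb in pairs:
        if ga > gb:
            out.append((top, bottom))
        elif gb > ga:
            out.append((bottom, top))
        else:
            out.append((ga, gb))
    return out

def counts(pairs):
    """Return (m_A, m_B, m_AB): voters grading A higher, B higher, or both the same."""
    m_a = sum(ga > gb for ga, gb in pairs)
    m_b = sum(gb > ga for ga, gb in pairs)
    return m_a, m_b, len(pairs) - m_a - m_b

phi = [(4, 3), (5, 1), (2, 6), (3, 3)]  # hypothetical profile on A and B
phi_star = polarize(phi)
print(phi_star)                        # [(6, 0), (6, 0), (0, 6), (3, 3)]
print(counts(phi), counts(phi_star))   # (2, 1, 1) (2, 1, 1)
```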

Theorem 6

A method of ranking \(\succeq _M\) that satisfies Axioms 1* and 2–8 and is strategy-proof on the limited domain of polarized pairs of candidates must coincide with the majority-gauge rule \(\succeq _{MG}\) when the language of grades is sufficient.

Proof

The Lemma together with Theorem 4 proves it. \(\square \)

Theorems 5 and 6 provide new theoretical evidence that majority judgment (with a sufficient number of grades) best resists strategic manipulation. They show that the majority-gauge rule is always partially strategy-proof and that, when two candidates are polarized (and sizable parts of an electorate often are), the majority-gauge rule is strategy-proof. To appreciate the importance of these results in practice, consider the merit profile of Table 12, from which we can compute that:

  • The majority-gauge of Hollande is (45.05%, Good\(+\), 43.28%); and

  • The majority-gauge of Sarkozy is (49.25%, Fair\(+\), 39.62%).

From the opinion profile in Table 14, we can deduce that 40.31% of all voters strictly grade Sarkozy above Hollande. They are of three types:

  • Type 1 voters strongly prefer Sarkozy to Hollande: they give Sarkozy a high grade (\(\succeq \)Good) and Hollande a low grade (\(\preceq \)Fair). They are the most motivated to manipulate.

  • Type 2 voters like both candidates: they give a high grade to both (Hollande a grade \(\succeq \)Good, and so Sarkozy a grade \(\succeq \)Very Good). They are less motivated to manipulate.

  • Type 3 voters dislike both candidates: they give a low grade to both (Sarkozy a grade \( \preceq \)Fair, and so Hollande a grade \(\preceq \)Poor). They are less motivated to manipulate.

From Table 14, we can compute that type 1 voters form the largest group: 76.09% of the 40.31% who prefer Sarkozy to Hollande. Type 2 voters constitute 19.20% and type 3 only 4.71%. How can these voters manipulate? By raising Sarkozy’s grade to Outstanding and lowering Hollande’s to To Reject. The effects on the majority-gauges are very limited:

  • Type 1 voters (76.09%, the most tempted to manipulate) have no effect whatsoever. They give Sarkozy a grade above his majority grade and Hollande a grade below his majority grade. Consequently, raising Sarkozy’s grade does not change the share of grades above his majority grade, and lowering Hollande’s grade does not change the share of grades below his majority grade. Hence, even if all type 1 voters manipulate, the majority-gauges of Hollande and Sarkozy remain exactly the same, and so does the ranking.

  • Type 2 voters (19.20%, less motivated to manipulate) can only partially manipulate: they can lower Hollande’s majority-gauge but cannot raise Sarkozy’s;

  • Type 3 voters (4.71%, less motivated to manipulate) can only partially manipulate: they can raise Sarkozy’s majority-gauge but cannot lower Hollande’s.
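The classification into types, and which manipulation can move which majority-gauge, can be sketched in Python. The sketch uses the majority-grades stated above (Fair for Sarkozy, Good for Hollande) and the fact used in the proof of Theorem 5(2): raising a grade can move a majority-gauge only if that grade is at or below the majority-grade, and lowering it only if the grade is at or above. The integer encoding of grades (0 for To Reject up to 6 for Outstanding) and the function names are assumptions.

```python
GRADES = ["To Reject", "Poor", "Fair", "Good", "Very Good", "Excellent", "Outstanding"]
POOR, FAIR, GOOD, VERY_GOOD = (GRADES.index(g) for g in ("Poor", "Fair", "Good", "Very Good"))

ALPHA_SARKOZY = FAIR   # Sarkozy's majority-grade, from the text
ALPHA_HOLLANDE = GOOD  # Hollande's majority-grade, from the text

def voter_type(g_sarkozy, g_hollande):
    """Type 1, 2 or 3 of a voter who strictly grades Sarkozy above Hollande."""
    if g_sarkozy <= g_hollande:
        raise ValueError("voter must grade Sarkozy strictly above Hollande")
    if g_sarkozy >= GOOD and g_hollande <= FAIR:
        return 1  # high grade for Sarkozy, low grade for Hollande
    if g_hollande >= GOOD:
        return 2  # high grades for both
    return 3      # low grades for both

def gauge_can_move(g_sarkozy, g_hollande):
    """Necessary conditions for each manipulation to change a majority-gauge."""
    return {
        "raise_sarkozy": g_sarkozy <= ALPHA_SARKOZY,
        "lower_hollande": g_hollande >= ALPHA_HOLLANDE,
    }

for s, h in [(GOOD, FAIR), (VERY_GOOD, GOOD), (FAIR, POOR)]:
    print(voter_type(s, h), gauge_can_move(s, h))
# 1 {'raise_sarkozy': False, 'lower_hollande': False}
# 2 {'raise_sarkozy': False, 'lower_hollande': True}
# 3 {'raise_sarkozy': True, 'lower_hollande': False}
```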

This example illustrates why the majority-gauge rule resists strategic manipulation in practice. The voters most tempted to manipulate (typically the largest fraction) have no effect on the majority-gauges. The other voters (a smaller fraction) have limited impact because they can only partially manipulate. Moreover, they are not very motivated to manipulate, because they either like both candidates or dislike both, so voting honestly may well give them more satisfaction than manipulating the result.

Now suppose that all type 1 voters raise Sarkozy’s grade to Outstanding and lower Hollande’s to To Reject, and that all type 2 and type 3 voters who are “sufficiently motivated” (e.g., whose grades for the two candidates differ by at least two levels) do the same. Then 86.21% of those who prefer Sarkozy to Hollande manipulate.

With majority judgment the manipulation fails: Hollande’s majority-gauge decreases from (45.05%, Good\(+\), 43.28%) \(\searrow \) to (44.64%, Good−, 46.95%) and Sarkozy’s increases from (49.25%, Fair\(+\), 39.62%) \(\nearrow \) to (49.66%, Fair\(+\), 39.62%). Hollande’s majority-grade is still Good, above Sarkozy’s Fair, so Hollande remains ahead.
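The comparison can be replayed with the gauges reported above. The Python sketch assumes the usual form of the majority-gauge rule: majority-grades are compared first; for equal majority-grades, \(\alpha^+\) (when \(p>q\)) ranks above \(\alpha^-\), a larger p is better among \(\alpha^+\), and a smaller q is better among \(\alpha^-\). The grade indices (Fair = 2, Good = 3) and the function name are an assumed encoding.

```python
def mg_key(gauge):
    """Sort key for the majority-gauge rule on a gauge (p, alpha, q); larger ranks higher.
    alpha is a grade index; p and q are percentages of grades above and below alpha."""
    p, alpha, q = gauge
    if p > q:
        return (alpha, 1, p)  # alpha+: larger p is better
    return (alpha, 0, -q)     # alpha-: smaller q is better

FAIR, GOOD = 2, 3  # grade indices on the 0 (To Reject) .. 6 (Outstanding) scale

before = {"Hollande": (45.05, GOOD, 43.28), "Sarkozy": (49.25, FAIR, 39.62)}
after = {"Hollande": (44.64, GOOD, 46.95), "Sarkozy": (49.66, FAIR, 39.62)}

for label, gauges in (("before", before), ("after", after)):
    leader = max(gauges, key=lambda c: mg_key(gauges[c]))
    print(label, leader)
# before Hollande
# after Hollande
```

Only the first component of the key matters here: Good beats Fair whatever the values of p and q.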

With a point-summing method (Outstanding worth 6 points, Excellent 5 points, ..., Poor 1 point, To Reject 0 points), the manipulation succeeds. Before manipulation, Hollande’s average score is 3.00 and Sarkozy’s is 2.48; after it, Hollande’s average score is 2.56 and Sarkozy’s is 2.94.
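The contrast with point-summing is that every manipulating voter moves the average. A minimal Python sketch, with a hypothetical merit profile (grade mapped to percentage of voters) rather than the data of Table 14: moving x percentage points of voters from a grade worth g points to Outstanding raises the average by \(x(6-g)/100\). The function names are hypothetical.

```python
POINTS = {"Outstanding": 6, "Excellent": 5, "Very Good": 4, "Good": 3,
          "Fair": 2, "Poor": 1, "To Reject": 0}

def average_score(merit):
    """Average points for a merit profile mapping grade -> percentage of voters."""
    return sum(POINTS[grade] * pct for grade, pct in merit.items()) / 100

def move_voters(merit, src, dst, pct):
    """Copy of the merit profile with `pct` percentage points of voters moved from src to dst."""
    new = dict(merit)
    new[src] = new.get(src, 0) - pct
    new[dst] = new.get(dst, 0) + pct
    return new

# Hypothetical merit profile (percentages sum to 100).
merit = {"Good": 50, "Fair": 50}
manipulated = move_voters(merit, "Fair", "Outstanding", 10)
print(average_score(merit))        # 2.5
print(average_score(manipulated))  # 2.9, i.e. 2.5 + 10 * (6 - 2) / 100
```

Under the majority-gauge rule, by contrast, a voter whose grade is already above the majority-grade changes nothing by raising it further.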

Thus, if it is true that all methods are manipulable, some are more manipulable than others, and majority judgment is the least manipulableFootnote 16 among those that satisfy Axioms 1* and 2–8.

Strategy-proofness is a very important property of a method if it implements correct decisions. Dictatorship, for example, is strategy-proof, yet that attribute is not a sufficient reason to adopt it. Strategy-proofness is no virtue in a method that implements wrong decisions (as MR sometimes does when there are two candidates). First and foremost, a method should guarantee a correct decision when voters or judges express themselves honestly; subject to that caveat, it should resist manipulation as well as possible.

When a candidate’s grades dominate those of every other candidate (and in practice such a dominating candidate often existsFootnote 17), common sense suggests that he is the correct winner of the election. By Theorem 3, when voters are honest, all methods satisfying Axioms 1* and 2–8 guarantee that the dominating candidate, whenever he exists, is elected. Theoretical, experimental, and practical evidence shows that majority judgment offers the best chance of still electing the dominating candidate when voters are strategic.
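Whether a dominating candidate exists can be tested directly from the grades. The Python sketch below assumes that domination means: with both lists of grades sorted from best to worst, the dominating candidate’s grade is at least as good in every position and strictly better in at least one. The profile (integer grades, higher is better) and the function names are hypothetical.

```python
def dominates(grades_a, grades_b):
    """True if A's grades dominate B's: with both lists sorted from best to worst,
    A's k-th grade is at least B's k-th grade for every k, and strictly better for some k."""
    if len(grades_a) != len(grades_b):
        raise ValueError("both candidates must be graded by the same voters")
    a = sorted(grades_a, reverse=True)
    b = sorted(grades_b, reverse=True)
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def dominating_candidate(profile):
    """Return the candidate whose grades dominate every other candidate's, or None."""
    for cand, grades in profile.items():
        if all(dominates(grades, other) for c, other in profile.items() if c != cand):
            return cand
    return None

profile = {"A": [6, 4, 3, 1], "B": [5, 4, 2, 1], "C": [6, 3, 3, 0]}
print(dominating_candidate(profile))                     # A
print(dominating_candidate({"A": [3, 1], "B": [2, 2]}))  # None
```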

9 Conclusion

The intent of this article is to convince readers of a number of main points. Several concern the drawbacks of methods based on the traditional paradigm where voters compare candidates and the “ideal” winner is a candidate who is preferred by a majority to another candidate:

  • The domination paradox shows that this ideal has a major flaw: except when the electorate is polarized, majority rule is not necessarily a good method for electing one of two candidates.

  • Condorcet consistency in every situation, in consequence, is not a desirable property, contrary to the widely held view, and should certainly not be considered axiomatic.

  • Comparisons as inputs to methods of voting are insufficient expressions of opinions and should be replaced by ordinal measures.

Several others concern methods based on voters’ evaluations of candidates:

  • The essential property or “ideal” is that a candidate whose grades dominate another’s should lead the other (no domination paradox).

  • An infinity of different methods that obey the essential properties (Axioms 1* and 2 through 8) meet that ideal.

  • Among them, MJ is the one that best resists strategic manipulation.

  • In particular, MJ is the one method that agrees with the traditional ideal when the electorate is polarized (i.e., when the domination paradox is excluded and the traditional ideal is acceptable).