1 Introduction

Voting rules are widely used to settle collective decisions in elections, committees, and teams, among many other settings. They determine how individual preferences are aggregated into a social preference, usually to select a winner.Footnote 1 Choosing which rule to use can be a consequential decision. A large literature has therefore compared voting rules across different domains and with different methods. For example, social choice theory has compared voting rules by proposing desirable axiomatic properties such as the ability to always elect the Condorcet winner or the ability to resist strategic manipulation (e.g. Sen 1995; Brams and Fishburn 2002; Laslier 2009), while a growing experimental literature compares voting rules in terms of their impact on voting behaviour and electoral outcomes (e.g. Forsythe et al. 1996; Van der Straeten et al. 2010a; Alós-Ferrer and Granic 2012; Baujard et al. 2013, 2021; Bouton et al. 2016, 2021; Moreno et al. 2019; Tsakas and Xefteris 2021).

A promising complementary way to assess voting rules is to compare the expressive utility—the intrinsic benefit derived from expressing one’s preferences/attitudes/values—that voters derive from voting with each rule. A desire to express oneself, for its own sake, is one of the most cited motivations to underlie turnout and voting behaviours.Footnote 2 However, to the best of our knowledge, little is known about the relationship between expressive utility and voting rules. Improving our understanding of this issue is important because different voting rules are widely used in companies, committees, elections, referendums and opinion polls, among many other settings, and they may have a significant impact on turnout and voters’ welfare if there are significant differences in terms of the expressive utility that voters derive from voting with different rules.

In this paper, we first propose a conceptual framework that allows us to compare voting rules in terms of the expressive utility that voters can derive from voting. Our main hypothesis is that voters’ expressive utility is highest when they express their actual preferences over alternatives, and decreases as the preferences cast in the ballot get more distant from their actual preferences. This distance can increase, in particular, because a voting rule does not allow a voter to express her actual preference. For instance, a voter who strictly prefers x to y and y to z cannot express this preference in an election decided by plurality rule.Footnote 3 We then argue that voting rules differ in terms of their expressive power, in the sense that voting rule A has more expressive power than voting rule B if A allows voters to express more preferences than B and every preference that can be expressed with B can also be expressed with A. It follows that if voting rule A has more expressive power than voting rule B, then voters’ expressive utility with A is equal to or higher than with B, all else equal. This simple model provides a novel testable implication for the relationship between voting rules, expressive utility, and turnout: if voting rule A has more expressive power than voting rule B, then expected turnout with voting rule A is equal to or higher than expected turnout with voting rule B, ceteris paribus.Footnote 4

To see this, consider the plurality rule and approval voting.Footnote 5 Approval voting allows voters to express every preference that can be expressed with the plurality rule, as well as some that cannot. Therefore, we say that approval voting has more expressive power than the plurality rule. This means that, according to our model, no voter should derive higher expressive utility with the plurality rule than with approval voting, while some voters may derive higher expressive utility with approval voting than with the plurality rule. It follows that we can make an unambiguous comparison in terms of expected turnout: expected turnout with approval voting is equal to or higher than with the plurality rule, all else equal. Note, however, that this type of unambiguous comparison is not always possible. For instance, we cannot make an unambiguous comparison in terms of expressive power (and expected turnout) between the plurality rule and the Borda rule.Footnote 6

We then ran an online experiment to test the model’s implications in a controlled environment. In May 2020, we recruited 1977 UK subjects to participate in an opinion poll and asked them to vote for different policies on how to “open up” the country after the first Covid-19 lockdown. However, contrary to a standard opinion poll, participants only registered their vote (i.e., their vote was counted for the final public results) if they were willing to respond to a series of additional questions that were not asked in case of abstention. This feature of the opinion poll made registering one’s vote (our proxy for turnout) costly. To test the model’s main implication, participants were randomly assigned to three voting rules with different degrees of expressive power: plurality rule (Treatment 1, or T1), approval voting (T2), and a synthetic rule that allowed participants to express all possible preferences with approval voting and the Borda rule (hereafter A-Borda rule, T3). In two additional treatments (T4 and T5), subjects participated in the same treatments as in T1 and T3 respectively, with the only difference being that they were made aware of different voting rules before they made their turnout decision. This experimental design allows us to test if voting rules with higher expressive power lead to higher turnout, and if awareness of different voting rules is necessary for this to hold.

We find no statistically significant difference between turnout in T1, T2, and T3. This contrasts with our predictions. However, once participants are made aware of different voting rules, turnout is statistically and “economically” significantly higher with the voting rule with higher expressive power (82.99% in T5 against 76.79% in T4). Our interpretation of these results is the following. For T1 and T2, the null result seems to be explained by the low percentage of subjects who were able to better express themselves in T2 than in T1. In fact, few subjects were indifferent between options in our setting (something we discuss below). For T1/T2 and T3, we highlight three potential explanations. First, increased rule complexity in T3 can be partly to “blame”. Subjects found the A-Borda rule to be the most complex rule, and our results suggest that rule complexity negatively affects turnout in our setting. Second, strategic voting can have a similar effect. There are more opportunities to vote strategically in T3 than in T1/T2, and our results suggest that non-sincere voters (such as strategic and inattentive voters) are less likely to turn out than non-strategic sincere voters. It follows that non-sincere voting can partly explain why turnout is not higher in T3 than in T1/T2.Footnote 7 Third, there may be a bias in favour of more familiar (and/or “real”) rules. In particular, we conjectured that if voters were not able to compare the voting rule they were assigned to with other rules (i.e., without explicit reference points), the ability/inability to better express oneself would not be salient and its effect on the expressive utility of voting muted. This factor, contrary to rule complexity and non-sincere voting, can explain our findings for T4 and T5. 
In those treatments, we find that simple awareness of alternative voting rules is enough to change voters’ turnout behaviour and that, in the presence of reference points, they are more likely to vote with voting rules that allow them to better express themselves. In these conditions, subjects can use the rules that were presented to them but with which they did not vote as reference points to judge the voting rule with which they voted. Then, according to this interpretation, the ability/inability to express oneself becomes salient and its effect on expressive utility is magnified. Taken together, our results suggest that the expressive power of voting rules can have implications for participation as long as voters are aware of (and can compare) different voting rules.

Our results also show that subjects were better able to express their actual preferences under voting rules with higher expressive power (72% in T3, 67.5% in T5, 7.5% in T2, 3.3% in T4, and 2.3% in T1 expressed their actual preference). This difference arises because most participants had strict preferences over alternatives that they could express under T3 and T5. According to our model, this suggests that subjects derived higher expressive utility from voting with rules with higher expressive power. A series of questions at the end of the experiment supports this interpretation. We find that subjects felt they were better able to express themselves under the rules with higher expressive power. We also find that a large majority of subjects (68.4%) would have preferred to use the A-Borda rule when asked ex post to choose among the plurality, approval, and A-Borda rules.

These results suggest that the expressive power of voting rules is a relevant criterion when deciding which voting rule to use in economic and political decisions. In particular, these results suggest that, all else equal, the higher the expressive power of a voting rule (i) the higher the turnout is expected to be if voters are aware of different voting rules, (ii) the better the voting rule represents individuals’ actual preferences, and according to our model (iii) the higher the expressive utility derived from voting is expected to be. If this holds more generally, a voting rule with higher expressive power is desirable for the sake of higher turnout, expressive utility, and representation of voters’ actual preferences. These are relevant goals when considering voting and turnout behaviours, which may have significant implications in terms of voters’ welfare.

1.1 Relationship to the literature

Our paper makes several contributions to the literature. First, it contributes to the growing literature that compares different voting rules (e.g. Merrill 1984; Chamberlin 1985; Nurmi 1987; Sen 1995; Forsythe et al. 1996; Brams and Fishburn 2002; Conitzer and Sandholm 2005; Brams and Fishburn 2007; Laslier 2009; Van der Straeten et al. 2010a; Alós-Ferrer and Granic 2012; Baujard et al. 2013; Plassmann and Tideman 2014; Bouton et al. 2016; Moreno et al. 2019; Baujard et al. 2021; Bouton et al. 2021; Tsakas and Xefteris 2021). Social choice theorists have studied this topic profusely, proposing many desirable axiomatic properties of voting rules (e.g. Sen 1995; Brams and Fishburn 2002; Laslier 2009). Other authors have proposed empirical criteria to evaluate these rules, such as the communication complexity and cognitive load associated with different voting rules (Conitzer and Sandholm 2005). More recently, several authors have used experimental evidence to bring new insights into this subject. For example, Bouton et al. (2021) use a lab experiment to study Duverger’s prediction that the plurality rule leads to higher coordination of votes on a limited number of candidates than the majority runoff rule. We contribute to this large body of literature by providing a novel criterion to evaluate voting rules and by testing some of its implications experimentally.

Our analysis also contributes to the expressive voting literature. The central hypothesis of this literature is that individuals may vote as an act of expressive behaviour, receiving direct utility from the act of voting itself instead of, or in addition to, instrumental utility related to material self-interest (e.g. Fiorina 1976; Brennan and Buchanan 1984; Carter and Guerette 1992; Brennan and Lomasky 1993; Brennan and Hamlin 1998; Tyran 2004; Shayo and Harel 2012; Igersheim et al. 2016; Spenkuch 2018; Pons and Tricaud 2018; Ginzburg et al. 2022). Many models, however, do not make any assumptions on the non-consequential expressive element associated with voting, and therefore have no choice-based testable implications (see e.g. Green and Shapiro 1994). One exception is the low-cost theory of expressive voting (Brennan and Lomasky 1993). According to this theory, expressive motives become more important relative to self-interested motives when pivot probabilities are small (e.g. large elections). Assuming that expressive motives are more pro-social than self-interested ones, it follows that when pivot probabilities decrease, voters can obtain the satisfaction of “behaving charitably” much cheaper (Tullock 1971, p. 389; for experimental evidence, see e.g. Carter and Guerette 1992; Tyran 2004; Feddersen et al. 2009; Shayo and Harel 2012; Ginzburg et al. 2022).

Our contribution to this literature is four-fold. First, we put forward a conceptual framework according to which the expressive utility derived from voting depends on the ordinal distance between voters’ actual preferences and the preferences that they express when voting. As explained below, this simple model has testable implications for individual turnout behaviour, which we test with our data. Second, we propose a new formal criterion to compare voting rules in terms of their degree of expression. Other authors have noted that different voting rules allow for different degrees of expression (e.g. Brams and Fishburn 2005; Baujard and Igersheim 2009; Alós-Ferrer and Granic 2012). Aragones et al. (2011), for instance, used a model of “voting as making statements” to show that approval voting allows voters to make a richer set of statements than the plurality rule. We contribute to this body of literature by putting forward a new formal ranking of voting rules that is based on the ordinal preferences that individuals can express under different rules.Footnote 8 Third, our study adds to the recent experimental literature that looks at the relationship between voting rules and expressive voting. Ginzburg et al. (2022) use a theoretical model and a lab experiment to show that voting rules that reduce the probability of a committee member being pivotal may raise the share of expressive votes; Baujard et al. (2021) conducted an experiment in parallel with the 2017 French presidential election to study whether voters’ behaviour and the outcome of elections are sensitive to modifications of the grading scale employed in evaluative voting that allow, according to the authors’ definitions, for greater expressivity. While related, our focus is significantly different. 
Most notably, while the other studies are interested in voting behaviour and the outcome of elections, we investigate if different voting rules offering different levels of representation of voters’ actual preferences can lead to different levels of turnout. Fourth, our experimental results indicate some conditions under which differences in possibilities to better express oneself may or may not influence turnout. In particular, they suggest that awareness of alternative voting rules (i.e., the presence of explicit reference points) is necessary to increase participation. In other settings, such as committees or elections, one can easily raise awareness of alternative voting rules and provide reference points. For example, if a national voting reform increases expressive power, the government could link the reform with an informational campaign that would provide information about the old and the new rules and, in a more direct way than done in our setting, highlight the expressive benefit of the latter.

Third, our study is related to the literature devoted to voters’ preferences over electoral systems and voting rules (e.g. Bosch and Orriols 2014; Blais et al. 2015; Bol et al. 2018; Müller and Jankowski 2019; Engelman et al. 2020; Weber 2020; Blais et al. 2021; Hoffmann and Renes 2022; Bol et al. 2023). Of these, the survey-based experiment by Blais et al. (2021) and the lab experiment by Bol et al. (2023) are the closest to ours. Blais et al. (2021) randomly assign participants to vote in hypothetical elections with either plurality or Borda rule, and ask them about their satisfaction with the voting rule after voting. Contrary to their hypothesis that expressing a full ranking increases satisfaction with the rule, they find little difference in overall satisfaction between treatments. In Bol et al. (2023), subjects experience elections under plurality and approval voting (first stage), and then decide which rule they want to use for extra elections (second stage). The authors find that subjects who hold more (less) egalitarian values for society are more likely to choose approval voting (plurality) in the second stage when it has produced a more (less) egalitarian distribution of payoffs than plurality (approval voting) in the first stage. In our setting, turnout is our main outcome of interest (as opposed to ex-post satisfaction or preference over different rules). All these outcomes—turnout, ex-post satisfaction, and preference over rules—seem to be potentially affected by the “expressiveness” of the voting rules, but also by other factors such as the complexity of the rule (as acknowledged by Blais et al. 2021 for the case of satisfaction). In our view, these outcomes are relevant complements to understand the relationship between expressive utility and voting rules. 
Our conceptual framework and our results also suggest that it is important to consider people’s actual preferences over alternatives and their reference points in terms of rules when comparing the perceived expressiveness of the rules and people’s satisfaction with or preference over different voting rules.

Finally, our paper contributes to the literature in political science on the impact of alternative electoral systems on turnout (e.g. Ladner and Milner 1999; Blais 2006; Blais and Aarts 2006; Kartal 2015; Sanz 2017). Ladner and Milner (1999) is a seminal empirical study that exploits the fact that different communes in Switzerland use different electoral systems. The authors reveal that a proportional representation system entails higher turnout than a majoritarian one in this setting. This is the case whatever voting rule is associated with the representative system. Sanz (2017) offers another example. For local Spanish elections, which are held under a proportional representation system, the national law demands localities, depending on their population size, to either use a closed lists system (CLs) or an open list system (OLs). Under a CLs voters vote for a party list, while under an OLs voters can vote for candidates from a single party list but also from different party lists. Sanz (2017) reports that an OLs increases turnout compared to a CLs. One potential explanation for this effect is that the OLs captures the vote of “conflicted voters” who identify with more than one party (see Bourgeois-Gironde and Ferreira 2020). Our paper complements this literature by showing that different voting rules (an element of electoral systems) can entail different levels of turnout independently of their effective representational consequences.

2 Conceptual framework

Consider an individual i in a population of N individuals, each of whom can either cast a vote or abstain from voting. Let \(P_i^1(u_i)\) represent the probability that individual i casts a vote when her utility from voting is \(u_i\), where \(P_i^1(u_i)\) is a monotonically non-decreasing function that takes values in [0, 1]: the higher \(u_i\), the (weakly) higher \(P_i^1(u_i)\). Conversely, let \(P_i^0(u_i)\) represent the probability that individual i abstains. Clearly, \(P_i^0(u_i)+ P_i^1(u_i)=1\) for each \(i \in N\).

Let X be the set of alternatives available for voting (e.g. candidates in an election or options in an opinion poll). A preference \(R_i\) is a collection of ordered pairs over these options. We say that (xy) belongs to \(R_i\) if and only if x is at least as good as y from the viewpoint of \(R_i\). Then, x is said to be indifferent to y for \(R_i\) if both (xy) and (yx) belong to \(R_i\), and x is said to be strictly better than y for \(R_i\) if (xy) belongs to \(R_i\) but (yx) does not. For simplicity, we assume preferences to be reflexive [i.e., (xx) belongs to \(R_i\) for every x in X] and transitive [i.e., if (xy) and (yz) belong to \(R_i\) then (xz) belongs to \(R_i\) for any x, y, z in X]. Preferences are not necessarily complete [i.e., (xy) or (yx) do not necessarily belong to \(R_i\) for every distinct x and y in X] nor antisymmetric [i.e., (xy) and (yx) may belong to \(R_i\) even if \(x \ne y\)].
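The definitions above can be made concrete with a minimal Python sketch that stores a preference as a set of ordered pairs. The helper names (`is_reflexive`, `is_transitive`, `is_complete`, `indifferent`, `strictly_better`) and the two example preferences are illustrative assumptions, not notation from the paper.

```python
from itertools import product

def is_reflexive(R, X):
    """Every alternative is at least as good as itself."""
    return all((a, a) in R for a in X)

def is_transitive(R, X):
    """Whenever (a, b) and (b, c) are in R, (a, c) is in R as well."""
    return all((a, c) in R
               for a, b, c in product(X, repeat=3)
               if (a, b) in R and (b, c) in R)

def is_complete(R, X):
    """Every pair of distinct alternatives is compared in at least one direction."""
    return all((a, b) in R or (b, a) in R for a in X for b in X if a != b)

def indifferent(R, a, b):
    return (a, b) in R and (b, a) in R

def strictly_better(R, a, b):
    return (a, b) in R and (b, a) not in R

X = {"x", "y", "z"}
diagonal = {(a, a) for a in X}  # reflexive pairs

# Hypothetical preference: x strictly better than y and z, y indifferent to z.
R = diagonal | {("x", "y"), ("x", "z"), ("y", "z"), ("z", "y")}

# Hypothetical incomplete preference: x is compared with y only.
R_partial = diagonal | {("x", "y")}

print(is_reflexive(R, X), is_transitive(R, X))               # True True
print(strictly_better(R, "x", "y"), indifferent(R, "y", "z"))  # True True
print(is_transitive(R_partial, X), is_complete(R_partial, X))  # True False
```

The second example shows why completeness is not assumed: `R_partial` is reflexive and transitive, yet it leaves x and z unranked.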

We distinguish between two types of preferences. Individual i’s actual preference, denoted \(R^a_i\), is the individual’s own ranking of alternatives. Individual i’s expressed preference, denoted \(R^e_i\), is the ranking that i casts when voting. In other words, the expressed preference is the preference that an observer can derive from looking at the ballot. Assume that voters are attentive and sincere, in the sense that they do not make mistakes and they vote as close as possible to their actual preference. Even so, \(R^a_i\) and \(R^e_i\) may differ. The underlying reason is that voting rules only allow i to express a subset of all possible preferences over X. To see this, let \({\mathcal {P}}_X\) be the set of all possible (reflexive and transitive) preferences over X, and let \({\mathcal {P}}_A\) and \({\mathcal {P}}_B\) be the sets of preferences that voting rules A and B, respectively, allow to be expressed over X. It is easy to check that \({\mathcal {P}}_A,{\mathcal {P}}_B \subseteq {\mathcal {P}}_X\). For instance, the plurality rule only allows voters to express preferences in which one alternative is strictly preferred to all other alternatives and the voter is indifferent among those other alternatives; if \(X= \{x,y,z\}\), the plurality rule allows i to express a preference \(R^e_i\) such that (xy) and (xz) belong to \(R^e_i\) (x is selected in the ballot), but it does not allow i to express a preference \(R^e_i\) such that (xy) and (yz) belong to \(R^e_i\). But then, if (xy) and (yz) belong to \(R^a_i\), it follows that \(R^a_i\) and \(R^e_i\) will differ. If, on the other hand, only (xy) and (xz) belong to \(R^a_i\), then \(R^a_i\) and \(R^e_i\) will coincide.Footnote 9 It is also easy to check that \({\mathcal {P}}_A\) and \({\mathcal {P}}_B\) may differ. 
For example, the plurality rule allows i to express \(R^e_i\) such that only (xy) and (xz) belong to \(R^e_i\) while the Borda rule does not, and the Borda rule allows i to express \(R^e_i\) such that (xy) and (yz) belong to \(R^e_i\) while plurality rule does not.

We assume that individual i’s utility of voting with expressed preference \(R^e_i\), \(u_i(R^e_i)\), is given as follows:

$$\begin{aligned} u_i(R^e_i)=p B_i(R^e_i) - C_i + E_i(R^e_i) \end{aligned}$$
(1)

where p denotes the probability that i’s vote decides the election (i’s vote is pivotal), \(B_i(R^e_i)\) denotes the material benefit (e.g. consumption) that i receives with the selection of the preferred option in \(R^e_i\) among the options for which the voter is pivotal, \(C_i\) denotes the costs associated with voting such as the time and effort to cast a vote, and \(E_i(R^e_i)\) denotes i’s expressive utility of voting \(R^e_i\). Note that \(E_i(R^e_i)\) is a non-consequential element that is independent of whether the vote affects the outcome of the election, which means that even if p is negligible, as is the case in most large elections, it may still be in i’s best interest to vote.Footnote 10
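As a purely illustrative calculation with hypothetical numbers (none of them taken from the paper), suppose \(p=10^{-6}\), \(B_i(R^e_i)=100\), \(C_i=1\) and \(E_i(R^e_i)=2\). Expression (1) then gives

$$\begin{aligned} u_i(R^e_i)=10^{-6}\times 100 - 1 + 2 = 1.0001 > 0, \end{aligned}$$

whereas without the expressive term the same voter would obtain \(10^{-6}\times 100 - 1 = -0.9999 < 0\). Under these assumed values the instrumental term is negligible, and the sign of the utility of voting is driven by the comparison between \(E_i(R^e_i)\) and \(C_i\).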

In this model, we emphasise how the expressive utility of voting \(E_i(R^e_i)\) may depend on the (ordinal) distance between the preference an individual expresses when voting (\(R^e_i\)) and her actual preference (\(R^a_i\)):

$$\begin{aligned} E_i(R^e_i)=\max (E^*_i - K(R^e_i,R^a_i), 0) \end{aligned}$$
(2)

where \(E^*_i\ge 0\) is i’s expressive utility of casting a vote with \(R^e_i=R^a_i\), and \(K(R^e_i,R^a_i)\) is the Kemeny distance between i’s expressed preference \(R^e_i\) and i’s actual preference \(R^a_i\). The Kemeny distance between two preferences \(R_i\) and \(R_j\) is given by \(K(R_i,R_j)=\#[R_i {\setminus } R_j \cup R_j {\setminus } R_i]\), where \(\#\) denotes the cardinality of a set, \({\setminus }\) denotes the set difference and \(\cup\) the union.Footnote 11 It measures the distance between two ordinal preferences as the number of “swaps” necessary to transform one preference into another. For instance, if x is strictly better than y for \(R_i\), x is indifferent to y for \(R_j\), and \(R_i\) and \(R_j\) are identical on all other pairs, then (yx) will belong to \(R_j\) but it will not belong to \(R_i\). Hence, one swap is needed to transform \(R_i\) into \(R_j\) and the Kemeny distance between the two preferences is 1. Expression (2) above captures our main hypothesis, which can be summarized as follows:

Expressive utility is highest when a voter expresses her actual preference, and decreases as the expressed preference gets more distant from her actual preference.
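The Kemeny distance and expression (2) can be sketched in a few lines of Python. This is a minimal illustration: preferences are stored as sets of ordered pairs, the function names are ours, and the values chosen for \(E^*_i\) are hypothetical.

```python
def kemeny(R_i, R_j):
    """Number of ordered pairs that belong to exactly one of the two preferences."""
    return len(set(R_i) ^ set(R_j))

def expressive_utility(E_star, R_expressed, R_actual):
    """Expression (2): max(E* - K(R^e, R^a), 0)."""
    return max(E_star - kemeny(R_expressed, R_actual), 0)

diagonal = {("x", "x"), ("y", "y"), ("z", "z")}  # reflexive pairs

# Example from the text: x strictly better than y for R_i, x indifferent to y
# for R_j, and the two preferences identical on all other pairs.
R_i = diagonal | {("x", "y"), ("x", "z"), ("y", "z")}
R_j = diagonal | {("x", "y"), ("y", "x"), ("x", "z"), ("y", "z")}
print(kemeny(R_i, R_j))  # 1

# A voter with actual preference x > y > z who selects x in a plurality ballot
# expresses "x above y and z, y and z indifferent".
R_actual = diagonal | {("x", "y"), ("x", "z"), ("y", "z")}
R_expressed = diagonal | {("x", "y"), ("x", "z"), ("y", "z"), ("z", "y")}
print(expressive_utility(3, R_expressed, R_actual))    # 2
print(expressive_utility(0.5, R_expressed, R_actual))  # 0 (floor at zero)
```

The last line shows the role of the maximum in (2): once the distance exceeds \(E^*_i\), expressive utility stays at zero rather than turning negative.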

Following this hypothesis, we can derive testable implications for expected turnout. For individual voters, the lower the (Kemeny) distance between actual and expressed preferences the higher the expected turnout, ceteris paribus. This is a straightforward implication of (1) and (2) (see also the proof of Proposition 1 below). To derive testable implications for expected turnout in different voting rules, we first define what we mean by a voting rule having more expressive power than another.

Definition 1

Voting rule A is said to have more expressive power than rule B if and only if \({\mathcal {P}}_A \supset {\mathcal {P}}_B\).

In other words, voting rule A has more expressive power than voting rule B if and only if the set of preferences that A allows voters to express is a strict superset of the set that B allows voters to express. For instance, approval voting has more expressive power than the plurality rule because every preference that can be expressed with the plurality rule can also be expressed with approval voting, while approval voting additionally allows voters to express preferences that cannot be expressed with the plurality rule. We can then show the following:

Proposition 1

If voting rule A has more expressive power than voting rule B, then expected turnout with voting rule A is equal to or higher than expected turnout with voting rule B, ceteris paribus.

Proof

If voting rule A has more expressive power than voting rule B then \({\mathcal {P}}_A \supset {\mathcal {P}}_B\). To prove the implication, note that it follows from (1) and (2) that \(u_i(R^e_i)\) decreases with \(K(R^e_i,R^a_i)\), ceteris paribus. By extension, \(P_i^1(u_i)\), the probability that individual i casts a vote, also decreases with \(K(R^e_i,R^a_i)\), ceteris paribus. Moreover, since \({\mathcal {P}}_A \supset {\mathcal {P}}_B\), every preference that i can express with B can also be expressed with A, so for attentive and sincere voters \(K(R^e_i,R^a_i)\) with voting rule A is never higher than \(K(R^e_i,R^a_i)\) with voting rule B. We therefore consider two exhaustive cases: (i) There is no \(i\in N\) such that \(K(R^e_i,R^a_i)\) with voting rule A is lower than \(K(R^e_i,R^a_i)\) with voting rule B. This is the case, for instance, if \(R^a_i \in {\mathcal {P}}_B\) for all \(i \in N\). In this case, there is no \(i \in N\) such that the influence of \(K(R^e_i,R^a_i)\) on \(P_i^1(u_i)\) is different with voting rule A and voting rule B. Therefore, expected turnout with voting rule A is equal to expected turnout with voting rule B, ceteris paribus. (ii) There is \(i\in N\) such that \(K(R^e_i,R^a_i)\) with voting rule A is lower than \(K(R^e_i,R^a_i)\) with voting rule B. This is the case, for instance, if there is at least one individual in N such that \(R^a_i \in {\mathcal {P}}_A {\setminus } {\mathcal {P}}_B\). In this case, there is \(i \in N\) such that \(P_i^1(u_i)\) is higher with voting rule A than with voting rule B due to the influence of \(K(R^e_i,R^a_i)\). Therefore, the expected turnout with voting rule A is higher than the expected turnout with voting rule B, ceteris paribus. This completes the proof. \(\square\)

Proposition 1 shows that if at least one individual can express her actual preference with voting rule A but not with voting rule B (or can decrease the distance to her actual preference changing from B to A), then the average expected turnout with voting rule A is higher than with voting rule B, ceteris paribus. This means that voting rules may have different impacts on expected turnout according to their expressive power.
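The logic of Proposition 1 can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration: the logistic form of \(P_i^1\) (any non-decreasing function into [0, 1] would do), the three-voter population, the parameter values, a material benefit held constant across ballots, and the encoding of ballots as score vectors in which a higher score means "at least as good".

```python
import math
from itertools import permutations

X = ("x", "y", "z")

def relation_from_scores(scores):
    """(a, b) is in the preference iff a scores at least as high as b."""
    return frozenset((a, b) for a in scores for b in scores if scores[a] >= scores[b])

def expressible(score_vectors):
    """Preferences reachable by assigning each score vector to X in any order."""
    return {relation_from_scores(dict(zip(X, perm)))
            for vec in score_vectors for perm in permutations(vec)}

def kemeny(R_i, R_j):
    return len(R_i ^ R_j)

def turnout_probability(u):
    # Assumed functional form for P_i^1(u_i).
    return 1.0 / (1.0 + math.exp(-u))

def expected_turnout(actual_prefs, P_rule, p=0.0, B=0.0, C=1.0, E_star=3.0):
    """Average of P_i^1(u_i) over voters, with u_i from expressions (1) and (2)."""
    total = 0.0
    for R_a in actual_prefs:
        # Attentive, sincere voter: expressed preference closest to the actual one.
        K = min(kemeny(R_e, R_a) for R_e in P_rule)
        u = p * B - C + max(E_star - K, 0)
        total += turnout_probability(u)
    return total / len(actual_prefs)

P_plurality = expressible([(1, 0, 0)])
P_approval = expressible([(1, 0, 0), (1, 1, 0)])  # one or two approvals (assumed)

# Hypothetical population of three voters.
population = [
    relation_from_scores({"x": 2, "y": 1, "z": 0}),  # x > y > z
    relation_from_scores({"x": 1, "y": 1, "z": 0}),  # x ~ y > z
    relation_from_scores({"x": 1, "y": 0, "z": 0}),  # x > y ~ z
]

print(expected_turnout(population, P_plurality))
print(expected_turnout(population, P_approval))
```

In this example only the second voter gets closer to her actual preference when moving from plurality to approval voting, which is enough for expected turnout to be higher under approval voting; with a population made up solely of the third voter, the two rules would give the same expected turnout.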

Before proceeding, it is worth emphasising the ceteris paribus nature of this proposition. It only says that if a voting rule allows voters to better express their preferences than another, then we should expect turnout to be higher with the former rule than with the latter if all remains equal. However, different voting rules can change the likelihood that voters will be pivotal. Therefore, if voting rule A has more expressive power than voting rule B but the likelihood of being pivotal in A is lower than the one in B, a decisive prediction in terms of overall expected turnout cannot be made. While in theory this is not an issue for large elections and opinion polls (as is the case of our experimental design), it is a relevant consideration for elections in small groups. A similar issue arises with respect to the strategic incentives of voters. Since these can be confounding factors in our experimental setting, we explore these issues below. More generally, this means that more research is needed to understand how voting rules affect the overall utility derived from voting and turnout. The contribution of this paper is to formally identify one channel linking voting rules with voters’ utility and turnout, and test this channel experimentally.

Finally, it is also worth noting that in some cases our model allows us to make unambiguous comparisons in terms of expressive power while in other cases it does not. For instance, we can unambiguously say that approval voting has more expressive power than plurality rule. At the same time, we cannot make an unambiguous comparison between the plurality and Borda rules. This means that our criterion does not allow us to unambiguously compare all voting rules in terms of expressive power. This issue can be addressed for collective decisions for which the empirical distribution of voters’ actual preferences is known. While this is usually not known or available, especially prior to the vote, it is possible to approximate this distribution based on polls or data from previous collective decisions in the same setting. In that case, one can juxtapose the observed empirical distribution of actual preferences with the preferences that alternative voting rules allow to express, and determine which voting rule allows voters to better express themselves (i.e., which voting rule leads to lower Kemeny distance between the observed distribution of actual preferences and the preferences that the voting rule allows to express). The voting rule that would allow voters to better express themselves in this sense would be the rule with higher expressive power. This alternative empirical criterion of expressive power would allow us to unambiguously compare all voting rules.
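Both comparisons discussed in this paragraph can be sketched in Python: the strict-superset test of Definition 1 and the empirical criterion based on average Kemeny distance. The sketch rests on assumptions that are not in the paper: ballots are encoded as score vectors (higher score means "at least as good"), approval voting is restricted to one or two approvals, and the distribution of actual preferences among 100 voters is invented for illustration.

```python
from itertools import permutations

X = ("x", "y", "z")

def relation_from_scores(scores):
    """(a, b) is in the preference iff a scores at least as high as b."""
    return frozenset((a, b) for a in scores for b in scores if scores[a] >= scores[b])

def expressible(score_vectors):
    """Preferences reachable by assigning each score vector to X in any order."""
    return {relation_from_scores(dict(zip(X, perm)))
            for vec in score_vectors for perm in permutations(vec)}

def kemeny(R_i, R_j):
    return len(R_i ^ R_j)

RULES = {
    "plurality": expressible([(1, 0, 0)]),
    "approval": expressible([(1, 0, 0), (1, 1, 0)]),
    "borda": expressible([(2, 1, 0)]),
}

def more_expressive_power(P_A, P_B):
    """Definition 1: P_A is a strict superset of P_B."""
    return P_A > P_B

def mean_distance(counts, P_rule):
    """Average Kemeny distance from each actual preference to the closest expressible one."""
    total = sum(counts.values())
    return sum(n * min(kemeny(R_e, R_a) for R_e in P_rule)
               for R_a, n in counts.items()) / total

# Hypothetical distribution of actual preferences (number of voters per preference).
counts = {
    relation_from_scores({"x": 2, "y": 1, "z": 0}): 60,  # x > y > z
    relation_from_scores({"x": 1, "y": 0, "z": 0}): 25,  # x > y ~ z
    relation_from_scores({"x": 1, "y": 1, "z": 0}): 15,  # x ~ y > z
}

print(more_expressive_power(RULES["approval"], RULES["plurality"]))  # True
print(more_expressive_power(RULES["borda"], RULES["plurality"]))     # False
print(more_expressive_power(RULES["plurality"], RULES["borda"]))     # False
for name, P_rule in RULES.items():
    print(name, mean_distance(counts, P_rule))  # 0.9, 0.6 and 0.4 respectively
```

With this invented distribution, the plurality and Borda rules cannot be ranked by Definition 1, but the empirical criterion ranks the Borda rule first because it yields the lowest average distance. A different distribution, for example one dominated by voters who are indifferent between two alternatives, could reverse that ranking.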

3 Experimental design

In this section, we present our experimental design following the order of the experiment: (i) the elicitation of actual preferences, (ii) the assignment to treatments and elicitation of expressed preferences, (iii) the turnout decision, and (iv) the additional questions at the end of the experiment. We end the section explaining the procedures we used to implement the experiment.

3.1 Elicitation of actual preferences

The experiment was presented to subjects as a survey on people’s opinions about how the UK government should “open up” the country after the first Covid-19 lockdown. The choice of a survey on this salient topic was meant to raise participants’ interest to turn out (i.e., to register their vote; see below) with low instrumental motives to do so and negligible probabilities of being pivotal. This partially shuts down these potential confounding factors, which we return to in our discussion of the results (Sect. 6).

The first stage of the experiment was meant to elicit subjects’ actual preferences over potential policies to “open up” the country that were envisioned by policy-makers during the first Covid-19 lockdown. Subjects were asked to rank the following three policies in order of their preferenceFootnote 12:

  (i) Policy A: Strict social distancing for everyone.

  (ii) Policy B: Strict social distancing only for vulnerable people.

  (iii) Policy C: Self-imposed social distancing.

Subjects were requested to write 1 for their preferred policy, 2 for their second-favourite, and 3 for their least favourite, and if they were indifferent between two or more policies they could give them the same number (i.e., subjects could report all possible preferences among the 3 policies). Importantly, they were informed that this ranking would not count for the voting results. We therefore infer subjects’ actual preferences from this ranking.

3.2 Treatments and elicitation of expressed preferences

In the second stage of the experiment, we randomly assigned subjects to different treatments and elicited their expressed preferences under different voting rules. Treatment 1 (T1), treatment 2 (T2) and treatment 3 (T3) are equivalent in all respects besides the voting rule that subjects use to vote:

T1. Subjects vote with the plurality rule. Subjects were told that they could give 1 point to their favourite policy and 0 points to the other policies or that they could give 0 points to all policies.

T2. Subjects vote with approval voting. Subjects were told that they could give 1 point to their two favourite policies and 0 points to their least favourite policy, or 1 point to their favourite policy and 0 points to the other policies, or 0 points to all policies.

T3. Subjects vote with the A-Borda rule, a synthetic rule that allowed participants to express all possible preferences with approval voting and the Borda rule. Subjects were told that they could give 2 points to their favourite policy, 1 point to their second favourite policy, and 0 points to their least preferred policy, or 1 point to their two favourite policies and 0 points to their least preferred policy, or 1 point to their favourite policy and 0 points to the other policies, or 0 points to all policies.

Three features of these treatments are worth mentioning. First, note that expressive power is strictly increasing from T1 to T3. This means that we can make unambiguous comparative predictions in terms of turnout. Second, note that the voting rule in T3 is a synthetic institution that, to the best of our knowledge, does not exist outside our experiment. The introduction of this voting rule follows the tradition of using experiments “to construct and implement institutions explicitly designed to test theory and to understand behaviour” (Kagel and Levin 1993, pp. 868–869; these authors implemented a synthetic institution to study the descriptive validity of Nash bidding theory). In our setting, the A-Borda rule allows us to make unambiguous comparisons in terms of expressive power and expected turnout that would not be possible with the Borda rule. In addition, although this rule has not yet been used to settle collective decisions outside our experiment, it is similar to other proposals to modify the Borda rule to allow for partial rankings (see Emerson 2013). Third, we note that the computer program enforced that subjects used the voting rules correctly to avoid mistakes in their use.
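The claim that expressive power is strictly increasing from T1 to T3 can be checked mechanically. The following sketch is illustrative only and is not the software used in the experiment: it encodes each admissible ballot as a tuple of points for Policies A, B, and C (score patterns taken from the treatment descriptions), maps each ballot to the preference it expresses, and tests whether the sets of expressible preferences are strictly nested. The names `RULES`, `ballots`, and `weak_order` are hypothetical.

```python
from itertools import permutations

PAIRS = [(0, 1), (0, 2), (1, 2)]  # policy pairs (A, B), (A, C), (B, C)

def ballots(score_patterns):
    """All distinct ways of assigning each score pattern to the three policies."""
    return {perm for pattern in score_patterns for perm in permutations(pattern)}

def weak_order(scores):
    """Preference expressed by a ballot: +1, 0 or -1 for each policy pair."""
    return tuple((scores[i] > scores[j]) - (scores[i] < scores[j]) for i, j in PAIRS)

# Score patterns as described for each treatment.
RULES = {
    "plurality": [(1, 0, 0), (0, 0, 0)],
    "approval": [(1, 1, 0), (1, 0, 0), (0, 0, 0)],
    "A-Borda": [(2, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)],
}

expressible = {
    name: {weak_order(b) for b in ballots(patterns)}
    for name, patterns in RULES.items()
}

for name, orders in expressible.items():
    print(name, len(orders))

# "<" on sets is the strict-subset test.
strictly_nested = (
    expressible["plurality"] < expressible["approval"] < expressible["A-Borda"]
)
print(strictly_nested)
```

With three policies there are 13 possible preferences once indifference is allowed, so a rule that expresses 13 distinct preferences expresses all of them.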

Finally, we introduce treatment 4 (T4) and treatment 5 (T5) which make salient to subjects the fact that there are “several common voting rules to gather people’s opinions through their vote”. The two treatments are as follows:

T4. Subjects vote with the plurality rule (i.e., equivalent to T1). The only difference to T1 is a screen, presented before the voting decision, in which we describe the plurality rule, approval voting, and A-Borda rule and inform subjects that they will vote using the plurality rule. See Appendix F for the exact wording.

T5. Subjects vote with the A-Borda rule (i.e., equivalent to T3). The only difference to T3 is a screen, analogous to the one used in T4 before the voting decision, in which we describe the plurality rule, approval voting, and A-Borda rule and inform subjects that they will vote using the A-Borda rule. See Appendix F for the exact wording.

These treatments allow us to study whether awareness of different voting rules (and the contrast between them) affects the rate of turnout in our experiment. Since the plurality and A-Borda rules are the voting rules with the lowest and highest expressive power in our experiment, the two treatments are enough to test this effect. The ability to test this effect with only two treatments is the main reason for not having run a similar treatment with approval voting.

3.3 Turnout decision

The third stage of the experiment elicits our main variable of interest: our proxy for turnout. After voting as explained in the previous section, subjects were asked if they would like “to register their vote”. Subjects were reminded that to register their vote they would “be asked some extra questions about pandemics that should take 3–5 min to answer before [they] continue the survey”, which would not be asked in case of not registering their vote.Footnote 13 They were also reminded that if they did not register their vote, their vote would not be counted when the results of the vote would be sent to them and others through the Prolific website, newspapers, and Twitter.

This incentive mechanism makes registering one’s vote costly. To register their vote, subjects needed to perform a real effort task for no extra pay. The objective is to mimic the costs associated with turnout. As a result, subjects willing to make their vote count had to exert effort. This decision is therefore used as our proxy for turnout.Footnote 14

It is worth noting that we reversed the natural order of turnout and voting in our experiment. In our experiment, subjects first cast their vote between options, and only then decide whether or not to turn out. This contrasts with most “real-life” voting situations where the order is the other way around. The main underlying reason for this design choice is that subjects are likely to be unfamiliar with some of our voting rules. Our method allows subjects to experience the voting rule and submit their vote before deciding whether to register their vote or not (i.e., turnout). We thereby minimize the number of subjects for whom we would not have observed an effect of expressive power on turnout because of a lack of familiarity with the rule. In other words, this design choice allows us to be more confident that subjects properly received our intervention (voting with a randomly assigned voting rule) before their turnout decision (our dependent variable).

3.4 Additional questions

At the end of the experiment, we asked subjects several questions about their perception of the voting rules as well as social demographics. In terms of subjects’ perception of the voting rules, we asked about the perceived expressiveness and perceived complexity of the voting rules they used to vote:

  • Perceived expressiveness: “From 1 (not at all) to 10 (absolutely), do you feel that the way you were allowed to give points to the policies let you express your preference about the 3 policies?”

  • Perceived complexity: “From 1 (very easy) to 10 (very complex), how difficult it was for you to understand how to attribute points to the policies?”

We also presented subjects with the three voting rules we used in our experiment and asked them their ex-post subjective ranking of the voting rules:

  • Preferred voting rule: “Which of these three rules [plurality, approval, and A-Borda rule, described in the same screen] would you have most liked to have used to vote for the 3 policies?”

In terms of social demographics, we asked about gender, age, education, health status, activity status, and several questions about past exposure to Covid-19. Finally, we also included an attention question among this final set of questions and we asked subjects how difficult they found it to answer the survey.

3.5 Procedures

A total of 2041 subjects were recruited online via Prolific on the 4th of May 2020. All subjects were UK nationals, currently living in the UK, with a minimum “approval rate” of 95% on Prolific.Footnote 15 The experiment was run online and participation was fully anonymous.

We paid subjects 1 pound for their participation. In line with our incentive mechanism, payment was independent of the duration subjects took to complete the experiment (we present durations per treatment and per turnout decision below). Subjects were sent the main results of the poll via the Prolific message board, where participants can be contacted keeping their anonymity. We will divulge the results of the experiment via newspapers and Twitter once the paper is published.

4 Predictions

Our conceptual framework allows us to make the following prediction in terms of average expected turnout in T1, T2, and T3:

Prediction 1

Average turnout in T3 > Average turnout in T2 > Average turnout in T1.

Although our model is silent in terms of the effect of reference points and salience, it also allows us to compare the average expected turnout in T4 and T5:

Prediction 2

Average turnout in T5 > Average turnout in T4.

Evidence from psychology and behavioural economics on the effects of reference points and salience suggests that the difference in turnout between T4 and T5 should be greater than the difference between T1 and T3. Under T4 and T5, subjects are not only informed about the rule they will use to vote but also about other rules that could have been used for the vote. Subjects in these treatments can therefore contrast different voting rules. In particular, subjects can use the rules that were presented but with which they did not vote as reference points to judge the voting rule with which they voted. The influence of explicit contrasts of this sort on preference elicitation has been documented in various domains. Kahneman (1992) has seminally shown that individuals do not perceive their utility in an absolute way but in terms of changes with respect to reference points. Other researchers have also underlined the sequential process that occurs when attitudes or choices are elicited under such contrast or saliency manipulations (e.g. Shavitt and Fazio 1991; Dhar et al. 1999). According to these authors, subjects first form a similarity judgement and then express a preference, and the first comparative process (similarity) can affect the second one (preference). To the best of our knowledge, there is no direct evidence about salience or reference points in the context of alternative voting methods. However, these indirect pieces of evidence suggest that T4 and T5 may make the relatively low (high) expressive power of the plurality rule (A-Borda rule) more salient via the comparison to the other rules. We are therefore more likely to find results consistent with Prediction 2 than with Prediction 1.

5 Results

The following analysis is based on 1977 subjects that completed the experiment (393 in T1, 398 in T2, 400 in T3, 392 in T4, and 394 in T5).Footnote 16 We asked one attention question and controlled for the duration that each subject took to complete the survey. In Appendix A, we present the results for the restricted sample that did not fail the attention question and responded in a duration no lower than one-half of the treatment mean duration per turnout decision. In a nutshell, our main results are the same with the two samples.
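The restriction used for the robustness sample can be sketched as a simple filter. This is an illustration under assumptions, not the authors' code: the `Subject` record, its field names, and the toy data are hypothetical, and the cutoff is read as one-half of the mean duration within each treatment and turnout decision.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class Subject:
    treatment: str          # "T1" ... "T5"
    turned_out: bool        # registered the vote or not
    duration_min: float     # time taken to complete the survey
    passed_attention: bool  # answered the attention question correctly

def restricted_sample(subjects):
    """Keep subjects who passed the attention check and were not too fast."""
    durations = defaultdict(list)
    for s in subjects:
        durations[(s.treatment, s.turned_out)].append(s.duration_min)
    # Cutoff: half the mean duration of the subject's own treatment/turnout cell.
    cutoff = {cell: 0.5 * mean(values) for cell, values in durations.items()}
    return [
        s for s in subjects
        if s.passed_attention
        and s.duration_min >= cutoff[(s.treatment, s.turned_out)]
    ]

# Toy data, invented for illustration only.
toy = [
    Subject("T1", True, 12.0, True),
    Subject("T1", True, 14.0, True),
    Subject("T1", True, 4.0, True),    # below half of the cell mean (10 min)
    Subject("T1", False, 9.0, False),  # failed the attention question
    Subject("T1", False, 8.0, True),
]
kept = restricted_sample(toy)
print([s.duration_min for s in kept])
```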

In the final part of the experiment, we collected some social demographic measures. Table 1 summarizes subjects’ characteristics. Two features of this data are worth mentioning. First, the data shows no systematic differences across treatments. Empirical tests reported in Appendix B show that, with few exceptions, the differences in observed covariates across treatments are small and the sample is mostly balanced on these variables. Still, the observed differences across treatments and the potential effect of these particular covariates on our main variable of interest (turnout) justify a robustness check, carried out below, in which we check for the treatment effect on turnout controlling for these covariates. Second, it is worth noting that education is relatively high in our sample, with a mean of 5.5 (between post-secondary non-tertiary education and Bachelor’s or equivalent level) and all subjects (with one exception) had lower secondary education or higher. We also asked subjects how difficult it was to understand the survey. From 1 (very easy) to 10 (very difficult), the sample average was 2.2 (2.1 in T1, 2.1 in T2, 2.2 in T3, 2.1 in T4, 2.4 in T5). This suggests that subjects understood the instructions without difficulty.

Table 1 Subjects’ characteristics

Before proceeding to our main analysis, we report the main results of the poll. The grey bars of Fig. 1 show the preferred policy of subjects that decided to register their vote and reported a unique favourite policy when voting. The results show that while Policy A was strictly preferred by almost half of the subjects that registered their vote (44.3%), the sample was far from consensual in terms of what the government should have done to “open up” the country. Policy B was strictly preferred by 19.7% of the subjects that registered their vote and Policy C by 17.8%. The black tops of each bar show the percentage of subjects that “approved” the corresponding policy alongside another policy. Several subjects voted for two preferred policies (17.1% in the treatments that allowed to report indifference), with the figure showing that subjects approved Policy B alongside another policy slightly more often than Policy A and Policy C (13.3% against 11.6% and 9.3% respectively). Finally, only 1.13% of subjects voted that they were indifferent between the three policies.

Fig. 1

Preferred policy. Percentages are based on the subjects that decided to register their vote. Grey bars are percentages of subjects that voted for a unique favourite policy. The black tops of each bar are percentages of subjects that “approved” the corresponding policy alongside another one when voting. 1.13% of subjects “approved” the three policies

5.1 Actual and expressed preferences

We start our main analysis by looking at subjects’ actual and expressed preferences, and the relationship between the two. We are particularly interested in ascertaining if there is a difference between actual and expressed preferences, and if this difference is statistically significantly different across treatments.

Before voting, we asked subjects to rank the 3 policies in order of preference, allowing for indifference between two or more policies. We infer subjects’ actual preferences from this ranking. Table 2 reports the Kemeny distance between participants’ actual and expressed preferences, where a Kemeny distance of 0 means that subjects’ actual and expressed preferences were the same.Footnote 17

Table 2 % of subjects with Kemeny distance equal to 0, 1, and equal or higher than 2

The results show, first, that very few subjects vote with their actual preference in T1, T2, and T4 (see row “=0” in Table 2). This is not surprising, as the voting rules in these treatments restrict the preferences that can be expressed. Namely, these voting rules exclude the possibility of voting with a complete antisymmetric preference over the 3 policies, and 93% of subjects (1838 out of 1977) across all treatments reported a complete antisymmetric actual preference. We conjecture that this high number of antisymmetric preferences (or lack of indifference between policies) was due to the particular context in which our experiment was run (the heart of the Covid-19 crisis and the daily discussion of public health policies in the media), which is likely to have induced strong opinions about this topic. Second, these results show that a large majority of subjects in T3 and T5 voted with their actual preference. Still, a sizeable proportion of subjects voted with a preference different from their actual preference even though they could have voted with their actual preference. This difference can be explained by several factors, notably (i) mistakes (“noise” common in experiments) and/or (ii) strategic voting. We discuss these issues in Sect. 6. Third, these results show that most subjects that did not vote with their actual preference voted with a Kemeny distance of 1. This is expected, given that all voting rules allowed subjects to vote within a Kemeny distance of 0 (T3 and T5), 1 (T2), or 2 (T1 and T4) to all possible actual preferences.
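The bounds of 0, 1, and 2 mentioned above can be reproduced with a small enumeration. The sketch below is illustrative and rests on an assumption about the distance convention: for each pair of policies, the distance adds 0 if the two preferences agree, 1 if one is strict and the other indifferent, and 2 if they rank the pair in opposite order. The function and variable names are hypothetical.

```python
from itertools import permutations, product

PAIRS = [(0, 1), (0, 2), (1, 2)]  # policy pairs (A, B), (A, C), (B, C)

def relation(scores):
    """Encode a preference as +1 / 0 / -1 on each pair (higher score = preferred)."""
    return tuple((scores[i] > scores[j]) - (scores[i] < scores[j]) for i, j in PAIRS)

def kemeny(r1, r2):
    """Assumed convention: 1 per strict-vs-indifferent pair, 2 per reversed pair."""
    return sum(abs(a - b) for a, b in zip(r1, r2))

def expressible(patterns):
    """Preferences that can be cast, given a rule's admissible score patterns."""
    return {relation(p) for pattern in patterns for p in permutations(pattern)}

# Every possible actual preference over 3 policies, indifference allowed.
ALL_PREFERENCES = {relation(s) for s in product(range(3), repeat=3)}

RULES = {
    "plurality (T1, T4)": expressible([(1, 0, 0), (0, 0, 0)]),
    "approval (T2)": expressible([(1, 1, 0), (1, 0, 0), (0, 0, 0)]),
    "A-Borda (T3, T5)": expressible([(2, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]),
}

def worst_case(ballot_set):
    """Largest distance any voter is forced to accept, voting as closely as possible."""
    return max(
        min(kemeny(actual, ballot) for ballot in ballot_set)
        for actual in ALL_PREFERENCES
    )

bounds = {rule: worst_case(ballot_set) for rule, ballot_set in RULES.items()}
print(len(ALL_PREFERENCES), bounds)
```

Under this convention a voter with a strict ranking is at distance 1 from the closest plurality or approval ballot, which is in line with most non-zero distances in Table 2 being equal to 1.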

We are particularly interested in differences across treatments. First, we find that subjects’ actual and expressed preferences are statistically significantly more distant in T1/T4 than in T2 (\(p \le 0.01\) in both cases, Pearson Chi-square tests).Footnote 18 This is explained by the presence of subjects with actual preferences with two favourite policies, that were allowed to express them in T2 but not in T1/T4. According to our model, this suggests that some subjects derived more expressive utility in T2 than in T1 and T4. However, our results show that there are few subjects for whom this may be the case in our sample (around 5%; see Table 2). This will be important for the interpretation of our results below. Second, we find that subjects’ actual and expressed preferences are statistically significantly more distant in T1/T2/T4 than in T3/T5 (\(p < 0.001\) in all cases, Pearson Chi-square tests). In this case, the number of subjects for whom the Kemeny distance is 0 in T3 and T5 is much larger than the corresponding number of subjects in T1, T2 and T4 (see Table 2). This suggests that a large number of subjects derived more expressive utility in T3/T5 than in T1/T2/T4. Third, we find that the difference between T1 and T4 and the difference between T3 and T5 are not statistically significant. These results can be summarised as follows:

Result 1

Subjects’ actual and expressed preferences are statistically significantly more distant in T1/T4 than in T2, and in T2 than in T3/T5.

In Appendix C, we report the results of an Ordered logit regression on Kemeny distance, with dummies for each treatment and controls for gender, age, education, health status, activity status, and past exposure to Covid-19. In a nutshell, Result 1 is robust when we account for individual observed characteristics: Subjects’ actual and expressed preferences are statistically significantly more distant in T1/T4 than in T2 (\(p<0.01\) for both tests, two-tailed Wald tests), and statistically significantly more distant in T2 than in T3/T5 (\(p<0.001\) for both tests, two-tailed Wald tests).

According to our conceptual framework, Result 1 shows that our experimental manipulation led to significant differences in terms of the expressive utility that voters derived from voting with different voting rules, and that such differences are in the intended direction: Subjects derived more expressive utility in T3/T5 than in T2, and derived more expressive utility in T2 than in T1/T4.

5.2 Turnout

Of the 1977 subjects that participated in the experiment, 80% decided to register their vote (1589 subjects). Subjects that decided to register their vote spent on average 12.8 min to complete the survey, while those who abstained spent on average 8.8 min to complete the survey.Footnote 19 This means that turning out entailed 4 additional minutes to complete the survey on average (an increase of 45% in duration for the same pay). This shows that 80% (20%) of subjects were willing (not willing) to exert considerable effort to turn out.

Figure 2 summarises our main turnout results. Looking first at the treatments without reference points (T1, T2, and T3), Fig. 2 shows that turnout was 80.15% in T1, 82.92% in T2, and 79.0% in T3. While the difference between T1 and T2 aligns with our main predictions, the differences between T1/T2 and T3 do not. However, the differences between these treatments are not statistically significant (Pearson Chi-square tests). These findings can be summarised as follows:

Result 2

Turnout is not statistically significantly different between T1, T2, and T3.

Result 2 contrasts with our predictions. In the next section, we discuss potential underlying reasons for this result. Once reference points are introduced (T4 and T5), however, results do align with our predictions. While turnout is 76.79% in T4, it is 82.99% in T5. The difference between T4 and T5 is statistically significant at 5% (\(p = 0.03\), Pearson Chi-square test). A 6.2 percentage point increase in turnout seems also “economically” meaningful given the high average turnout observed across treatments. Note as well that turnout is lower in T4 than in T1 and higher in T5 than in T3. Although these differences are not statistically significant (Pearson Chi-square tests), these results are qualitatively aligned with the relevance of reference points and salience. These findings can be summarised as follows:

Result 3

Turnout is statistically significantly higher in T5 than in T4.
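The Pearson Chi-square tests behind Results 2 and 3 can be approximated from the reported figures. The sketch below is a reconstruction under assumptions, not the authors' analysis: the counts are obtained by rounding the reported turnout shares times the treatment sample sizes, no continuity correction is applied, and the p-value uses closed forms of the chi-square survival function that hold only for 1 and 2 degrees of freedom.

```python
from math import erfc, exp, sqrt

# (reported turnout share, sample size) per treatment
REPORTED = {
    "T1": (0.8015, 393),
    "T2": (0.8292, 398),
    "T3": (0.7900, 400),
    "T4": (0.7679, 392),
    "T5": (0.8299, 394),
}

# Reconstructed (voted, abstained) counts; an assumption based on rounding.
COUNTS = {
    t: (round(share * n), n - round(share * n))
    for t, (share, n) in REPORTED.items()
}

def pearson_chi2(table):
    """Uncorrected Pearson statistic for a list of (voted, abstained) rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value(stat, df):
    """Chi-square upper-tail probability, closed forms for df = 1 and df = 2."""
    if df == 1:
        return erfc(sqrt(stat / 2))
    if df == 2:
        return exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

# Result 2: T1 vs T2 vs T3 (3 x 2 table, df = 2).
p_t1_t2_t3 = p_value(pearson_chi2([COUNTS[t] for t in ("T1", "T2", "T3")]), df=2)
# Result 3: T4 vs T5 (2 x 2 table, df = 1).
p_t4_t5 = p_value(pearson_chi2([COUNTS["T4"], COUNTS["T5"]]), df=1)
print(round(p_t1_t2_t3, 3), round(p_t4_t5, 3))
```

The reconstructed counts sum to the 1589 registered votes reported above, which is a useful sanity check on the rounding.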

Fig. 2

Subjects’ turnout in percentage. The main bars display raw percentages of turnout per treatment. The error bars display one standard error of the mean, with values based on the asymptotic standard error of the maximum likelihood estimate of the proportion who voted
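For reference, the asymptotic standard error of an estimated proportion p from n observations is sqrt(p(1 - p)/n). The short sketch below applies this formula to the turnout shares and sample sizes reported in the text; it is an illustration of the error-bar computation, assuming these reported figures, and not the code used to draw Fig. 2.

```python
from math import sqrt

# (reported turnout share, sample size) per treatment
REPORTED = {
    "T1": (0.8015, 393),
    "T2": (0.8292, 398),
    "T3": (0.7900, 400),
    "T4": (0.7679, 392),
    "T5": (0.8299, 394),
}

def standard_error(share, n):
    """Asymptotic standard error of the ML estimate of a proportion."""
    return sqrt(share * (1 - share) / n)

errors = {t: standard_error(share, n) for t, (share, n) in REPORTED.items()}
for t, (share, _) in REPORTED.items():
    print(f"{t}: {100 * share:.2f}% (SE {100 * errors[t]:.2f} percentage points)")
```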

These results are robust when we account for possible differences in subjects’ observed characteristics across treatments. In Appendix D, we present the results of a Probit regression on turnout, with dummies for each treatment and controls for gender, age, education, health status, activity status, and past exposure to Covid-19. In a nutshell, we find no statistically significant difference for the effect of T1, T2, and T3 on turnout (two-tailed Wald tests), while the effect of T5 on turnout is statistically significantly higher than the effect of T4 at the 5% level when we control for individual characteristics (\(p=0.016\), two-tailed Wald test). These results are also robust when we restrict the sample to the 1838 subjects that reported a complete antisymmetric actual preference, who can benefit from the greater expressive power of the A-Borda rule (T3/T5) in relation to approval voting and the plurality rule (T1/T2/T4). Taken together, these robustness results bring further support to Results 2 and 3, which suggest that differences in expressive power can have a significant impact on turnout as long as voters are aware of (and can compare) different voting rules.

5.3 Turnout and the distance between actual and expressed preferences

If the underlying mechanism of our conceptual framework is correct (our main hypothesis captured in Expression 2), we should observe, on average, subjects to be more likely to turn out if their expressed preferences are closer to their actual preferences. To test this, we look at the relationship between turnout and Kemeny distance. We find that subjects are more likely to turn out if they express their actual preference (Kemeny distance \(= 0\)) and less likely to turn out if they do not express their actual preference (Kemeny distance \(\ge 1\)), and this effect is statistically significant at 2% (Pearson Chi-square test; \(p = 0.002\) if we use Kemeny distances \(=0\), \(=1\), and \(\ge 2\)). When looking at the treatments separately, the results are qualitatively aligned with the positive effect of lower Kemeny distance on turnout, although the effect is only statistically significant for T4 and T5 (T4: \(p = 0.044\) if Kemeny distance \(=0\) and \(\ge 1\), Pearson Chi-square test; \(p = 0.019\) if Kemeny distance \(=0\), \(=1\), and \(\ge 2\)) (T5: \(p = 0.009\) if Kemeny distance \(=0\), \(=1\), and \(\ge 2\), Pearson Chi-square test). This means that, in line with our model, lower Kemeny distance is correlated with higher turnout, and this effect is particularly prominent in T4 and T5 as predicted.

However, note that this effect can be driven by non-sincere voters if these voters (for whom the Kemeny distance is higher than 0) are less likely to turn out. This is actually the case: subjects who could have voted with their actual preference but have not done so (around 30% in T3 and T5) were less likely to turn out than those who did vote with their actual preference (\(p = 0.041\) for T3 and T5, Pearson Chi-square test).Footnote 20

To further probe the relationship between turnout and the Kemeny distance, we test it in a restricted sample for which we exclude all subjects who did not vote with their actual preference but could have done so. This addresses the concern raised in the previous paragraph. We also exclude all remaining subjects in T3 and T5 because, contrary to T1, T2, and T4, all subjects that remain after implementing the previous restriction have voted with their actual preference (i.e., in T3 and T5 we cannot compare voters who voted with their actual preference against voters who did not vote with their actual preference because they could not have done so). We end up with a sample of 1177 subjects (392 in T1, 394 in T2, and 391 in T4). In a nutshell, in this restricted sample we find analogous results in terms of both sign and statistical significance to the previous results on the relationship between Kemeny distance and turnout. Taken together, these findings suggest that lower Kemeny distance is correlated with higher turnout, bringing further support to the main hypothesis of our conceptual framework (Expression 2).

5.4 Expressiveness, rule complexity, and preferred voting rule

At the end of the experiment, we asked participants several questions concerning the voting rules (see Sect. 3.4 for details). One of these is a direct question about the perceived “expressiveness” of the voting rule that the subject has used. We expected perceived expressiveness to increase in line with expressive power, i.e., expressiveness in T5/T3 to be higher than in T2 and expressiveness in T2 to be higher than in T1/T4. We also expected the effect of reference points to be noted here, such that the highest value would be observed in T5 and the lowest value in T4. Another relevant question is about the perceived complexity of the rule that the subject has used. We expected perceived complexity to be higher in T3/T5, followed by T2, followed by T1/T4. The results for these two questions are summarized in Table 3.

Table 3 Perceived expressiveness and perceived complexity (averages per treatment)

These results suggest that, in line with our predictions, T3 and T5 are considered the most expressive rules and T1 and T4 the least expressive. These differences are statistically significant (\(p \le 0.001\) for all tests, based on two-tailed Student tests and Wilcoxon signed-rank tests). T2 is also considered more expressive than T1/T4 as predicted (\(p = 0.003\) for T2 and T1, \(p \le 0.06\) for T2 and T4, based on two-tailed Student tests and Wilcoxon signed-rank tests), and less expressive than T3/T5, although the difference between T2 and T3 is very small and statistically insignificant. Once reference points are introduced, the difference between the perceived expressiveness of plurality and A-Borda rules increases. This supports the hypotheses that reference points are important for perceived expressiveness (which can explain why the difference between T2 and T3 is very small) and that the observed difference in turnout between T4 and T5 is due to the difference in the expressive power of the two rules. Note as well that according to this measure, the effect of reference points is more relevant in T5 than in T4, since perceived expressiveness is statistically significantly higher in T5 than in T3 (\(p \le 0.02\), based on a two-tailed Student test and Wilcoxon signed-rank test), but the difference between T1 and T4 is not statistically significant. In Appendix E, we test if perceived expressiveness has a significant effect on turnout using Probit regressions with individual controls. Our results show that perceived expressiveness has a positive and significant effect on turnout (\(p=0.02\)). Taken together, these results support the underlying mechanisms of our model, according to which higher expressive power (via higher expressive utility) leads to higher turnout.

In terms of perceived complexity, the results suggest that T3 and T5 are considered more difficult to understand than T1, T2 and T4. This accords with our predictions. Parametric and nonparametric tests suggest that the difference between T1/T4 and T3/T5 and the difference between T2 and T5 are statistically significant.Footnote 21 The complexity of the voting rule can, in principle, impact turnout in the following way: The more complex a rule is, the lower the expected turnout is, all else equal. To probe this effect, we test if perceived complexity has a significant effect on turnout using Probit regressions with individual controls (see Appendix E). Our results show that, as expected, perceived complexity has a negative and significant effect on turnout (\(p=0.04\)).Footnote 22 Note, importantly, that this effect runs in the opposite direction to the effect of the expressive power on turnout. This can partly explain why turnout is not higher in T3 than in T1.

Table 4 Preferred voting rule

Finally, at the end of the experiment, we presented subjects with the three voting rules we used in our experiment and asked them which voting rule they would have most liked to have used to vote for the 3 policies. Table 4 summarizes the results. A large majority of subjects (68.39%) would like to have used the A-Borda rule. This suggests that the A-Borda rule is preferred over plurality and approval voting to express opinions over policies.

6 Discussion

In this section, we provide an interpretation for our main findings and discuss some potential confounding factors. We start with the interpretation of Result 1: Subjects’ actual and expressed preferences are statistically significantly more distant in T1/T4 than in T2, and in T2 than in T3/T5. This result shows that differences in expressive power can have a significant impact in terms of the representation of voters’ actual preferences. According to our conceptual framework, it follows that differences in expressive power can have a significant impact in terms of the expressive utility that voters derive from voting. Our results in terms of the perceived expressiveness of the rules (Table 3), voters’ ex-post preferences over voting rules (Table 4), and the relationship between turnout and Kemeny distance (Sect. 5.3), all support this interpretation. If this holds more generally, the expressive power of voting rules has important implications in terms of expressive utility and the representation of voters’ actual preferences, and therefore the welfare of voters.

Turning to Result 2, the finding that turnout is not statistically significantly different between T1 and T2 can be explained by the low percentage of subjects who were able to better express themselves in T2 than in T1 (around 5%, see Table 2). Such a low percentage significantly reduces the impact of the treatment, and more research is needed to compare the expressive power of plurality and approval voting. In terms of T1/T2 and T3, the results reported above show that perceived complexity and non-sincere voting have a negative and statistically significant effect on turnout. Importantly, these effects run in the opposite direction to the effect of the expressive power on turnout. Therefore, this can partly explain the null result. In addition, there may be a bias in favour of more familiar (and/or even “real”) rules. In particular, we conjectured that without being able to compare the voting rule being used with other rules (absence of explicit reference points), the ability/inability to better express oneself would not be salient and its effect on the expressive utility of voting muted.

The anticipation of these potential effects motivated the inclusion of T4 and T5 in our experiment. The statistically significant difference in turnout between T4 and T5 (Result 3) lends support to the relevance of reference points and salience in our setting. According to this interpretation, subjects can use the rules that were presented but with which they did not vote as reference points to judge the voting rule with which they voted. This may create an emotional reaction of elation (frustration/disappointment) with the ability (inability) to express oneself. For example, a voter may feel frustrated at not being able to use a rule with which she would better express her opinion. It follows, according to this interpretation, that the ability/inability to express oneself becomes salient. In other words, T4 and T5 make the relatively low (high) expressive power of the plurality rule (A-Borda rule) more salient. This then magnifies the effect of the expressive power of voting rules on voters’ turnout likelihood.

We now turn to potential confounding factors that could, at least in principle, change across treatments and undermine the interpretation of our results: (i) non-sincere voting and (ii) instrumental sincere voting. In terms of non-sincere voting, there seem to be more possibilities for non-sincere voting (such as strategic voting) in T3/T5 than in T1/T4. This issue may be relevant since our results for T3 and T5 (treatments in which voters could always vote with their actual preferences) suggest that around 30% of subjects did not vote sincerely (see Table 2).Footnote 23 While we cannot ascertain whether the discrepancy between actual and expressed preferences is due to mistakes (“noise” common in experiments), strategic voting, or other reasons for non-sincere voting, two pieces of evidence suggest that non-sincere voting is not driving our results. First, non-sincere voting could not explain the effect of reference points observed in T4 and T5 with respect to T1 and T3. Second, it is not clear from a theoretical point of view whether “non-sincere voters” are more or less likely to turn out than “sincere voters”, and our results suggest that they are less likely to do so. As reported above, subjects in T3 and T5 who could have voted with their actual preference but did not do so were less likely to turn out than those who did vote with their actual preference (\(p = 0.041\), Pearson Chi-square test). Similarly, the 9% of subjects in T1 and T4 who did not vote for their actual preferred option were less likely to turn out than those who did (\(p = 0.004\), Pearson Chi-square test). This evidence suggests that non-sincere voters (strategic and others) are less likely to turn out than sincere voters and are more prevalent in T3 and T5 than in T1 and T4. It follows that this effect, if present, runs in the opposite direction to the effect of expressive power on turnout. It can therefore partly explain why turnout is not higher in T3 than in T1, but it cannot drive the significant difference observed between T4 and T5.
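For readers who wish to see how a comparison of turnout between sincere and non-sincere voters can be tested, the sketch below implements a Pearson chi-square test of independence for a 2×2 table using only the Python standard library. It is a sketch under stated assumptions: the function name is ours, no continuity correction is applied, and the counts are purely hypothetical placeholders rather than data from our experiment.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    # Closed-form statistic for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# HYPOTHETICAL counts, for illustration only (not the experimental data):
# rows = sincere / non-sincere voters, columns = turned out / abstained.
chi2, p = pearson_chi2_2x2(80, 20, 30, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The closed-form expression avoids computing expected counts cell by cell; with small cell counts, an exact test may be more appropriate than this approximation.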

In terms of instrumental (sincere) voting, this effect could also change across treatments. In particular, some subjects may have voted fully (or partially) out of instrumental motives, that is, to influence the result of the vote and/or policy-making. If these voters had the illusion that their vote was pivotal and gave more instrumental weight to voting with one preference rather than another (e.g. to voting for two favourite policies instead of a single preferred policy because, according to this illusion, their least preferred alternative would then be less likely to be chosen), then this could in principle affect turnout differently across treatments. To probe the possibility of instrumental voting, after the turnout decision we asked subjects (in an open-ended format) why they decided to register or not to register their vote. We can therefore test whether subjects stated expressive or instrumental motives for their turnout decision. To this end, we asked a research assistant unaware of our research objectives to code the answers as follows:

  • “Instrumental” whenever a subject states that he/she voted to influence the result of the vote and/or impact policy-making (e.g. impact what the government will do). Examples include: “I think it’s good that my opinion might be able to have some impact, no matter how small”, “I think the more information you have regarding the population choices the better for the government to make an informed decision”, and “It might give me a voice in influencing a government policy which I think is in the interests of the country”. In the case of abstention, whenever a subject states that they cannot influence the result of the vote and/or that the vote cannot impact policy-making.

  • “Expressive” whenever a subject refers to reasons related to one’s voice and/or expressing opinions and/or the value of democracy and/or wanting one’s vote to count. Examples include: “I think it is important to share opinions”, “It is important to have my say”, “Because it is important we voice our opinion”, and “Because I think people should be listened to”. In the case of abstention, whenever a subject states that there is no point in expressing one’s opinion and/or that no one listens, or gives other reasons that negate the expressive value of voting.

  • “Other” whenever a response does not fit the two previous reasons. This includes general interest/curiosity, wanting to assist research, convenience, issues of privacy or lack of knowledge, or whenever the reason is not clear or well-connected to the two previous reasons.

The results show that among those who decided to register their vote, 36% of subjects gave expressive reasons for their decision, 10% gave instrumental reasons, and 54% gave other reasons. This suggests that few voters had instrumental reasons as a primary motive to register their vote. In addition, subjects’ responses suggest that differences in the frequency of instrumental reasons across treatments are small and not systematic (8.3% in T1, 11.2% in T2, 11.7% in T3, 10.63% in T4 and 8.9% in T5).Footnote 24 Taken together, these pieces of evidence suggest that different strengths of instrumental motives across treatments are also not driving our results.

7 Concluding remarks

In this paper, we proposed a simple ordinal model of expressive voting that allows us to compare voting rules in terms of the expressive utility that voters can derive from voting (the “expressive power” of voting rules). Our model provides a novel testable implication according to which expected turnout increases with expressive power, all else equal. We ran an online experiment to test this implication in a controlled environment and found that if voters are made aware of different voting rules, turnout is higher under voting rules with higher expressive power. This result suggests that voting rules with higher expressive power may increase participation whenever voters are aware of different voting rules (i.e., when there are explicit reference points). In other settings, such as committees or elections, one can easily raise awareness of alternative voting rules and provide reference points. For example, if a national voting reform increases expressive power, the government could accompany the reform with an informational campaign that explains the old and the new rules and, in a more direct way than in our setting, highlights the expressive benefit of the latter. In addition, our results show that higher expressive power is associated with a better representation of voters’ actual preferences and, according to our model, higher expressive utility.

Our analysis suggests that the expressive power of voting rules is a relevant criterion when deciding which voting rule to use in economic and political decisions. It also shows that this criterion is aligned with goals for voting and turnout behaviour that are generally supported in the literature. For example, there is wide acceptance that higher turnout is a desirable goal for modern liberal democracies (e.g. Lijphart 1997; Engelen 2007; cf. Saunders 2012). Similarly, the better representation of voters’ actual preferences is often supported in the literature. For instance, Horowitz (2003, p. 115) argues that “the best electoral system is the one that straightforwardly and most accurately reflects the preferences of voters”. Overall, this suggests that the expressive power of voting rules is a criterion with descriptively and normatively appealing features that can complement existing criteria for the evaluation of voting rules.

Finally, our analysis points towards relevant future empirical research. For instance, do the results we find in our online experiment with an opinion poll generalise to political elections? While a similar experiment may be difficult (if not impossible) to run in political elections, it may be possible to gather indirect data as well as laboratory experimental data. Both methods have relevant caveats in this setting, which led us to choose our experimental design, but they would complement our analysis. It would also be interesting to compare electoral systems (the wider set of rules for elections that include but are not limited to voting rules) in terms of the expressive utility that voters derive from them. These lines of investigation are left for future work.