Introduction

Referendums are seen as important vote-centric instruments of democratic innovation and a panacea against the crisis of representative democracy (Setälä and Schiller 2009). If run well, they are said to mitigate conflict (Collin 2014) and to strengthen political accountability (Le Bihan 2018), legitimacy (Persson et al. 2013), and efficacy (Bowler and Donovan 2002). On the other hand, flawed ballot processes endanger all of these outcomes, in democracies and autocracies alike (Norris 2014). As the 2016 Brexit vote suggests, declining trust in public officials, under-funded electoral authorities, misinformation, and late additions to the legal framework may undermine the procedural quality of a referendum (James and Clark 2020). Even in well-run referendums, such as the Finnish citizen initiatives or the Scottish petitions system, mere perceptions of procedural flaws are associated with declines in political trust (Carman 2010; Christensen 2019). What is more, modern authoritarian leaders from Rwanda to Russia use referendums to claim public support, ensuring results through the repression of political campaigning, media intimidation, or the co-optation of nominally independent electoral authorities (Reyntjens 2004; Irisova 2020). Thailand’s 2016 constitutional referendum, for instance, was a mere façade that effectively banned public debate and completely lacked independent oversight (McCargo et al. 2017). It failed to convincingly legitimize the new junta-drafted charter and instead fomented public discontent. The quality of referendum conduct clearly matters. Yet, thus far, there is no comparative and systematic evidence that allows us to assess direct democracy integrity (DDI).

In this paper, we develop and pilot an empirical instrument to evaluate the variety and integrity of referendums. Our conceptual frame of reference is the referendum cycle, which is based on the electoral cycle approach (Norris 2014) but also reflects important differences between referendums and elections. First, the initiation phase is a sub-cycle unique to referendums which does not exist in elections at all. Second, some dimensions of electoral integrity, such as boundary delimitation or candidate registration, are not relevant in referendums. Third, while electoral integrity focuses sharply on pro-incumbent bias as an impediment to fair elections, DDI highlights bias in favor of status quo proponents, who may or may not be incumbents. The referendum cycle allows the evaluation of referendum quality across a number of dimensions, from the legal framework, procedures, and thematic limitations of referendums, to voter registration, the initiation of referendums, campaign and media coverage, and campaign financing. Furthermore, the voting process itself, the post-referendum vote count, and the role of the supervising authorities are important areas for evaluation.

Based on this framework we construct a measurement instrument to assess DDI, encompassing 19 factual and 59 perceptual items, to be used in expert surveys. This draws on the work of the Electoral Integrity Project (Norris et al. 2014) and similar expert survey methodologies. The instrument is implemented in a pilot study surveying 45 election experts on the Turkish constitutional referendum in 2017. The purpose of this pilot study is to ascertain—before scale-up—whether our instrument (a) is feasible; (b) produces exploitable variation; and (c) has any obvious issues regarding validity that need to be addressed.

The paper proceeds as follows: The next section develops the empirical framework, including the referendum cycle and criteria for integrous referendums. Section three introduces the measurement instrument. Section four discusses the context and results of the pilot study, and section five critically assesses the survey results with regard to internal and external validity. Section six concludes.

The integrity of direct democratic instruments

The waves of democratization in the 1980s and 1990s saw stronger demand for political participation. This led, among other things, to new forms of vote-centric direct democracy in some new and old democracies (Kersting 2016). Referendums are such instruments of direct democratic participation and of numeric democracy, focusing not on the election of personnel and candidates but on thematic issues (Kersting et al. 2008). A plethora of different institutional arrangements exists, complicating the evaluation of direct democracy in referendums (Schiller 1999; Setälä and Schiller 2009; Kaufmann et al. 2010).

Just like elections, referendums are meant to bring about collectively binding decisions about important societal questions. Yet, they differ from elections in regard to their initiation, bindingness, thematic limitations, and quorums. Table 1 presents a typology of referendums based on who initiates them and whether the results are binding (Kersting 2013). Regarding initiation, referendums are either mandatory or initiated in a top-down or bottom-up fashion. Plebiscites are started top-down, mostly by the executive, with varying provisions for control through the legislature, e.g., approval by the parliament or initiation only by the majority party. Another possible avenue in some polities is initiation by citizens, in which case the referendum is the final stage of the decision-making process. Regarding bindingness, referendums can be binding or merely consultative in nature, depending on the legal setting. Furthermore, there are frequently thematic limitations on the topics amenable to direct legislation and legislative change. In most cases, this takes the form of a negative list excluding certain policy fields, for instance budgetary, fiscal, and financial policy, but also decisions on personnel (Qvortrup 2018). Finally, quorums also limit the application of referendums. Here, the threshold should be high enough to avoid an inappropriate inflationary use of referendums, while at the same time being low enough to make them relevant participatory instruments. Besides the ‘quorum for the initiative’ and a ‘quorum of signatures,’ there is also often a required threshold of turnout and/or proportion of participants for the results to be actioned (Aguiar-Conraria and Magalhães 2010).

Table 1 A typology of referendums

Direct democracy integrity

When does a referendum exhibit procedural integrity? The Global Commission on Elections, Security and Democracy, initiated by Kofi Annan, points to ‘democratic principles of universal suffrage and political equality as reflected in international standards and agreements’ and to ‘professional, impartial, and transparent […] preparation and administration throughout the electoral cycle’ (Global Commission 2012, p. 13). The relevant standards and agreements refer to the 1948 Universal Declaration of Human Rights and the subsequent specifications of the 1966 International Covenant on Civil and Political Rights. They have been ratified through numerous treaties, conventions, and protocols by individual countries and organizations such as the European Union (EU), the African Union (AU), or the Organization for Security and Cooperation in Europe (OSCE) (Davis-Roberts and Carroll 2010). There is an understanding that these international norms apply equally to different ballot processes, be they in representative or direct democracy. Indeed, election observation missions by the UN, EU, or OSCE have monitored numerous referendum processes, going as far back as the 1956 Togoland independence referendum (Beigbeder 1994), and making reference to these international agreements. It therefore stands to reason that direct democratic instruments should follow the same international standards and norms as elections.

We adopt this approach and, drawing on Norris (2014), define direct democracy integrity (DDI) as referendum conduct in accordance with international standards and obligations applying universally to all countries throughout a referendum cycle before, during, and after polling day.

Importantly, this definition recognizes that referendums consist of more than casting a vote. Although much media attention focuses on irregularities such as ballot box stuffing, ‘ghost voting,’ or vote count falsification, happenings on polling day are not the linchpin of procedural integrity. Rather, problems may emerge during the formulation of the legal framework, the registration of voters, campaign coverage by the media, the vote count, the performance of the electoral authorities, and dispute adjudication (Elklit and Reynolds 2005). A cyclical approach is called for, one that recognizes that procedural integrity can be broken at any step of the referendum cycle.

We therefore propose an evaluative scheme guided by the referendum cycle depicted in Fig. 1. The cycle leans on the work of Schedler (2002), Elklit and Reynolds (2005) and Norris (2014), encompassing administrative and managerial aspects of referendum conduct as well as the possible deliberate partisan undermining of the playing field. As mentioned above, the referendum cycle differs from the electoral cycle by including the phase of initiation. On the other hand, aspects of boundary delimitation or political party registration are irrelevant for referendums, which is why they are omitted from the cycle. Finally, while the above-mentioned works highlight pro-incumbent bias as an impediment to fair elections, the focus of our approach lies on bias in favor of the status quo proponents.

Fig. 1 The referendum cycle

In the following, we delineate the different stages of the referendum cycle and their relevance for DDI.

Phase A: pre-referendum

The legal framework is the essential bedrock of procedural integrity, because it affects core aspects of the equity of the instrument itself (Elklit and Reynolds 2005, p. 155). It should have strong safeguards for the protection of individual political rights as outlined in the international standards mentioned above, and minority rights in particular. Are any parties treated unfairly or is the ‘status quo’ side favored? Are individual human rights protected?

As Table 1 suggests, legal provisions for referendum initiation vary considerably and may facilitate popular access to the instrument or, conversely, present institutional obstacles (Breuer 2011). Regardless of these provisions, the specific way in which a concrete referendum is initiated may elicit a stronger or weaker sense of legitimacy for the outcome. One important factor in this regard is the extent to which the executive uses its privileged position to dominate the agenda-setting process.

Voter registration differs from country to country, being either automatic or opt-in. In the latter case, bureaucratic hurdles and managerial incapacities may hamper citizens’ attempts to register for a referendum (James and Clark 2020). More nefariously, artificial increases or decreases in the electoral roll may subtly rig the vote (Cheeseman and Klaas 2018, pp. 44–48). In general, all (and only) eligible citizens have to be listed in the register.

Phase B: campaign

During the campaign, citizens form their opinion about the referendum question in different ways. Unimpeded campaigning by both proponents and opponents is a key requirement here. Package referendums, which bundle a number of different topics, and complex constitutional referendums require some cognitive sophistication on the part of the electorate. Lack of knowledge is often wielded as an argument against direct democratic instruments, which is why accurate, balanced, accessible, and relevant information about the referendum topic is a crucial facet of referendum integrity (Renwick et al. 2019). In this regard, unambiguous wording of the referendum question itself (Rocher and Lecours 2017) or outreach programs with a deliberative component (Landemore 2018) may be important safeguards of integrity.

In regard to media coverage, proponents as well as opponents should have equitable access to political broadcasting and advertisement, and media reporting should favor neither the status quo proponents nor the executive, but instead cover all parties and NGOs in a balanced way (Renwick and Lamb 2013). While attentive media, particularly social media, could expose electoral fraud by crowd-monitoring the referendum, disinformation and factually inaccurate reporting may also undermine its integrity (Birch and ElSafoury 2017).

The regulation and practice of campaign finance has direct bearing on the competition among different interests campaigning in a referendum. International standards demand that this competition must take place on an even playing field, encompassing equitable access for proponents and opponents to political funding. This can be achieved, for instance, through bans and limits on donations and spending, financial disclosure provisions, and/or public funding (Reidy and Suiter 2015).

Phase C: polling day

The management of referendum procedures as a whole is an important part of integrity. Not only does information about voting procedures need to be made available; officials must also treat the different actors impartially and in line with all legal provisions (van Ham and Garnett 2019).

Oftentimes, observers and the media focus on problems during the voting process on polling day, where manipulation is more visible. Here it is important to evaluate whether voters’ expression of their preferences was altered through ballot box stuffing, shortcomings in the secrecy of the ballot, or the undue influence of coercion and clientelism on people’s vote choice (Birch 2011, p. 35). In addition, the process of voting should be easy and convenient for electors.

Phase D: post-referendum

The vote count presents a sensitive moment making or breaking a vote’s integrity and perceptions thereof. The counting itself should be supervised by different actors, including international and domestic election monitors (Grömping 2017). In general, the safety of voting results must be protected against physical and cyber security breaches, ballot boxes need to be secure, and results should be announced in a timely fashion to avoid opportunities for manipulation (Elklit and Reynolds 2005). In case of electronic voting, an auditable paper trail of votes should be ensured (Alvarez et al. 2012).

The acceptance of results is a possible indicator of integrity in the post-referendum environment, as contests lacking integrity are often associated with claims of fraud, challenges to the results, and peaceful or even violent protests (Sedziaka and Rose 2015). Ideally, the existing legal channels resolve all disputes. However, in some cases, even integrous referendum results may be challenged for political gain; conversely, fraudulent referendums might remain unchallenged.

The electoral authorities administering the referendum must be impartial, transparent, and professional (James et al. 2019). They need to distribute all necessary information to citizens and allow scrutiny of their performance by external evaluators. The roles of electoral authorities in referendums may differ from elections (James 2017). An electoral management body (EMB) may be centrally in charge of the management of the polls, or delegate this role to local EMBs. It may also neutrally oversee the drafting and wording of the referendum question. Regardless of the specific legal provisions, the electoral authorities should administer the process neutrally and avoid politicization of their own role (van Ham and Garnett 2019).

Instrument and methodology

These theoretical considerations provide the basis for assessing DDI in a systematic, comparable, and suitably fine-grained manner. Different avenues for the measurement of electoral processes in representative democracy have been proposed, including coding from news sources, election observation reports, US State Department reports, or academic accounts (van Ham 2015). We construct a measurement instrument based on expert assessments, by adopting and adapting the approach used by the Electoral Integrity Project’s ‘Perceptions of Electoral Integrity’ (PEI) Index (Norris et al. 2014).

In this context, a referendum expert is a social scientist who has demonstrated knowledge of referendums or electoral processes in a country, such as through publications, membership of a relevant research network, or university employment. We identify experts from scholarly publications, the membership lists of national political science associations, the websites of political science departments in the country, and via snowballing.

An online survey instrument is then administered to these experts about one to two months after the referendum in question. The survey questionnaire includes 59 perceptual measures of referendum integrity covering the whole referendum cycle outlined above. These items are grouped into eleven sequential sub-dimensions that reflect the stages of the referendum cycle (Fig. 1). Experts are asked on a five-point Likert scale to what extent they agree with each statement (‘strongly disagree’ to ‘strongly agree’). In addition, our survey instrument also asks three factual questions with a total of 19 items. These questions regard the types of referendums in a country, the topics referendums can deal with, provisions for public financing, and the availability of different convenience voting facilities. Finally, several demographic questions are asked of the experts.

We refrain from constructing an additive index similar to the PEI Index, and instead analyze the pilot case by looking at individual items and the eleven sub-dimensions. There are several reasons for this. First, we do not want to make the assumption of equidistance that would be required to sum our ordinal measures. Second, non-response requires missing values to be imputed in order to construct an additive index; with only one case, we lack sufficient statistical power to run a multiple imputation process. Third, the PEI Index gives equal weight to all individual survey items. This is potentially problematic, as some scholars have argued that electoral integrity should be conceived multiplicatively rather than additively (Schedler 2002), and research questions the equal weighting assumption (Frank and Martínez i Coma 2017). For these reasons, we instead rely on a direct prompt asking the expert for an overall assessment of referendum integrity from 0 to 10 at the end of the survey.
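To make these aggregation choices concrete, the following sketch contrasts a naive PEI-style additive index with the direct overall rating we rely on. It is purely illustrative: the variable names, the 1–5 Likert coding, and the toy data are our assumptions, not the actual survey data.

```python
import numpy as np
import pandas as pd

# Toy expert-by-item matrix: 5 experts x 4 perceptual items coded 1-5
# ('strongly disagree' = 1 ... 'strongly agree' = 5); np.nan marks non-response.
items = pd.DataFrame(
    [[2, 1, 3, np.nan],
     [1, 2, 2, 1],
     [np.nan, 1, 4, 2],
     [2, 2, 3, 2],
     [5, 4, np.nan, 4]],
    columns=["legal_framework", "media_access", "vote_count", "emb_impartiality"],
)

# A PEI-style additive index would (a) treat the ordinal codes as equidistant,
# (b) weight every item equally, and (c) require imputing missing values --
# exactly the assumptions we avoid here. Shown only for illustration:
additive_index = items.fillna(items.mean()).mean(axis=1)  # naive mean imputation

# The direct alternative: a single 0-10 overall integrity rating per expert.
overall_rating = pd.Series([2, 1, 3, 2, 8])

print(additive_index.round(2).tolist(), overall_rating.mean())
```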

In the following, this survey instrument is used to describe the integrity of the Turkish constitutional referendum of 16 April 2017. Survey invitations were sent via email to 229 experts in mid-June, with two follow-up reminders one week apart. We received 45 responses, a response rate of about 20%.

Pilot study: the Turkish constitutional referendum

The Turkish referendum of 16 April 2017 concerned constitutional amendments abolishing the office of the Prime Minister and introducing an executive presidency. The change from a parliamentary toward a presidential system included a considerable expansion of presidential powers, for example the appointment of the Supreme Board of Judges and Prosecutors. Only a few provisions in the referendum package strengthened the parliament; the number of seats in parliament increased from 550 to 600. Some of these amendments were highly controversial, and the referendum took place in a tense and highly polarized political environment. The state of emergency, in place since the failed military coup of July 2016, led to the dismissal of thousands of public sector officials, teachers, and professors, and to an atmosphere of intimidation.

With a voter turnout of 85%, the ‘Yes’ side won by only a very small margin: 51.4% voted in favor of the constitutional amendments and 48.6% against. A substantial share of the electorate consisted of Turkish migrants abroad, and about 1.4 million ballots were cast overseas; in Germany alone, the Turkish diaspora cast around 660,000 votes. In Turkey itself, there was a substantial urban–rural divide, with most of the rural population in central Turkey (with the exception of Ankara) and in the north of the country voting in favor of the constitutional amendments. Urban centers, as well as areas on the Mediterranean coast and the Kurdish-dominated areas in eastern Turkey, voted against the amendments.

The referendum was sharply criticized by electoral observer groups, who noted an uneven playing field before the referendum. While they saw the technical administration of the referendum in a favorable light, the OSCE, for instance, criticized uneven access to the media, abundant negative campaigning against the ‘No’ side, including charges of terrorism, and the lack of avenues for effective redress of electoral board decisions, among other things (OSCE 2017). The decision of the higher electoral commission to accept non-official ballot papers was lamented by the Council of Europe and OSCE observers. While the Erdoğan government denounced international observers as politically biased, the pro-Kurdish party HDP announced a complaint to the European Court of Human Rights, and different opposition groups demanded the annulment of the referendum (Quamar 2017).

As such, the Turkish referendum provides a fertile pilot case for our research instrument. A variety of malpractices occurred, especially during the pre-referendum stage, whereas other more technical aspects were less problematic, which should ideally lead to variation in experts’ responses in different stages of the referendum cycle. The pilot study also allows us to assess the feasibility of the instrument in an adverse environment of intimidation where experts may be less likely to respond. Finally, the pilot case gives us valuable insight into whether the surveyed experts differ significantly in their assessment, or whether their evaluations coalesce.

Survey results

In the following, we describe the results of the expert survey by noting the percentages for each response category, in line with recommendations for presenting ordinal variables (Pollock III and Edwards 2019, p. 42). Percentages for the high (‘strongly agree’ and ‘agree’) and low (‘disagree,’ ‘strongly disagree’) categories are added up. Percentages for the middle category (‘neither agree nor disagree’) are not presented but can be calculated. The supplementary materials plot the full results as diverging stacked bar charts for each of the eleven stages of the referendum cycle. The in-text analysis focuses on items where expert opinions show clear agreement, while also mentioning particularly controversial statements within the group of experts.

Of note, we do not focus on the political implications of the results, although the outcome of the referendum was arguably the final step in Turkey’s creeping autocratization (Esen and Gumuscu 2016; Çalışkan 2018). The concept of referendum integrity is process-focused and therefore excludes per se the outcomes of any given referendum. It is logically possible that an integrous referendum may have produced the same autocratizing outcome, or, conversely, that even a referendum without integrity may not have produced it.

Figure 2 shows the eleven dimensions of the referendum cycle, plotting for each dimension the average percentage of agreement with survey questions denoting higher integrity. For this purpose, positively phrased statements were counted as denoting integrity if the expert answered ‘strongly agree’ or ‘agree’ to the prompt. Conversely, negatively phrased statements (27 of the 59) were counted as denoting high integrity if the expert disagreed or strongly disagreed. The graph shows clearly that campaign finance, media coverage, and the performance of the electoral authorities were seen as the most problematic dimensions. Only 10% of experts on average responded positively to the campaign finance prompts, while 16% and 17% on average saw media coverage and the authorities, respectively, as integrous. Conversely, we see a slightly more positive picture in the areas of voter registration, where 35% of experts on average responded in the positive direction, and the vote count (36%). It must be noted, however, that even these slightly more positive categories are evaluated fairly negatively overall, keeping in mind that the scale potentially ranges from 0 to 100%. A ‘high’ value of only one-third for the vote count therefore only means that, within the context of the Turkish referendum, this dimension performed relatively better.
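As an illustration of this aggregation step, the following sketch recodes responses into a binary ‘denotes integrity’ indicator (reverse-coding negatively phrased items) and averages by dimension. Column names, the item-to-dimension mapping, and the toy responses are hypothetical; only the recoding logic mirrors the description above.

```python
import pandas as pd

# Toy long-format responses: one row per expert-item answer on a 1-5 Likert scale.
responses = pd.DataFrame({
    "expert":    [1, 1, 2, 2, 3, 3],
    "item":      ["laws_fair", "ballots_stuffed", "laws_fair",
                  "ballots_stuffed", "laws_fair", "ballots_stuffed"],
    "dimension": ["legal_framework", "voting_process", "legal_framework",
                  "voting_process", "legal_framework", "voting_process"],
    "answer":    [2, 5, 1, 4, 4, 2],   # 1 = strongly disagree ... 5 = strongly agree
})

# Negatively phrased items (27 of the 59 in the real instrument): disagreement
# with these denotes higher integrity, so they are reverse-coded.
negative_items = {"ballots_stuffed"}

def denotes_integrity(row):
    if row["item"] in negative_items:
        return row["answer"] <= 2          # 'disagree' or 'strongly disagree'
    return row["answer"] >= 4              # 'agree' or 'strongly agree'

responses["integrity"] = responses.apply(denotes_integrity, axis=1)

# Mean percentage of integrity-denoting answers per dimension (as plotted in Fig. 2).
print((responses.groupby("dimension")["integrity"].mean() * 100).round(1))
```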

Fig. 2 Integrity of eleven stages of the referendum cycle. Note: Mean percentage of agreement with statements denoting higher referendum integrity (‘strongly agree’ or ‘agree’ for positively worded statements and ‘disagree’ or ‘strongly disagree’ for negatively worded ones). N = 45

Pre-referendum

The legal framework was regarded as highly skewed, in line with damning remarks from international monitors that there was no ‘coherent legal framework adequate for holding a genuinely democratic referendum’ (OSCE 2017, p. 1). 70% of the experts agreed with the statement that election laws were unfair, while 61% stated that the referendum laws favored the ‘Yes’ side. A small majority of experts (56%) stated that the referendum laws restricted citizen rights.

In terms of referendum initiation, the contest was a top-down plebiscite. As opposition parties were excluded from the development of the constitutional amendments, 93% of the experts stated that the executive dominated the agenda setting. A smaller share, 61%, held that the holding of a referendum weakened representative democracy. On the question of whether the referendum strengthened citizen interest and engagement, experts were evenly split. Finally, 89% of the experts believed that the process overall was not seen as legitimate and fair, casting strong doubt on the way the referendum was initiated.

Overall, the process of voter registration was seen as less problematic, although one problem noted by observers was that internally displaced persons from the war-torn east of the country had difficulties registering (OSCE 2017, p. 6). There is, however, strong disagreement among experts in this dimension. Slightly more than half of the experts claimed that some citizens were not listed in the register and that the electoral register was inaccurate, while the other half disagreed. Only the statement that some ineligible electors were registered attracted broader agreement (63%).

Campaign

As mentioned above, electoral observer groups sharply criticized the conduct of the campaign. Experts reflect this as well, with 90% agreeing that politicians offered patronage to their supporters and 86% stating that some groups were restricted from holding campaign rallies. The electoral authorities in fact barred large swaths of interest groups and even political parties from conducting campaign rallies, often using the emergency laws as a pretext (Esen and Gümüşçü 2017, p. 313). More than two-thirds of experts said that the authorities did not implement appropriate dialogical outreach programs, and the same share noted the lack of sufficient information to understand the issues at stake. Furthermore, more than half of the experts (55%) denied that citizens were given sufficient time to discuss the issues. Nevertheless, only a small group of experts, about one quarter, stated that the questions listed on the ballot paper were ambiguous.

Media coverage was seen as a highly problematic area, as the ‘Yes’ campaign benefitted from extensive and favorable coverage (Quamar 2017). Experts noted that the opposing sides did not have equitable access to the media to broadcast their messages (91% agreed on this). 62% said that TV news favored the ‘Yes’ camp, whereas only 11% each said that journalists in general, and newspapers in particular, provided fair and balanced coverage. On a more positive note, 75% said that the media paid attention to issues of procedural fairness in their coverage, although this may simply mean that procedural fairness was undermined and therefore topical.

The regulation and practice of campaign finance was seen as a particularly problematic stage of the referendum cycle. The biggest critique, voiced by about 93% of the experts, focused on the misuse of state resources for campaigning, as state budgets financed large official ceremonies-cum-campaign rallies by the President and Prime Minister (Esen and Gümüşçü 2017, p. 313). Furthermore, there was strong agreement that there was no equitable access to public subsidies (95%) or political donations (84%), respectively. A lack of transparency of financial accounts was lamented by 93% of experts. Nevertheless, when it came to the statement that rich people buy referendums, a plurality (36%) denied this and only one quarter of the experts saw it as a problem.

Polling day

In terms of the referendum procedures, a clear majority of two-thirds (68%) said that the authorities provided an adequate number of polling stations. In addition, the availability of information about voting procedures was seen positively by a plurality of 47%. Nevertheless, an astounding 87% said that the referendum was not conducted in accordance with the law, 77% said that the election officials were unfair, and 89% claimed that the referendum was not well managed. This draws into sharp focus the inequitable referendum management. Polling places in Kurdish-dominated regions, for instance, were relocated to remote and difficult-to-access locations, depressing turnout in areas that predominantly favored the ‘No’ campaign (Esen and Gümüşçü 2017, p. 314).

The voting process was also regarded critically. There was only one positive aspect, which concerned the ease of the process of voting (78% agreement). A relative majority (45%) saw the whole process and the ballot as confidential, while a substantial group of 30% disagreed in this regard. Independent election forensic analysis found that roughly a tenth of polling stations experienced some form of ballot stuffing, more than enough to sway the outcome of the referendum (Klimek et al. 2018). Consequently, 86% of experts criticized that some fraudulent votes were cast. More than three-quarters of the experts also mentioned that some voters were threatened with violence at the polling stations. The same share said that people were not free to vote and felt pressured, while 69% complained that some voters feared becoming victims of political violence. Criticism also focused on bribery, in that people received cash gifts or personal favors in exchange for their votes (70% agreement), and that voters were bribed (60%). This concurs with extant research noting the pervasiveness of clientelism in Turkish elections (Carkoglu and Aytaç 2015).

Post-referendum

The vote count was evaluated somewhat more positively in comparison with other aspects. For instance, 82% of experts declared that the results were announced without undue delay. However, responses overall still suggest a seriously flawed counting process, as nearly two-thirds of experts agreed that both domestic and international election monitors were restricted. Paper records of the vote were not kept (74% agreement), and 79% denied that the votes were counted fairly. A smaller group of 64% complained about the lack of safeguards against hacking into official electoral records, and 58% claimed that the ballot boxes were insecure.

There is overwhelming agreement among experts that the post-referendum environment was contentious and contested, given the ‘No’ campaign’s strong rejection of the results. All but 2% of experts stated that political parties or individuals challenged the results. Also, 81% said that the referendum led to peaceful protests, and 72% stated that violent protests were triggered. Slightly more than two-thirds of the experts complained that voting results were not subject to a post-election audit (72%) and that disputes were not resolved through legal channels (68%).

Finally, the electoral authorities administering the referendum were seen in a negative light, reflecting all the problems with ballot stuffing, inaccessible polling places, inaccurate voter rolls, and pro-‘Yes’ bias mentioned above. Only 38% of experts agreed that the authorities distributed adequate information about the process, and 76% denied that they allowed public scrutiny of their performance. 86% did not see the authorities as impartial, and 81% of the experts stated that the authorities did not perform well.

Overall assessment

Bringing together the different aspects, the experts were asked for an overall evaluation of the integrity of the referendum, with answers ranging from very poor (0) to very good (10). The mean response was 2.5, with a standard deviation of 2.3; the median answer was even lower, at 2. The overall conclusion about the integrity of the referendum therefore casts an extremely negative light on the whole process. However, there are some outliers in the responses: a small group of nine experts evaluated overall integrity much more positively than the rest, with scores above seven.

Challenges in measuring direct democracy integrity

The empirical results of the pilot study show that the research instrument is feasible and picks up variation in referendum integrity along the different stages of the referendum cycle. Unfortunately, we cannot assess the validity of the instrument directly against other measures, as no such measures currently exist. We also cannot compare the experts’ evaluation of the Turkish referendum with a baseline of other referendums. Nevertheless, in evaluating the research instrument and plotting ways forward, we can look at three aspects of the experts’ responses: congruence with other independent evidence, possible bias in the responses due to expert characteristics, and disagreement among experts on factual and perceptual questions.

First, do the experts agree with other measures of electoral integrity? The experts’ very low assessment of overall integrity (2.5 out of 10, see above) is suggestive of serious problems in direct democracy integrity. This concurs with assessments of the integrity of representative democracy (elections) in Turkey. For instance, the Electoral Integrity Project’s PEI Index ranks Turkey 116th out of 164 countries, with a PEI score of 47 (out of 100) (Norris and Grömping 2019). Similarly, the Varieties of Democracy (V-Dem) project detects a substantial decline in electoral integrity in the country, with a decrease in the ‘Clean Elections Index’ from 3.6 (out of 4) in 2006 to only 2.3 in 2017. Furthermore, comparative political scientists see the referendum as a major step in the further autocratization of Turkey (Çalışkan 2018). The low score of the 2017 referendum therefore seems broadly in line with other indicators of electoral integrity in Turkey.

Second, it is possible that certain characteristics of the experts color their assessment. To test this possibility, Table 2 shows the results of an OLS regression of the rating variable (integrity from 0 to 10) on a number of demographic and attitudinal characteristics of the experts. As it turns out, the political leanings of the experts are significant predictors of their assessment. Specifically, experts who self-identified as having voted for the ‘No’ side tend to evaluate the referendum more negatively, although this association is only significant at the 10% level. More telling may be the association of a more right-leaning political ideology with more positive assessments of integrity. This suggests that disagreement among experts may be an outcome of their political leanings and warrants further research into the effects of such bias on expert evaluations. Given that the majority of experts identified as ‘No’ voters (58%), and similarly, a majority of 79% identified as left-leaning (score of 5 or less on the left–right scale), this calls for concerted efforts to achieve more balanced panels of experts in future iterations of the instrument.

Table 2 Explaining referendum integrity rating by expert characteristics
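A minimal sketch of the kind of check reported in Table 2 is given below, using statsmodels. The data frame, variable names (overall_rating, voted_no, left_right, domestic), and the exact specification are illustrative assumptions; only the general approach, an OLS regression of the 0–10 rating on expert characteristics, follows the text.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical expert-level data: overall 0-10 integrity rating plus
# demographic and attitudinal characteristics of each respondent.
experts = pd.DataFrame({
    "overall_rating": [2, 1, 3, 8, 2, 0, 7, 3, 1, 2, 9, 4],
    "voted_no":       [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1],   # self-reported 'No' vote
    "left_right":     [3, 2, 4, 8, 3, 2, 9, 5, 3, 6, 8, 4],   # 0 = left ... 10 = right
    "domestic":       [1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1],   # expert based in the country
})

# OLS of the overall rating on expert characteristics, analogous to Table 2.
model = smf.ols("overall_rating ~ voted_no + left_right + domestic", data=experts).fit()
print(model.summary())
```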

Third, disagreement among experts may be an indicator of lacking internal validity. There are two types of disagreement: perceptual and factual. Perceptual disagreements show up in the divergence of experts’ views on the 59 integrity items. Here, disagreement may be an outcome of different perceptions of events, either due to diverging information or diverging interpretations of these events. Ideally, we would have hoped for considerable agreement, but as Fig. 3 shows, there are many items where this is not the case. The figure plots on the y-axis the extent of agreement, measured as the absolute difference between the percentage of experts answering positively on a survey item and the percentage answering negatively. High levels of agreement are seen on whether parties challenged the results, inequitable access to public subsidies, or executive domination of the agenda setting. Here, upward of 85% of experts answered in the same direction. In contrast, some items elicited considerable disagreement. Examples are prompts such as ‘Holding a referendum strengthened citizen interest and engagement,’ ‘The authorities provided information to citizens,’ or ‘Rich people buy elections.’ Here, almost equal numbers of experts answered positively as negatively. The graph also shows that as disagreement increases, the percentage of neutral answers (‘neither agree nor disagree’) does as well (r = 0.56, p < 0.001). It therefore appears that experts opt for the neutral response as an indication of uncertainty about the question.
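The following sketch reproduces the logic of this agreement measure on toy data: for each item it takes the absolute difference between the shares of positive and negative answers and then correlates the resulting disagreement with the share of neutral answers. Item names and response shares are invented for illustration only.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy per-item response shares (fractions of experts in each category).
items = pd.DataFrame({
    "item":     ["results_challenged", "info_provided", "rich_buy_referendums"],
    "positive": [0.90, 0.38, 0.26],   # 'agree' + 'strongly agree'
    "negative": [0.02, 0.35, 0.36],   # 'disagree' + 'strongly disagree'
})
items["neutral"] = 1 - items["positive"] - items["negative"]

# Agreement as plotted in Fig. 3: absolute gap between positive and negative shares.
items["agreement"] = (items["positive"] - items["negative"]).abs()
items["disagreement"] = 1 - items["agreement"]

# Association between disagreement and the share of neutral answers
# (the paper reports r = 0.56 across all 59 perceptual items).
r, p = pearsonr(items["disagreement"], items["neutral"])
print(items[["item", "agreement", "neutral"]], f"r = {r:.2f}, p = {p:.3f}", sep="\n")
```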

Fig. 3 Agreement of experts on perceptual questions

We can also look at the set of factual questions about the regulatory setting of the referendum. Ideally, experts should agree on these items to an even higher degree than on the perceptual questions, because there is an objectively ‘correct’ answer to them. Using all 19 factual items, an overall Krippendorff’s alpha of 0.51 is calculated, which, by most accounts, would not pass requirements of inter-coder reliability. Figure 4 visualizes this divergence of experts’ knowledge about the country in question. About half of the factual questions are answered unambiguously by the experts with either an overwhelming ‘Yes’ or ‘No.’ However, there are a few items where the disagreement is strikingly large. These are questions about voting facilities for the disabled, the required distribution of impartial information by the EMB, public subsidies, and thematic restrictions. Here, almost as many experts answered ‘Yes’ as ‘No.’
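As a sketch of this reliability check, the snippet below computes Krippendorff’s alpha for nominal data from an experts-by-items response matrix, using the third-party krippendorff package (assumed to be installed); the matrix itself is invented, with np.nan marking non-response.

```python
import numpy as np
import krippendorff  # pip install krippendorff (assumed available)

# Toy matrix of factual answers: rows are experts, columns are factual items,
# coded 1 = 'Yes', 0 = 'No', np.nan = no answer.
answers = np.array([
    [1, 0, 1, np.nan, 1],
    [1, 0, 0, 1,      1],
    [1, 1, 0, 1,      np.nan],
    [1, 0, 1, 0,      1],
])

# Krippendorff's alpha for nominal data; the paper reports 0.51 across
# all 19 factual items, below conventional reliability thresholds.
alpha = krippendorff.alpha(reliability_data=answers, level_of_measurement="nominal")
print(round(alpha, 2))
```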

Fig. 4 Agreement of experts on factual questions

Conclusion

Like representative elections, referendums may not live up to the promises of increased legitimacy, efficacy and conflict reduction. These outcomes may or may not be related to the procedural integrity of the referendum process itself. However, in order to ascertain this, we need a way to evaluate the integrity of direct democratic instruments.

This article developed an empirical instrument to assess the variety and integrity of referendums. Drawing on the work of the Electoral Integrity Project (Norris et al. 2014), we proposed an expert survey instrument that measures direct democracy integrity (DDI) along eleven dimensions of a referendum cycle. This instrument was piloted in the Turkish constitutional referendum of April 2017, and evaluated as to its feasibility, sensitivity to variation in integrity, and its validity.

Firstly, we are encouraged by the feasibility of this instrument, suggested by a good response rate and acceptable levels of missingness, even in a difficult setting of creeping autocratization where response rates might be expected to be lower. Secondly, the instrument produced exploitable variation, in that integrity was evaluated more positively in the areas of voter registration, referendum procedures, and the vote count, and more negatively in regard to the campaign, media coverage, the voting process, and the post-election environment. The electoral authorities and the fairness of the initiation process were equally criticized. The assessed variation concurred with election monitors’ evaluations and with scholarly work on the subject. Thirdly, a look at the validity of the instrument itself showed that the legal settings governing referendums are complicated, leading to expert disagreement on crucial factual questions as well as on the perceptual aspects of the survey. Regarding factual questions, the data can be cross-checked with national data on direct democracy. Besides the legal requirements (constitutional law), there are informal practices which interpret, stretch, and sometimes contradict the constitution and lead to different outcomes (constitutional reality). This suggests that expert evaluations need to be used with caution. On the other hand, we also saw that the overall negative assessment of integrity in the Turkish referendum was broadly in line with other independent evidence, although the political leanings of experts color their assessments.

This is only the first step in the further development of the instrument. The analysis presented here suggests that we need to broaden our expert pool to include a wider spectrum of political attitudes. On the other hand, we may need to consider using narrower criteria to identify election experts, or use the factual questions to filter out responses from individuals who are not knowledgeable about electoral regulations and practices in the country. Finally, we may consider dropping some of the survey items that elicit the greatest level of disagreement. It may be that those survey items are phrased ambiguously and in a way that confuses respondents.

Nevertheless, the pilot study encouraged us that the instrument works in principle and, specifically, that it has the potential to be applied to a range of contextual settings. Besides the analysis of liberal democracies, it allows the evaluation of DDI in (semi-)authoritarian regimes, making referendums comparable across contexts. With electoral authoritarian regimes in vogue, some dictators use referendums as well as elections to strengthen their position. Here, an index of direct democracy integrity, similar to the Perceptions of Electoral Integrity (PEI) index, will be a useful and relevant instrument facilitating future comparative research.