Corrupted Estimates? Response Bias in Citizen Surveys on Corruption Bachelor’s thesis in Statistics (STG350)


There is now a near consensus among researchers about the destructive consequences of corruption 1 and the societal benefits of clean government (Bardhan 1997;Holmberg and Rothstein 2011;Mauro 1995;Mungiu-Pippidi 2013;Rose-Ackerman 1999;Rothstein 2011). In the light of this, measuring corruption has become a global industry, with leading actors like Transparency International (TI) spending millions of dollars on the construction of corruption indicators and the surveying of ordinary citizens' attitudes to and experiences of corruption. These measures are nowadays used to estimate the incidence of corruption in different countries and to research different corruption-related questions in political science and economics. However, we still know little about the quality of many of these commonly used measures. This paper aims at investigating two potential sources of bias in frequently used indicators based on citizens' perceptions and experiences of corruption, stemming from political bias and sensitivity bias.
While official statistics might seem like a natural starting point for measuring corruption, these data are often problematic for this purpose and say more about the independence of the judiciary than actual levels of corruption in a country (Fisman and Golden 2017; Holmes 2015). Most existing indicators and measures are therefore either perception based or based on self-reported experiences of corruption. While important corruption measures like TI's Corruption Perceptions Index (CPI) 2 are largely based on the (aggregated) perceptions of business people and country experts, many large-scale multi-country projects survey citizens directly about their perceptions and experiences of corruption. 3 These projects include, for instance, the Global Corruption Barometer (GCB) 4 and the Eurobarometer on corruption 5 . One big advantage of citizen surveys is that they give individual-level data from ordinary people around the world. These data can then be used to study important individual-level research questions like: who gets asked to pay bribes (Mocan 2004; Olken 2009); how individual corruption perceptions and experiences are related to incumbent support and vote choice (Gingerich 2009; Klasnja et al. 2016; Xezonakis et al. 2016; Zechmeister and Zizumbo-Colunga 2013); how individual corruption perceptions and experiences are related to political legitimacy and support for the democratic system (Anderson and Tverdova 2003; Dahlberg and Holmberg 2014; Seligson 2002). Standard corruption questions are therefore nowadays incorporated into general surveys like LAPOP 6 , the World Values Survey 7 , the Comparative Study of Electoral Systems 8 , and the International Social Survey Programme 9 . However, little is known about how people form their perceptions of corruption and to what extent their reports of encounters with corruption are accurate.

1 Corruption in the World Bank's definition is "the extent to which public power is exercised for private gain, including both petty and grand forms of corruption as well as 'capture' of the state by elites and private interests" (Kaufmann et al. 2011, p. 4).
2 https://transparency.org/research/cpi/overview
3 Unlike corruption surveys based on citizen interviews, expert-based corruption indicators have been widely discussed and criticized (see Hamilton and Hammer (2018) for an overview of this debate).
4 https://transparency.org/research/gcb/overview
5 https://ec.europa.eu/home-affairs/news/eurobarometer-country-factsheets-attitudes-corruption_en
A large body of research has documented reporting bias among survey respondents, where an individual's reported perception of some phenomenon might be influenced by factors other than the actual occurrence of the phenomenon (see Bartels 2002;Berinsky 1999;Tourangeau and Yan 2007). This paper explores two potential sources of bias in individual reports on perceptions and experiences of corruption. First, I draw upon research on economic perceptions and economic voting 10 , and related research on political bias and motivated reasoning, and argue that respondents are likely to respond in a political manner when asked how they perceive corruption in their country. Corruption is an issue that citizens care deeply about (see Holmes (2015) and World Economic Forum (2017)), and hence an issue where citizens can be expected to react to information on the basis of prior affect and political affiliations (see Anduiza et al. (2013) and Fischle (2000)). As a result, we should, for instance, expect incumbent supporters to in general report a substantially more positive view of corruption in their country, compared to other groups.
Second, I argue that direct questions about corruption experiences are likely to be sensitive and hence subject to 'sensitivity bias' (or, more specifically, 'social desirability bias') (Blair et al. 2018), i.e. a type of response bias where respondents answer questions in a manner that will be viewed favorably by others (Tourangeau and Yan 2007). Research shows that citizens around the world strongly disapprove of corruption and bribe giving. For instance, data from the World Values Survey show that 'accepting a bribe' is viewed as a worse offense than 'stealing' or 'cheating on taxes'. Corruption is also illegal in all countries. Even in countries where corruption is very widespread, citizens still view bribe payments and the misuse of public money as a serious moral wrong that cannot be justified (Persson et al. 2013; Rothstein and Varraich 2017). Therefore, admitting to having been part of a corrupt transaction is arguably an act of revealing sensitive information and hence something that is likely to be under-reported. This, in turn, makes estimates of the level of corruption in society based on experiential surveys likely to be biased.
I test these conjectures with two survey experiments included in a large survey fielded in Romania to over 3000 respondents. The first experiment is designed to randomly make the political affiliations of one group of respondents more salient before they answer questions about corruption perceptions. The second experiment, deploying a so-called 'list experiment', is designed to minimize the likelihood of sensitivity bias among another group of respondents (Blair and Imai 2012; Glynn 2013). This allows me to evaluate both hypotheses empirically.
The results show strong evidence of different types of response bias with regard to questions about corruption. First, incumbent supporters systematically report a much more positive view of corruption in Romania. A simple question-order prime (asking about political affiliation before corruption perceptions) makes this effect almost twice as large, suggesting that a substantial part of the gap is the result of respondents 'defending' and justifying their political identity. Second, the results suggest that direct questions about corruption experiences are sensitive and under-reported. For some groups, like women, the under-reporting is massive according to the estimates: for this group the true rate of corruption victimization might be three times as high as the reported rate.
The study makes several contributions to the literature. Research has shown that respondents often show strong political bias with regard to attitudes and perceptions about the economy. The study shows that questions about corruption exhibit very similar patterns, and that responses to these questions are malleable and susceptible to political bias. The study also demonstrates that direct questions about corruption experiences need to be treated like sensitive questions, where different groups show diverging patterns of reporting bias. These results call into question some conclusions from previous research about who is most likely to be the victim of corruption. Overall, the results from the study should be of interest for corruption researchers, designers of surveys, and anti-corruption practitioners alike.

Political bias in surveys
Researchers have long acknowledged that survey responses are sometimes unstable and inaccurate, and that there are clear psychological incentives to shape responses in certain ways (Berinsky 1999;Zaller 1992). In particular, research on citizens' evaluations of the economy has uncovered strong response biases stemming from the respondent's political leanings (Lau et al. 1990;Palmer and Duch 2001;Sears and Lau 1983;Wilcox and Wlezien 1996). For instance, in a study of British voters Evans and Andersen (2006) show that sociotropic perceptions of the economy are strongly conditioned by prior opinions of the incumbent Conservative Party. On the other hand, lagged economic perceptions seem not to influence incumbent popularity in the same way.
Why would political affiliation affect how respondents perceive the economy? Campbell et al. (1960) argued that party identification is a 'perceptual screen' that works as a filter through which economic performance is assessed, and that an individual tends to see what is favorable to his or her partisan orientation. In this sense, a survey response might reflect a sort of expressive political 'cheerleading' in which respondents express their general affinity for an incumbent or a political party. The perceptual screen might also affect the perception of objective reality directly. Beliefs about basic political facts have been shown to be shaped by political identification to a significant degree, partly due to selective information processing (Taber and Lodge 2006). Survey respondents from different political camps can hence experience different versions of 'objective' reality (Bartels 2002;Gerber and Huber 2010).
Respondents might also reason their way to the conclusion that the economy is doing better when their preferred party or politician is in power. The theory of motivated reasoning holds that all reasoning is motivated in the sense that it is driven by specific motives and goals. Taber and Lodge (2006) argue that these goals often are directional goals (as opposed to accuracy goals) where individuals apply their reasoning powers in defense of a prior specific conclusion. Directional goals are thus often defensive of particular identities, attitudes, or beliefs that are strongly held (Leeper and Slothuus 2014).
In political science the majority of this body of research has focused on political bias in economic perceptions. Much less is known about how political affiliations might interact with corruption perceptions. Recent decades have seen a rapid increase in efforts to measure corruption, both via expert surveys and surveys of the public (Fisman and Golden 2017; Holmes 2015). Many of the survey instruments used in the latter category clearly resemble the instruments used to tap into people's economic perceptions; Klasnja et al. (2016) even adopt the terms 'sociotropic' and 'egotropic corruption voting' directly from the economic voting literature. Several studies use survey measures of corruption perceptions to predict political attitudes like incumbent support (e.g. Klasnja et al. 2016; Xezonakis et al. 2016; Zechmeister and Zizumbo-Colunga 2013), or satisfaction with democracy (e.g. Anderson and Tverdova 2003; Dahlberg and Holmberg 2014; Seligson 2002). To be able to estimate the effect of corruption in such a setting without bias, it is important that corruption perceptions are (exclusively) determined exogenously. Hence, we would hope that these perceptions are determined only by external changes in an individual's environment.
However, recent studies give us reasons to believe that this might not be the case. Anduiza et al. (2013) show that tolerance for corruption can have a clear political dimension. In a survey experiment fielded in Spain the authors show that respondents' judgment of the seriousness of a political corruption scandal is partly determined by whether the accused politician belongs to the respondent's preferred party. The authors argue, in line with the literature on economic perceptions, that this is a way for respondents to reduce cognitive dissonance: by downplaying the importance of corruption when it affects their own party, respondents make the political world more consistent with their political predispositions. Jerit and Barabas (2012) show that individual-level motivated bias is present on a wide range of topics, as long as a question has importance or strong political implications.
I argue that corruption perceptions are likely to be such a topic. People view corruption in society as a question of great importance (Holmes 2015). For instance, when the World Economic Forum in 2017 surveyed individuals in 186 countries about the most pressing political issue 'government accountability and transparency/corruption' ranked 1st (World Economic Forum 2017). About 25% of Europeans say that they are 'personally affected by corruption in their daily lives'; the number for countries like Romania, Croatia, and Spain is as high as 60-70% (Eurobarometer 2017). People in countries where corruption is widespread also tend to associate current levels of perceived corruption with the incumbent government (Klasnja 2015;Klasnja et al. 2016;Xezonakis et al. 2016), and view 'the fight against corruption' as one of the priorities that should be most important for political leaders (Holmes 2015). Incumbent supporters therefore have a 'preferred world-state' where corruption levels are decreasing (this supports their political leanings), while opposition supporters have incentives to view the situation as worse (this would be a reason to oust the current incumbent). Voters who sympathize with the government (for whatever reason), for instance, might therefore convince themselves that the situation with regard to corruption is more positive than what is warranted by evidence.
In general, a connection between reported corruption perceptions and incumbent support can exist for two reasons: (1) the respondent might experience changing corruption levels in society and adjust his or her support for the incumbent accordingly, (2) the respondent reports perceived corruption levels that are consistent with his or her political predispositions. If the latter is true, making political affiliations more salient should affect reported corruption perceptions, whereas if corruption perceptions are only determined exogenously this should not be the case. In line with the literature on economic perceptions reviewed above, I argue that we have reasons to believe that some degree of political bias is present in the reporting of corruption perceptions. In this sense a respondent's reported corruption levels can be a way to defend and justify his or her beliefs about the current incumbent. I choose to focus on incumbent supporters since this group is relatively easy to define, also in a multiparty system (I discuss this in more detail below). My first hypothesis can thus be stated as follows:

• H1a. On average, incumbent supporters will report lower perceived levels of corruption compared to opposition supporters.
• H1b. Increasing the salience of political affiliations will cause incumbent supporters to report even lower levels of corruption.

Sensitivity bias in surveys
According to Tourangeau and Yan (2007), a survey question is likely to be 'sensitive' if it touches on 'taboo' topics, if it induces concerns that the information given will become known to a third party, or if the question elicits answers that are socially unacceptable or undesirable. If this is the case, the respondent can be expected to give a 'socially desirable' answer, that is, an answer that the respondent thinks will be viewed favorably by others, resulting in under-reporting of 'undesirable' attitudes and behavior. Misreporting may reflect intentional deception, but may also reflect a failure to deeply reflect on the true answer (Blair et al. 2018). Such sensitivity bias (SB) has been shown to be present on a wide range of topics based on self-reports, from questions about drug use (Fendrich and Vaughn 1994) to questions about voter turnout (Holbrook and Krosnick 2010). 11 Surveys based on self-reported experiences are also common in corruption research. These so-called 'experiential surveys' are one of the most direct methods for gauging the amount of corruption in society, by simply asking citizens about their experiences of corruption (e.g. "Did you in the last 12 months have to pay a bribe in any form?"). The method is now widely deployed by several large organizations in multi-country surveys (Holmes 2015), including, for instance, the Global Corruption Barometer. Should we expect citizens to truthfully report their first-hand experiences with corruption and bribery?
One reason that such a direct question about corruption might be sensitive is that corruption is illegal (in essentially every country in the world) (Fisman and Golden 2017). In an overview of the research on sensitive questions Krumpal (2013) identifies several studies reporting substantial SB on topics related to criminal behavior and crime-victimization. Admitting to having been part of a corrupt exchange (for example, paying a bribe) is to admit part in an illegal transaction, and in the light of this something that could be considered sensitive. Even asking about whether an individual has been offered or asked to pay a bribe should be sensitive, given that an individual in this situation also would be more likely to actually take part in the transaction in the end.
Corruption is also something that people find morally reprehensible. This is true even in countries where corruption is ubiquitous (Persson et al. 2013; Rothstein and Varraich 2017). The World Values Survey (WVS) has been asking about people's attitudes towards bribery in several waves, and respondents all over the world consistently show a very strong distaste for corruption. Figure 1 shows data from Romania, one of the most corrupt countries in Europe, based on the most recent WVS wave for this question. It is clear that there exists a very strong norm against bribery, even in this context where corruption is widespread: over 80% of respondents say that accepting a bribe can never be justified. Given this norm, it is reasonable that people would view admitting to being part of a corrupt transaction as something that is 'socially undesirable' (Tourangeau and Yan 2007). Still, when asked directly, about 18% of Romanians (7% in the EU as a whole) report that they were asked to pay a bribe during the past 12 months by a public official (Eurobarometer 2017, p. 80). This number might of course still be under-reported. For instance, Corstange (2012) finds that 26% of citizens in Lebanon admit to having sold their vote in 2009, but estimates (using a list experiment) that the true number is over 50%.

11 See Tourangeau and Yan (2007) for an overview of research on sensitive survey questions.
In general, vote-buying can be considered a similar case to that of corruption experiences: both are illegal transactions that are highly stigmatized by society. Traditionally, studies on vote-buying have asked direct questions about its occurrence and found limited evidence. In recent years researchers have acknowledged that such direct questions might be sensitive and have instead considered more sophisticated survey methods. As a result, several more recent studies have discovered substantial under-reporting of vote-buying due to SB (Carkoglu and Aytac 2015; Corstange 2012; Gonzalez-Ocantos et al. 2012). It is important to note that severe under-reporting has been found even for questions asking respondents whether someone 'offered' to buy their vote, and not only for questions asking if they actually sold their vote. Given these similarities to vote-buying, the illegality of corruption, and people's almost unanimous distaste for its occurrence, I argue that we should expect a similar pattern with regard to reported corruption experiences:

• H2. Reported experiences of bribery are subject to sensitivity bias and hence under-reported.

A survey experiment in Romania
To test these hypotheses about political bias (H1) and sensitivity bias (H2) in corruption reporting I conducted a large survey experiment in Romania. Romania is one of the most corrupt countries in Europe, where the problem of corruption is very much a current issue. After the legislative elections of 2016 the Social Democratic Party (PSD) and the Alliance of Liberals and Democrats (ALDE) formed the governing coalition. After massive protests in 2017 against a bill that (among other things) would have pardoned officials imprisoned for bribery offenses (see The New York Times May 4th, 2017), the government resigned and was replaced by a second iteration of the coalition. After an internal power struggle in PSD a third iteration of the PSD-ALDE coalition took office in January 2018. Due to the political turbulence, partly related to accusations of corruption within the government, public support for the coalition decreased over the course of 2018. At the end of 2018 many opinion polls showed support of around 35% for the governing coalition. 12 Similar situations are not uncommon. Researchers even talk about an incumbency disadvantage in many developing democracies, where holding office seems to decrease chances of reelection. Such an effect has been demonstrated in, for instance, Brazil (Klasnja and Titiunik 2017), India (Uppal 2009), and post-communist Eastern Europe (Roberts 2008). Klasnja (2015) shows, in a study of Romanian mayors, that one plausible explanation for this pattern is corruption, where office holders exploit their position to reap private gains at the cost of subsequent electoral success. In this sense the turbulent situation during the past years in Romanian politics is rather typical, making Romania an interesting and informative case to study.
Similar to many other developing democracies, partisanship has generally been relatively weak in Romania in the post-communist period (Tatar 2013). This makes the case a relatively tough test for the political bias hypothesis. Any effects found in a context like this, with weak partisanship, are likely to be more pronounced in contexts where partisanship is strong.

Testing the PB hypothesis (H 1 )
The aim of the study is to assess each hypothesis in turn with two different research designs. To test the political bias hypothesis the design exploits question order effects. With regard to economic voting, researchers have shown that question order effects can be substantial (e.g. Wilcox and Wlezien 1996). Sears and Lau (1983) argue that two such effects are common: political preferences might be personalized when assessed immediately after the respondent's own economic situation has been made salient, or perceptions of the economic situation might be politicized when assessed immediately after important political preferences have been made salient. Given my hypotheses in this paper I will focus on the latter. Questions subsequent to the political questions are hence assumed to exhibit stronger politically biased response patterns, since asking the political questions makes the respondent's political identity more salient. In this sense, the question ordering activates a particular political 'frame' around the corruption questions (Zaller 1992). If my hypothesis about PB is correct, 'politicizing' corruption perceptions in this way will significantly affect the response to these questions.
In this setup, some respondents (the treatment group) were randomly assigned to a question ordering where the questions about political preferences were asked right before a specific corruption question (political prime), while the rest of the respondents (the control group) were given an ordering where the same corruption question instead was asked before the political questions. The setup hence randomly increases the salience of political affiliations for a group of respondents with regard to a specific corruption question.
Following Evans and Andersen (2006) I asked the following political questions: (1) "What political party would you vote for if the national parliamentary election were today?" (2) "Please choose one of the following phrases to say how you feel about the current government of Romania: 'Strongly against', 'Against', 'Neither in favor nor against', 'In favor', 'Strongly in favor'." Based on these questions I coded respondents as government supporters if they said that they would vote for one of the parties in the current ruling coalition in Romania (PSD and ALDE) and answered that they are 'Neither in favor nor against', 'In favor', or 'Strongly in favor' of the current government in Romania. This way I identify government supporters in terms of vote intention, but exclude respondents who are explicitly against the government from the definition. I consider different coding decisions with regard to this variable below.
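The coding rule for government supporters can be written down as a small function. This is only an illustrative sketch: the function name, the constant names, and the use of the literal answer strings as codes are assumptions made here, not the coding actually used for the survey data.

```python
# Parties in the ruling coalition named in the text.
COALITION_PARTIES = {"PSD", "ALDE"}

# Answers to the government-approval question that do NOT express
# explicit opposition to the government.
NOT_AGAINST = {"Neither in favor nor against", "In favor", "Strongly in favor"}


def is_government_supporter(vote_intention: str, government_feeling: str) -> bool:
    """Return True if the respondent would vote for a coalition party
    and is not explicitly against the current government."""
    return vote_intention in COALITION_PARTIES and government_feeling in NOT_AGAINST


# Hypothetical respondents, for illustration only.
print(is_government_supporter("PSD", "In favor"))
print(is_government_supporter("ALDE", "Against"))
```

Both conditions must hold, so a respondent who intends to vote for PSD or ALDE but is 'Against' or 'Strongly against' the government is not counted as a supporter.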
To measure corruption perceptions I asked three different questions that are commonly used in the literature and that are of theoretical interest. The three questions give a reasonably comprehensive picture of how the respondent perceives current corruption in Romania, both in terms of absolute levels and in terms of recent change. First, I adopted the following question (corruption increase), used for instance in the Global Corruption Barometer: "In your opinion, over the last year, has the level of corruption in this country increased, decreased, or stayed the same?" The respondent was given five answer alternatives ranging from 'increased a lot' to 'decreased a lot'. Second, I asked a commonly used question about the absolute level of political corruption (corruption in politics): "In your opinion, about how many politicians in Romania are involved in corruption?" The question has five answer alternatives ranging from 'almost none' to 'almost all'. This question is, for example, asked in several waves of the ISSP survey. Third, I asked how worried respondents are about the consequences of corruption (corruption worry): "In general, how worried are you about the consequences of corruption for the Romanian society?" This question is similar to the questions asked in Peiffer (2018). It taps into how concerned a respondent is about the consequences of corruption, and hence also how important the issue of corruption is for the respondent. Four possible answer alternatives were given: 'not worried at all', 'a little worried', 'somewhat worried', 'very worried'. Finally, as a point of comparison, I also included a standard question about economic perceptions (economy worse) (see Evans and Andersen (2006)): "In your opinion, over the last year, would you say that Romania's economy has got stronger, weaker, or stayed the same?" The five answer alternatives range from 'got a lot weaker' to 'got a lot stronger'.

With this design I am able to compare the treatment effect on the corruption questions with the (well-established; see the review above) political bias effect on the economy question. All outcome questions were coded so that high values indicate 'bad' outcomes: increased corruption, worsened economy, high political corruption, and high worry about corruption.
To avoid artificially induced correlation between different corruption items, and to retain statistical power, the experiment had the following structure. First, a respondent was randomly assigned to one of the four corruption/economy questions above. This question was asked before any questions about political preferences. A couple of questions later in the survey the respondent was asked the two political preference questions described above, after which the respondent was randomly assigned to one of the three remaining corruption/economy questions. This means that each respondent is part of the control group with regard to one of the corruption/economy questions, and part of the treatment group with regard to another of these questions. For each specific corruption/economy question, about a fourth of the sample was hence assigned to the control group and a fourth was assigned to the treatment group. The basic structure of the experiment is illustrated and discussed more in depth in the appendix.
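The two-step assignment described above can be sketched as follows. The function name, the seed, and the simulated sample size are hypothetical and chosen only for illustration; the actual randomization was carried out by the survey software and is documented in the appendix.

```python
import random

# Labels of the four outcome questions described in the text.
QUESTIONS = [
    "corruption increase",
    "corruption in politics",
    "corruption worry",
    "economy worse",
]


def assign_questions(rng: random.Random) -> dict:
    """Assign one question to be asked BEFORE the political questions
    (control) and a different one to be asked right AFTER them (treatment)."""
    control = rng.choice(QUESTIONS)
    remaining = [q for q in QUESTIONS if q != control]
    treatment = rng.choice(remaining)
    return {"control": control, "treatment": treatment}


rng = random.Random(2018)  # arbitrary seed, for reproducibility only
assignments = [assign_questions(rng) for _ in range(3000)]

# For any given question, roughly a fourth of respondents end up in the
# control group and roughly a fourth in the treatment group.
for q in QUESTIONS:
    share_c = sum(a["control"] == q for a in assignments) / len(assignments)
    share_t = sum(a["treatment"] == q for a in assignments) / len(assignments)
    print(f"{q:>22}: control {share_c:.2f}, treatment {share_t:.2f}")
```

Because the second draw excludes the first, no respondent answers the same outcome question twice, which is what keeps the control and treatment groups for a given question disjoint.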

Modeling corruption perceptions
In general, the causal effect of interest with regard to the corruption/economy questions can then be estimated with a simple regression model:

$$y_i = \alpha + \beta_1 x_i + \beta_2 T_i + \delta (x_i \times T_i) + \varepsilon_i \qquad (1)$$

where $y_i$ represents the outcome variable of interest, $x_i$ is an indicator variable equal to 1 if a respondent is an incumbent supporter and 0 otherwise, $T_i$ indicates if a respondent is in the treatment group ($T_i = 1$) or the control group ($T_i = 0$), $(x_i \times T_i)$ is an interaction term including $x_i$ and $T_i$, and $\varepsilon_i$ represents the residual term. The treatment, again, consists of the intervention of priming respondents with their political preference right before answering one of the three corruption questions. In the interest of space I simply refer to 'corruption perceptions' as a catch-all term referring to all three questions (change, level, and worry, coded in the way described above). As per H1a, I expect $\beta_1 < 0$ (on average, incumbent supporters perceive corruption to be lower) and, as per H1b, I expect $\delta < 0$ (the effect of the prime is negative for incumbent supporters; that is, incumbent supporters report even lower perceived corruption when their political preference has been made salient). I consider a confirmation of these expectations for all three corruption outcomes to be strong evidence in favor of H1a and H1b. I consider a partial confirmation of the expectations (finding significant results in the expected direction for one or two of the outcomes) to be somewhat weaker evidence in favor of H1a and H1b. 13

To facilitate interpretability and graphing of the results I first estimate equation (1) using OLS with robust standard errors as the baseline model. Even when the underlying data-generating process is not linear, OLS can often be a good and surprisingly robust approximation of the 'true' model. Angrist and Pischke (2009, pp. 34-40) point out that OLS can be viewed as the 'best linear approximation' (in an MMSE sense) of the true conditional expectation function ($E[Y_i \mid X_i]$) even when this function is non-linear.
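As a rough illustration of the baseline specification, the sketch below fits a regression of the outcome on the supporter indicator, the treatment indicator, and their interaction by OLS and computes HC2 robust standard errors (Long and Ervin 2000) with plain NumPy. The data are simulated and the coefficient values are invented; the function name `ols_hc2` and the coefficient labels are assumptions made for the example and are not taken from the original analysis code.

```python
import numpy as np


def ols_hc2(X, y):
    """OLS point estimates with HC2 heteroscedasticity-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i (X'X)^{-1} x_i', the diagonal of the hat matrix.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC2 weights: squared residuals inflated by 1 / (1 - h_ii).
    omega = resid**2 / (1.0 - leverage)
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated data; all numbers below are made up for illustration.
rng = np.random.default_rng(0)
n = 4000
x = rng.integers(0, 2, n)  # 1 = incumbent supporter
T = rng.integers(0, 2, n)  # 1 = political prime before the question
y = 3.5 - 0.4 * x - 0.3 * x * T + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x, T, x * T])
beta, se = ols_hc2(X, y)
for name, b, s in zip(["intercept", "supporter", "prime", "interaction"], beta, se):
    print(f"{name:>11}: {b: .3f} (HC2 s.e. {s:.3f})")
```

Because both regressors are binary, the model is saturated: the coefficient on the supporter indicator equals the supporter/non-supporter gap in mean responses within the control group, and the interaction coefficient equals the difference between the priming effect among supporters and the priming effect among non-supporters.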
I use robust standard errors to account for potential heteroscedasticity in the model. 14 More specifically, I use the HC2-estimator described in Long and Ervin (2000) to compute the variance-covariance matrix, shown to be a consistent estimator of $\mathrm{Var}(\hat{\beta})$ in the presence of heteroscedasticity of an unknown form:

$$\mathrm{HC2} = (X'X)^{-1} X' \, \mathrm{diag}\!\left[\frac{\hat{\varepsilon}_i^2}{1 - h_{ii}}\right] X (X'X)^{-1}$$

where $\hat{\varepsilon}_i$ is the residual of observation $i$ and $h_{ii}$ is the leverage for the same observation.

Still, given that the outcome variables in this case in fact are ordered categorical variables I also estimate equation (1) using ordinal logistic regression as a robustness check. The ordinal logistic regression (OLR) model can be defined in terms of a latent variable model with $y^*$ as a latent variable ranging from $-\infty$ to $\infty$. The latent model can then be defined as

$$y_i^* = x_i \beta + \varepsilon_i$$

where $x_i$ is the row of the design matrix for observation $i$ and $\beta$ is a vector of regression coefficients. We can define the relationship between the latent model and the observed outcomes by dividing $y_i^*$ into $J$ ordinal categories:

$$y_i = m \quad \text{if } \tau_{m-1} \le y_i^* < \tau_m \quad \text{for } m = 1, \dots, J$$

where the cutpoints $\tau_1$ through $\tau_{J-1}$ are estimated from the data. For the case of an outcome variable with four categories (numbered from 1 to 4) we get the following relationship:

$$y_i = \begin{cases} 1 & \text{if } \tau_0 = -\infty \le y_i^* < \tau_1 \\ 2 & \text{if } \tau_1 \le y_i^* < \tau_2 \\ 3 & \text{if } \tau_2 \le y_i^* < \tau_3 \\ 4 & \text{if } \tau_3 \le y_i^* < \tau_4 = \infty \end{cases}$$

We define the extreme categories 1 and $J$ as open-ended intervals with $\tau_0 = -\infty$ and $\tau_J = \infty$ (Long 1997, pp. 114-119).
Since $y^*$ is latent we cannot estimate its mean and variance directly. However, by assuming a specific form of the error distribution we can estimate the regression equation $y_i^* = x_i \beta + \varepsilon_i$ using maximum likelihood. For the OLR model we assume that $\varepsilon$ has a logistic distribution 15 with a mean of 0 and a variance of $\pi^2/3$, which gives the following cdf:

$$F(\varepsilon) = \frac{\exp(\varepsilon)}{1 + \exp(\varepsilon)}$$

The probability of a specific observed value can then be computed as:

$$\Pr(y_i = m \mid x_i) = F(\tau_m - x_i \beta) - F(\tau_{m-1} - x_i \beta)$$

where $F$ is the logistic cdf defined above. Another way of stating the same thing is that we are modeling the log of the odds that an outcome is less than or equal to $m$ versus greater than $m$, given $x_i$:

$$\ln\left[\frac{\Pr(y_i \le m \mid x_i)}{\Pr(y_i > m \mid x_i)}\right] = \tau_m - x_i \beta$$

It is important to note that the model assumes proportional odds, in that the $\beta$'s are the same for all values of $m$. The explanatory variables are hence assumed to exert the same effect on each cumulative logit, regardless of the cutpoint $m$.
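Assuming that observations are independent, we get the following likelihood function (Long 1997, p. 124):

$$L(\beta, \tau \mid y, X) = \prod_{j=1}^{J} \prod_{y_i = j} \left[ F_\Lambda(\tau_j - x_i \beta) - F_\Lambda(\tau_{j-1} - x_i \beta) \right],$$

where $\prod_{y_i = j}$ indicates multiplying over all cases where the observed $y$ equals $j$. The log likelihood function can thus be stated as:

$$\ln L(\beta, \tau \mid y, X) = \sum_{j=1}^{J} \sum_{y_i = j} \ln \left[ F_\Lambda(\tau_j - x_i \beta) - F_\Lambda(\tau_{j-1} - x_i \beta) \right].$$

The Maximum likelihood estimates can then be obtained by using numerical optimization methods (see Long 1997).

15 The most common alternative to the OLR model is the ordinal probit model, where $\epsilon$ is instead assumed to be normally distributed with mean 0 and variance 1.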

Testing the SB hypothesis (H 2 )
To test the sensitivity bias hypothesis I deploy a list experiment, which was implemented in the middle of the survey. 16 This is a survey method previously used to estimate the prevalence of sensitive behavior like drug abuse, cheating, and vote buying, where the respondent does not have to directly disclose any information about the sensitive item (see Glynn 2013). The list experiment works by aggregating the sensitive item with a list of non-sensitive items so that the respondent only has to indicate the number of items that apply and not which specific items are true. To implement this design, I asked the respondents to do the following: Here is a list with different things that you might have done or experienced during the past 12 months. Please read the list carefully and enter how many of these things you have done or experienced. Do not indicate which things, only HOW MANY.
• Attending a work-related meeting;
• Investing money in stocks;
• Being unemployed for more than 9 months;
• Discussing politics with friends or family.

The treatment group was shown the same list but with a fifth item added (the item order was randomized for all lists):

• Being asked to pay a bribe to a public official

The design protects the respondents' privacy: as long as respondents in the treatment group answer with anything less than "five", no one directly admits to answering affirmatively to the sensitive question (having been asked to pay a bribe). Following the advice in Glynn (2013), the control items were chosen to be negatively correlated to avoid floor and ceiling effects (where respondents would select either 0 or all items).

Modeling responses to the list experiment
As shown by Blair and Imai (2012), if we assume that the addition of the sensitive item does not alter responses to the control items (no design effect) and that the response for each sensitive item is truthful (no liars), then randomizing respondents into the treatment and control groups allows the analyst to estimate the proportion of affirmative answers for the sensitive item by taking the difference between the average response among the treatment group and the average response among the control group (i.e. a difference-in-means estimator). 17 By asking the sensitive question directly to the control group (who did not receive the sensitive item on their list) I can also model the amount of sensitivity bias by comparing the direct question with the estimated proportion of affirmative answers to the sensitive item in the list experiment. For the direct question I asked: In the past 12 months were you at any point asked to pay a bribe to a public official? The answer alternatives given were 'yes', 'no', and 'prefer to not answer'. I coded affirmative answers as 1 and other answers as 0. 18 For the basic analysis of the list experiment I rely on the linear estimator in Imai (2011), corresponding to a standard difference-in-means estimator. 19 To estimate the overall level of SB I use the procedure described in Blair and Imai (2012) and compare the predicted response to the direct question, modeled with a logistic regression model, to the predicted response to the sensitive item in the list experiment. 20 The logistic regression model for the direct question can be defined in the same way as the OLR model described above, but with an outcome variable with only two categories. Using the same logic, we can define the probability that the outcome variable equals 1 as: $\Pr(y = 1 \mid x_i) = F_\Lambda(x_i \beta)$.
This gives the following log likelihood function (from which we can obtain the Maximum likelihood estimates with numerical optimization methods):

$$\ln L(\beta \mid y, X) = \sum_{i=1}^{N} \left\{ y_i \ln F_\Lambda(x_i \beta) + (1 - y_i) \ln \left[ 1 - F_\Lambda(x_i \beta) \right] \right\}.$$

The predicted response to the sensitive item in the list experiment can then be compared to the response to the direct question (modeled with the logistic regression model) to get an estimate of the amount of SB. I consider an SB estimate that is positive and statistically different from 0 to be evidence in favor of H2.

17 As stated above, the treatment assignment for the political bias experiment was independent of the treatment assignment in the list experiment. A respondent can hence be in both treatment groups (for both experiments), in one treatment group, or in no treatment group.

18 The formulation of the sensitive item in the list experiment and the direct bribe question follows the formulation used in Eurobarometer (2017). This is the less sensitive version of the question that is commonly used; the other version asks directly if the respondent has actually paid a bribe. Any estimates of SB found with regard to the somewhat less sensitive bribe question should therefore arguably be larger for the more sensitive question.

19 The difference-in-means estimator can be written as:

$$\hat{\tau} = \frac{1}{N_1} \sum_{i=1}^{N} T_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - T_i) Y_i,$$

where $\hat{\tau}$ is the estimated proportion of affirmative answers to the sensitive item, $N_1 = \sum_{i=1}^{N} T_i$ is the size of the treatment group and $N_0 = N - N_1$ is the size of the control group.

20 The procedure is implemented in the R package list.

An important limitation of the difference-in-means estimator is that it does not allow researchers to efficiently estimate multivariate relationships between preferences over the sensitive item and respondents' characteristics. Researchers may apply this estimator to various subsets of the data and compare the results, but such an approach is inefficient and is not applicable when the sample size is small or when many covariates must be incorporated into the analysis. To overcome this problem Imai (2011) developed two new multivariate regression estimators that allow the researcher to model the response to the sensitive item as a function of respondent characteristics. Imai (2011) uses the fact, shown by Glynn (2013), that we can identify the joint distribution of the treatment and control group from the list experiment under the two assumptions stated above (no liars and no design effects). To see this, we define all possible respondent types that correspond to a specific answer to the list experiment. Let $Y_i(0)$ denote a respondent's truthful answer to the $J$ non-sensitive items, and $Z_i$ denote a respondent's truthful answer to the sensitive item (0 or 1). 21 Each respondent's type can thus be categorized by $(Y_i(0), Z_i)$. Based on the possible answers to the list experiment we can then define which respondent types would give a certain answer. For instance, a respondent belonging to the treatment group ($T_i = 1$) giving the answer '1' would be either type $(Y_i(0) = 1, Z_i = 0)$ or $(0,1)$ (using shorthand notation). A respondent belonging to the control group ($T_i = 0$) and answering '1' would be either type $(1,0)$ or $(1,1)$; we would, however, not directly observe the latter type in the data since this respondent does not have the option of answering affirmatively to the sensitive item. Based on this we can describe all possible respondent types.
Table 1 shows this for a case with 3 control items (shown to the control group) and 1 sensitive item.

Response $Y_i$   Treatment group ($T_i = 1$)   Control group ($T_i = 0$)
4                (3,1)
3                (2,1) (3,0)                   (3,1) (3,0)
2                (1,1) (2,0)                   (2,1) (2,0)
1                (0,1) (1,0)                   (1,1) (1,0)
0                (0,0)                         (0,1) (0,0)

Let $\pi_{yz}$ be the population proportion ($\Pr$) of each type, such that $\pi_{yz} = \Pr(Y_i(0) = y, Z_i = z)$. For $y = 0, \dots, J$ and $z = 0, 1$ we can then identify $\pi_{yz}$ for each specific $y$ as follows (Blair and Imai 2012):

$$\pi_{y1} = \Pr(Y_i \le y \mid T_i = 0) - \Pr(Y_i \le y \mid T_i = 1),$$

$$\pi_{y0} = \Pr(Y_i \le y \mid T_i = 1) - \Pr(Y_i \le y - 1 \mid T_i = 0).$$

First, Imai (2011) develops a nonlinear least squares (NLS) estimator to model the response to the list experiment as a function of respondent characteristics. The estimator can be defined as:

$$Y_i = f(x_i, \gamma) + T_i \, g(x_i, \delta) + \epsilon_i,$$

where $x_i$ is a vector of respondent covariates, $E[\epsilon_i \mid x_i, T_i] = 0$, and $(\gamma, \delta)$ is a vector of unknown parameters. The model thus puts together two potentially nonlinear regression models where $f(x_i, \gamma)$ represents the conditional expectation of the control items, given the covariates, and $g(x_i, \delta)$ represents the expected response to the sensitive item, given the covariates. The estimates are obtained by minimizing the sum of squared residuals:

$$(\hat{\gamma}, \hat{\delta}) = \arg\min_{(\gamma, \delta)} \sum_{i=1}^{N} \left\{ Y_i - f(x_i, \gamma) - T_i \, g(x_i, \delta) \right\}^2.$$

Imai (2011) suggests a two-step procedure to estimate the model where $f(x_i, \gamma)$ is first fitted to the control group and then $g(x_i, \delta)$ is fitted to the treatment group using the response variable $Y_i^* = Y_i - f(x_i, \hat{\gamma})$, where $\hat{\gamma}$ represents the estimate of $\gamma$ from the first stage. 22 The functional form of the models has to be specified, but Blair and Imai (2012) suggest using logistic regression submodels. 23

The NLS model is consistent as long as the functional form is correctly specified. However, the estimator can be inefficient (since it does not use all information in the joint distribution specified above). An alternative is to model the joint distribution directly using Maximum likelihood estimation. Imai (2011) shows how this can be done by modeling the population proportions of different respondent types:

$$g(x_i, \delta) = \Pr(Z_i = 1 \mid x_i), \qquad h_z(y; x_i, \psi_z) = \Pr(Y_i(0) = y \mid Z_i = z, x_i),$$

where $x_i$ denotes the respondent covariates, $y = 0, \dots, J$ and $z = 0, 1$. Imai (2011) suggests that both functions can be modeled with, for instance, binomial logistic regression.
The resulting Maximum likelihood function is complex and is described in Imai (2011) where the author also develops an expectation-maximization (EM) algorithm to facilitate optimization of the functions.
The NLS and Maximum likelihood models can hence be used to estimate how affirmative responses to the sensitive item vary between respondent groups. Previous research has found that corruption reports and/or SB in general might vary between different subgroups (e.g. Eurobarometer 2014, 2017; Gonzalez-Ocantos et al. 2012; Krumpal 2013; Mocan 2004). Blair et al. (2018) argue that people are not only concerned with how they themselves are perceived by others, but also how their group is perceived by other groups. So while people in some groups individually might be more prone to under-report the sensitive item, they might also under-report to 'protect' their group. Given the PB hypothesis, this could for instance be the case with regard to government supporters: these respondents might under-report to make supporters of the government look better.
To check for heterogeneity in SB I will perform exploratory analyses with regard to the following variables (the variables were identified based on previous research): Incumbent supporter, Gender, University degree, Big city inhabitant, Age, High-income household (top 20% of the distribution in the data set). For the exploratory analyses I will rely on the NLS and ML estimators described above.

Results
After a pilot study was conducted to test the questions in the survey as well as one of the assumptions underlying the PB experiment (see appendix), the final survey was fielded between the 19th of December 2018 and the 24th of January 2019 in collaboration with the public opinion research company Luc.id 24 . Before data collection the hypotheses and overall analysis plan were preregistered at EGAP 25 . Based on two series of power analyses (see appendix) the target number of respondents was set to at least 2900. The sample was collected based on nationally representative quotas on gender, age, and region. In total, 3027 respondents completed the survey. Descriptive statistics for the sample are available in the appendix.

H 1 (a and b)

The unpopularity of the current government is reflected in the survey: about 24% of the sample said they would vote for a party in the governing coalition if the national parliamentary election were today. PSD is still the most popular party in the sample, but its share of the total vote decreases as many respondents indicated that they would 'not vote'. The share of true 'government supporters' according to the definition above is smaller, at about 14%. While this group is relatively small it still contains a large number of respondents given the large overall sample. However, below I also consider alternative ways of coding the 'prime variable' that utilize the sample in a different way.
The PB hypothesis predicts that government supporters, on average, should have a more positive view of corruption in Romania, and that this group should report an even more positive view when primed with their political affiliation. To test this, I estimated equation (1) for each of the four outcome variables (the three corruption variables + the economy variable), using OLS. The results are reported in Table 2. The coefficient for Government support shows the baseline difference between government supporters and others for the control group (holding other variables constant), that is, in the group that answered the outcome questions before the political questions. The first two models show the results for the outcomes corruption change and economy change, the two outcome questions that are the most similar in terms of structure. The results for these outcomes are also very similar: government supporters in general place themselves about one category lower (in the direction of less corruption/better economy) than the rest of the respondents. The corresponding coefficients for the other two outcomes are somewhat smaller, but still highly significant. Overall, this is in line with H 1a : Government supporters report a much less negative view about corruption in Romania in general and say that they are less worried about the problem.
The interaction effect (Gov. support x Prime) estimates the effect of the 'political prime', i.e. being asked about political affiliation before the corruption questions, rather than the other way around. As shown in the table, the effect is large. For the corruption increase outcome the difference between government supporters and others increases from 0.9 in the control group to about 1.6 ((−0.893) + (−0.664)) in the treatment group. The pattern is, again, similar to that for the economy worse outcome where the difference increases from 1.2 to 1.7 ((−1.189) + (−0.471)). In both cases government supporters are substantially more positive (or less negative) to begin with, and become even more positive when randomly assigned to the political prime.
The last two outcomes show the same pattern: government supporters think corruption in politics is lower and worry less about corruption, a difference that becomes significantly more pronounced with the political prime. In this experimental condition government supporters answer on average about 0.8 to 1 categories lower. To graphically display the results, predicted responses for all four outcomes are shown in Figure 2. 26 In sum, the results provide strong evidence in favor of the PB hypothesis. The estimates show that reported corruption perceptions differ substantially depending on whether a respondent supports the government or not. Moreover, the experiment shows how a simple prime (changing the order of the questions) can strongly affect the results and increase the 'supporter effect'. This is clearly in line with previous research on economic perceptions (as also shown by the worse economy estimates), and suggests that respondents, to a significant extent, shape their reported perceptions to align with their stated political affiliation. This is clear evidence that respondents' reported corruption perceptions are not simply a reflection of external circumstances in society. Rather, when the salience of political affiliations is increased, respondents seem to engage in a 'directional reasoning process' where they use their response to the corruption question to substantiate their previously stated political preferences.
To check the reliability of these results I also estimated models for the same four outcomes using ordinal logistic regression, to account for the ordinal nature of the dependent variables. The results are reported in Table 3.
The coefficients in the output unfortunately cannot be interpreted directly. The coefficients represent the change in the natural log of the odds of being in one higher category when a given x-variable changes one step (holding other variables constant). This is, again, a consequence of the fact that we are modeling $\ln \frac{\Pr(y \le m \mid x_i)}{\Pr(y > m \mid x_i)}$. For instance, the coefficient for government support in model (1) is −0.837. This indicates that the odds that government supporters are in a higher category (which equals saying that corruption has increased) are 57% lower ($e^{-0.837} = 0.43$), compared to other respondents. When receiving the prime, the odds are instead 76% lower ($e^{-0.837 - 0.582} = 0.24$). The general patterns are the same as in the OLS models, and suggest that the results discussed above with regard to H1 are robust. At the same time, the effects with regard to the economy outcome are clearly more pronounced in the OLR model. The prime effect with regard to the corruption in politics outcome is also no longer statistically significant at the 0.05-level (the p-value is about 0.1), suggesting that the effect probably is weaker (and more variable) for this outcome.
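The conversion from OLR coefficients to percentage changes in the odds can be checked with a few lines of Python. This is only a worked version of the arithmetic in the paragraph above, using the two coefficients quoted there; the helper name is hypothetical.

```python
import math


def percent_change_in_odds(*coefficients):
    """Percentage change in the odds implied by the sum of log-odds coefficients."""
    odds_ratio = math.exp(sum(coefficients))
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Government supporters without the prime: coefficient -0.837.
    print(round(percent_change_in_odds(-0.837)))
    # Government supporters with the prime: main effect plus interaction (-0.582).
    print(round(percent_change_in_odds(-0.837, -0.582)))
```

Because the model is additive on the log-odds scale, the main effect and the interaction are summed before exponentiating, rather than exponentiated separately and added.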
The appendix includes several additional robustness checks, including alternative codings of the supporter variable. Among other things, I report estimates where I instead code political affiliation only based on the variable measuring the respondents' attitudes towards the current government (see above). The respondents are coded as either 'opposing', being 'neutral', or 'favoring' the current government. 27 These results, reported in full in the appendix, show the same pattern as the results above, with neutral respondents being more positive than 'oppose' respondents and 'favoring' respondents being the most positive. The prime also has the strongest effect on respondents favoring the government, followed by neutral respondents. The results from this analysis are in many ways more striking than the results reported above. For instance, for the corruption change outcome, when comparing respondents in the treatment group favoring the government with respondents opposing the government the total difference is over 2 ((−1.479) + (−0.665)), i.e. more than two full categories on the 5-point scale. While the robustness checks in general corroborate the main results, they also show that the 'prime effect' for the outcome corruption in politics is variable and somewhat model dependent.

List experiment
I now turn to the SB hypothesis. As argued above, it is reasonable to assume that the often used direct question about bribe experience is sensitive and hence under-reported. Before proceeding to the analysis I tested for potential violations of the assumptions underlying the list experiment (no design effects and no liars). Specifically, Blair and Imai (2012) propose a test for detecting design effects, i.e. when the inclusion of the sensitive item affects how respondents answer the control items. The proposed test is based on the calculation of the proportions of the different respondent types (see above). If one of these proportions is negative, this is a violation of the no design effects assumption and a sign that the list experiment did not work as intended. Formally, the null hypothesis of 'no design effect' can be stated as:

$$H_0: \begin{cases} \Pr(Y_i \le y \mid T_i = 0) \ge \Pr(Y_i \le y \mid T_i = 1) & \text{for all } y = 0, \dots, J - 1, \text{ and} \\ \Pr(Y_i \le y \mid T_i = 1) \ge \Pr(Y_i \le y - 1 \mid T_i = 0) & \text{for all } y = 1, \dots, J. \end{cases}$$

The alternative hypothesis is that at least one value of $y$ does not satisfy the inequalities described under $H_0$. Blair and Imai (2012) derive methods to compute p-values for observed proportions under the null hypothesis. Importantly, if none of the proportions are estimated to be negative the null hypothesis will not be rejected. The table below shows the estimated distribution of respondent types based on the list experiment in the study at hand.
As shown in the table, none of the proportions are estimated to be negative, and we can conclude that we do not find evidence of any violations of the 'no design effects' assumption, based on the test.
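As a rough illustration of how the type proportions behind this check are obtained, the sketch below computes the point estimates of $\pi_{y1}$ and $\pi_{y0}$ from the empirical cumulative distributions in the control and treatment groups, and flags any negative estimate. It is a simplified sketch: it does not reproduce the formal test in Blair and Imai (2012), which also accounts for sampling uncertainty when computing p-values. The function names and the toy responses are hypothetical.

```python
def empirical_cdf(responses, y):
    """Share of responses that are <= y (0 for y < 0)."""
    if y < 0:
        return 0.0
    return sum(r <= y for r in responses) / len(responses)


def type_proportions(control, treated, n_control_items):
    """Point estimates of pi_{yz} = Pr(Y_i(0) = y, Z_i = z).

    control, treated: lists of item counts reported in each group.
    Returns a dict keyed by (y, z) for y = 0..J and z = 0, 1.
    """
    pi = {}
    for y in range(n_control_items + 1):
        # pi_{y1} = Pr(Y <= y | T = 0) - Pr(Y <= y | T = 1)
        pi[(y, 1)] = empirical_cdf(control, y) - empirical_cdf(treated, y)
        # pi_{y0} = Pr(Y <= y | T = 1) - Pr(Y <= y - 1 | T = 0)
        pi[(y, 0)] = empirical_cdf(treated, y) - empirical_cdf(control, y - 1)
    return pi


def has_negative_proportion(pi):
    """True if any estimated type proportion is below zero."""
    return any(value < 0 for value in pi.values())


if __name__ == "__main__":
    # Hypothetical toy data with a single control item (J = 1).
    control = [0, 1, 1, 0]
    treated = [0, 1, 2, 1]
    pi = type_proportions(control, treated, n_control_items=1)
    print(pi, has_negative_proportion(pi))
```

Summing the estimated $\pi_{y1}$ over $y$ gives the same number as the difference-in-means estimator, which is a useful sanity check on the computation.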
To evaluate H 2 , I started by estimating the proportion of affirmative responses to the sensitive item in the list experiment using the basic difference-in-means estimator (Glynn 2013). I then estimated a logistic regression intercept-only model with the responses to the direct bribe question as the dependent variable. The two estimates show the proportion of respondents giving an affirmative answer when their privacy is protected (in the list experiment) vs when their privacy is not protected (the direct question). I also used the procedure in Blair and Imai (2012) to compute a 95% confidence interval of the difference between the two estimates. The results are presented in Table 5 and displayed graphically in Figure 3. The direct estimate of 19% 'yes' is very close to the reported statistic in the 2017 Eurobarometer for Romania at about 18% (Eurobarometer 2017). This stands in stark contrast to the list estimate at over 35%. The difference of more than 16 percentage points is highly statistically significant. This is clear evidence that respondents under-report the sensitive item when asked directly and suggests that the true estimate might be 90% higher than the estimate based on the commonly used bribe question. As noted above, these estimates are based on the 'less sensitive' version of the bribe question (the other version asking if the respondent actually paid the bribe), and are also based on a survey mode that should be less likely to elicit SB (online survey).
Voters under-reporting their experiences with corruption is obviously a serious problem for researchers or organizations trying to estimate the occurrence of bribery based on direct questions. However, if this sort of measurement error is randomly distributed across the population it would still be possible to use direct questions to explore the dynamics of bribery and assess which individuals or groups are most likely to be asked to pay bribes. This is for instance done in Mocan (2004). To explore this, I proceeded by considering the multivariate regression estimators above, together with the six described covariates. Given that the ML estimator is based on the specification of the full likelihood function, this estimator is more sensitive to model misspecification, compared to the NLS estimator. Blair et al. (2019) suggest a general specification test, based on Hausman (1978), as a formal means of comparing, and deciding between, the ML and NLS estimators. The idea is that if the underlying modeling assumptions are correct the estimators should yield results that are statistically indistinguishable. In this case the ML estimator will be more efficient.
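The test takes the following form:

$$(\hat{\theta}_{ML} - \hat{\theta}_{NLS})^\top \left( V(\hat{\theta}_{NLS}) - V(\hat{\theta}_{ML}) \right)^{-1} (\hat{\theta}_{ML} - \hat{\theta}_{NLS}) \overset{\text{approx.}}{\sim} \chi^2_{\dim(\gamma) + \dim(\delta)},$$

where $\hat{\theta}_{NLS} = (\hat{\gamma}_{NLS}, \hat{\delta}_{NLS})$, $\hat{\theta}_{ML} = (\hat{\gamma}_{ML}, \hat{\delta}_{ML})$, and $V(\hat{\theta}_{NLS})$ and $V(\hat{\theta}_{ML})$ are their estimated asymptotic variances. The null hypothesis in the test assumes 'correct model specification', in which case the ML estimator should be preferred.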
Depending on the exact model specification (which covariates were included), the test yielded significant results on some occasions, with a p-value of less than 0.05. This suggests that the ML model might not be appropriate to model the data, and that the NLS estimator is the safer option.
To explore if the extent of under-reporting differs between groups I therefore used the NLS estimator to model the relationship between different respondent characteristics and responses to the sensitive item, based on the six specified covariates. I also estimated a logistic regression model regressing the direct bribe question on the same variables. Comparisons between the direct estimate and the list estimate based on these models are shown in Figure 4. The figure displays the results based on the variables government supporter, gender, and income. In the interest of space, the results for the variables age, city inhabitant, and education are presented and discussed in the appendix. Figure 4 reveals interesting differences in under-reporting among different subgroups. The left-hand graph indicates that government supporters tend to severely under-report the sensitive item. When asked directly, under 9% of government supporters say that someone asked them to pay a bribe, compared to the list estimate at 58%. Given the relatively small size of this group and the substantial uncertainty around the estimate, the point estimate from the list experiment needs to be taken with a grain of salt. 28 The results do suggest, however, that under-reporting is huge among government supporters. This is completely in line with both the SB and PB hypotheses: government supporters might under-report the sensitive item to make their group look better (Carkoglu and Aytac (2015) find a similar pattern with regard to vote buying in Turkey).

It has long been noted that women seem to be less involved in corruption than men. Some have argued that one reason for this might be that women simply have fewer opportunities to engage in corrupt activities and that women get asked to pay bribes less often than men (e.g. Goetz 2007; Mocan 2004). This is also the pattern shown in the direct estimate of about 13% for women and 21% for men.
The list estimates, however, suggest the opposite pattern; when using the indirect questioning method women seem to be asked for bribes more often than men. The list estimate for women is over three times as high as the direct estimate: 43% vs 13%. This result is interesting, given that it goes against what much of previous research has argued. At this point I can only speculate about the reasons behind this pattern. One possibility is that women as a group are more affected by sensitivity bias. 29 The higher list estimate could also reflect the fact that women utilize the health care sector more than men, and that this sector, according to many estimates, is the sector most permeated by corruption (see Eurobarometer 2014, 2017). Finally, Mocan (2004) argues that we should expect income to be positively related to bribe victimization, given that it should be possible for a rent-seeking official to extract higher bribes from a wealthier individual. This is also the pattern found in the study at hand. Interestingly, both the list estimate and the direct estimate are substantially higher for individuals in the top 20% of the income distribution, possibly suggesting a 'normalization' of bribe-paying in this group.
Overall, these results provide evidence in favor of the SB hypothesis and suggest not only that bribe victimization is under-reported in general, but also that under-reporting differs substantially between different subgroups. As a consequence, researchers and practitioners should be very cautious in using direct, obtrusive, questions about corruption experiences to gauge overall levels of corruption and to model the dynamics of bribery based on these questions. As in the case of male and female respondents, using different questioning techniques might lead to opposite conclusions.

Conclusions
Respondents' responses to survey questions are constructed and shaped in many different ways. Research on survey methodology and public opinion has convincingly shown that responses often are unstable and strongly affected by things like social context, motivated reasoning, and particular frames (Bartels 2002; Berinsky 1999; Taber and Lodge 2006; Tourangeau and Yan 2007; Zaller 1992). In this paper I argue that these findings have been underappreciated by corruption researchers and practitioners using individual-level survey data. Recent years have seen a steady increase in the availability of different corruption measures and the use of corruption questions in large multi-country surveys (Fisman and Golden 2017; Holmes 2015). Many important measures and data sets are based on surveys directly probing the perceptions and experiences of the general public. The measures have been of great interest to political scientists and have opened up several new research avenues with individual-level data. The increase in data availability has not, however, been accompanied by sufficient reflection about problems and potential pitfalls with regard to these survey-based measures. This paper takes as its point of departure two potential sources of bias that have been demonstrated in previous research: political bias and sensitivity bias. As a first test of the prevalence of these biases in corruption surveys I conducted an original survey experiment fielded to over 3000 respondents in Romania. The survey aimed at testing two specific hypotheses, in two different experiments, based on these suggested patterns of response bias. The results from the first experiment provide strong evidence in favor of the political bias hypothesis (H 1 ). Government supporters report a much more positive view when asked common corruption questions that, in principle, ask about the objective state of society (has corruption increased? how common is political corruption?).
Government supporters also report being less worried about corruption in general, possibly signaling that they attach less importance to the issue. Priming these respondents with their political affiliation makes this general effect even more pronounced. This suggests that corruption reports to a significant extent might be subject to politically motivated reasoning and expressive 'political cheerleading'.
Researchers should hence be cautious in estimating models with individual-level measures of corruption perceptions and individual-level political outcomes such as incumbent support or vote intention. Relationships like these are likely to be affected by strong feedback mechanisms and reversed causality, especially in surveys asking political questions before corruption questions. The results also show that responses to questions about corruption perceptions in general are malleable and affected by simple frames. This means, for instance, that corruption perceptions among the public should be expected to be more polarized along political lines at times when political affiliations are more salient, for instance during an election year. From a broader perspective, the results show that political bias can be substantial even outside of traditionally studied topics like perceptions about unemployment and inflation (Bartels 2002;Gerber and Huber 2010;Jerit and Barabas 2012), and also an important factor shaping public perceptions in a multiparty system like Romania with traditionally weak party identification (Tatar 2013).
The results from the second experiment on sensitivity bias strongly suggest that direct questions about corruption experiences need to be treated as sensitive questions. According to the results, the direct question fails both to accurately capture the overall occurrence (which is heavily under-reported) and to capture the dynamics of bribery and which groups are most likely to be targeted. This is something that anyone who uses this, or a similar question, needs to take into account. At the same time, direct questions are an important tool to gauge actual rates of corruption victimization, given that alternatives such as perceptions about 'general levels of corruption' can be unreliable, as shown in the PB experiment. Different techniques to unobtrusively ask sensitive questions do exist, of which the list experiment is one. In general, these techniques come at the cost of statistical efficiency, but when bias is large, as in the study at hand, the bias-variance trade-off should come down in favor of unbiased (or less biased) estimators (Blair et al. 2018). In essence, this means that researchers will need larger samples and more sophisticated survey designs to accurately capture sensitive topics like corruption victimization. Fortunately, recent methodological developments make many of these techniques more accessible and powerful (Blair and Imai 2012; Blair et al. 2015, 2019).
The findings in this paper should not be taken as a discouragement of research on corruption or of efforts to quantify the incidence of corruption. Rather, given the immense importance of the topic, it is crucial that we scrutinize the methods we use and try to be cognizant of sources of error and bias. The experiments in this study are only a first step in pointing out these potential problems. A task for future research is to think more deeply about when and in what contexts such problems are most likely to be present, and which techniques can best mitigate them. Other interesting avenues include extending the experiments to political systems with different dynamics from Romania's, for instance systems where party identification is stronger, like Spain. Overall, this study suggests that paying more attention to issues of response bias is an important part of further advancing the field of corruption research.

A Appendix
A.1 Descriptive statistics
Note: Some extreme (probably miscoded) outliers were excluded from the 'Income' variable.

A.2 Power analysis
To decide how many respondents I needed in the final survey (the total number of completes), I conducted two basic power analyses with simulated data. First, I simulated answers to the list experiment using a list with 4 control items (modeled as the sum of 4 draws from different Bernoulli distributions). 30 The simulated response to the sensitive item (the fifth item in the treatment group) was a random draw from a Bernoulli distribution with p = 0.3. I assumed that the answer to the direct sensitive question was a random draw from a Bernoulli distribution with p = 0.2. That is, I assumed that the 'true' sensitivity bias was 0.1, or 10 percentage points. For comparison, this is half the size of the sensitivity bias uncovered by Gonzalez-Ocantos et al. (2012) in their study on vote buying. I then simulated 5000 surveys and counted the number of times the null hypothesis that the estimate for the sensitive item is ≤ 0.2 was rejected (using the difference-in-means estimator). I repeated this process for different numbers of 'respondents'. Figure 1 shows the estimated power (the percentage of simulations in which the null hypothesis was rejected) for different n.
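The list-experiment simulation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for Figure 1. The control-item probabilities in `CONTROL_PROBS`, the even split between control and treatment, the one-sided z-test at the 5% level, and all function names are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

# Hypothetical probabilities for the 4 control items (not taken from the thesis).
CONTROL_PROBS = (0.2, 0.4, 0.6, 0.8)


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulate_survey(n, rng, p_sensitive=0.3):
    """Simulate item counts for one list experiment with n respondents.

    Simulated respondents are exchangeable, so a fixed even split
    stands in for random assignment (an assumption of this sketch).
    """
    control, treatment = [], []
    for i in range(n):
        count = sum(rng.random() < p for p in CONTROL_PROBS)
        if i < n // 2:
            control.append(count)
        else:
            # Treatment group sees a fifth, sensitive item.
            treatment.append(count + (rng.random() < p_sensitive))
    return control, treatment


def rejects_null(control, treatment, null_value=0.2, alpha=0.05):
    """One-sided z-test of H0: prevalence <= null_value (difference in means)."""
    m_t, v_t = mean_and_var(treatment)
    m_c, v_c = mean_and_var(control)
    se = math.sqrt(v_t / len(treatment) + v_c / len(control))
    z = (m_t - m_c - null_value) / se
    return z > NormalDist().inv_cdf(1 - alpha)


def estimate_power(n, n_sims=5000, p_sensitive=0.3, seed=1):
    """Share of simulated surveys in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    hits = sum(
        rejects_null(*simulate_survey(n, rng, p_sensitive))
        for _ in range(n_sims)
    )
    return hits / n_sims
```

Calling `estimate_power(n)` for a grid of sample sizes gives a power curve of the kind plotted in Figure 1. The exact values depend on the assumed control-item probabilities, because these determine the variance of the item counts.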
Next, I simulated data for the political bias hypothesis. I assumed an equal share of two types of respondents: incumbent supporters and 'others'. I then randomly assigned all respondents to either the control group (no political prime) or the treatment group (political prime). I assumed the following mean values for the different groups (on an ordinal scale ranging from 0 to 4): incumbent supporter (control): 1.64; incumbent supporter (treatment): 1.8; 'others' (control): 1.56; 'others' (treatment): 1.4. These values were chosen to simulate a 'small' effect size of interest (the coefficient δ) of about 0.1 (see Lakens 2013). Note that this variable is reverse-coded compared to the variables used in the paper. For each survey simulation I estimated equation (1) and tested whether δ was negative and statistically significant. Figure 2 shows the estimated power based on these simulations.
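A rough sketch of this second simulation is given below. It rests on several assumptions that are not stated in the text: responses are drawn as Binomial(4, mean/4) so that they fall on the 0 to 4 scale with the stated means, the four cells are equally sized, the scale is flipped (y becomes 4 − y) before estimation to undo the reverse coding, and δ is taken to be the treatment-by-supporter interaction, which in a fully interacted 2×2 design equals the difference-in-differences of the cell means. The function names and the 1.96 critical value are likewise illustrative.

```python
import math
import random

# Cell means from the text, on the simulated (reverse-coded) 0-4 scale.
CELL_MEANS = {
    ("supporter", "control"): 1.64,
    ("supporter", "treatment"): 1.80,
    ("other", "control"): 1.56,
    ("other", "treatment"): 1.40,
}


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def draw_response(mean, rng):
    """Assumed response model: Binomial(4, mean / 4), which has the target mean."""
    return sum(rng.random() < mean / 4 for _ in range(4))


def interaction_z(n, rng, cell_means):
    """z-statistic for the interaction after flipping the scale (y -> 4 - y)."""
    per_cell = n // 4  # assumes four equally sized cells
    stats = {}
    for cell, mu in cell_means.items():
        ys = [4 - draw_response(mu, rng) for _ in range(per_cell)]
        stats[cell] = mean_and_var(ys)

    def m(group, arm):
        return stats[(group, arm)][0]

    # Difference-in-differences of cell means (the interaction term).
    delta = (m("supporter", "treatment") - m("supporter", "control")) - (
        m("other", "treatment") - m("other", "control")
    )
    se = math.sqrt(sum(v for _, v in stats.values()) / per_cell)
    return delta / se


def estimate_power(n, n_sims=2000, seed=1, z_crit=1.96, cell_means=CELL_MEANS):
    """Share of simulations with a negative, significant interaction."""
    rng = random.Random(seed)
    hits = sum(interaction_z(n, rng, cell_means) < -z_crit for _ in range(n_sims))
    return hits / n_sims
```

Here `n` is the effective sample for a single question, not the total number of completes. A different response model or a regression with covariates would change the variance and therefore the estimated power.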
Because of the randomization scheme described above, 1/4 of the respondents will be in the control group and another 1/4 in the treatment group with regard to each specific corruption/economy question. The effective sample for testing H1 is hence about half the size of the sample for testing H2. Based on the two power calculations above, 2800-2900 respondents in total should give me enough statistical power to test both hypotheses. This gives substantial power to detect the main effect in the list experiment (H2, over 90%), and also plenty of room to conduct sub-group analyses (for instance, splitting this sample in half still gives me reasonable power to detect an effect of this size). An effective sample of about 1400 respondents also gives me over 80% power to detect the main effects with regard to H1 (given the assumptions stated above).

A.3 Additional analysis and robustness checks
A.3.1 Political bias experiment
The design with the 'political prime', where some respondents answered the corruption questions after the political questions, assumes that the outcome was not affected by the fact that the control group answered the corresponding corruption questions a little earlier in the survey. The effect of the prime with regard to a specific PB question i is estimated by comparing Control 1 to Treatment 2 (see figure). Formally, we assume that E[Y(0) | T = 0] = E[Y(0) | T = 1], that is, that the potential outcomes for untreated observations are on average the same for respondents assigned to the control and treatment groups (see Holland 1986). As part of the pilot study I therefore conducted a test of this assumption in which respondents were first randomized into the 'corruption in politics' question (see above) and then received the 'economy question' later in the survey, or the other way around (I only used two questions to retain power with the small pilot sample). In this version of the survey the randomized political prime was not included. If the assumption stated above holds, we should therefore not expect the answers to the questions to differ depending on whether the question was given earlier or somewhat later in the survey. I then conducted an independent-samples t-test for each question to see whether the placement in the survey itself affected the responses. The results showed no significant differences. Corruption in politics, difference (earlier − later): t(104) = −0.22, p = 0.83. Economy, difference (earlier − later): t(104) = −0.065, p = 0.95. This suggests that the assumption holds and that any differences observed in the experiment are due to the political prime.