Unresponsive and Unpersuaded: The Unintended Consequences of a Voter Persuasion Effort


To date, field experiments on campaign tactics have focused overwhelmingly on mobilization and voter turnout, with far more limited attention to persuasion and vote choice. In this paper, we analyze a field experiment with 56,000 Wisconsin voters designed to measure the persuasive effects of canvassing, phone calls, and mailings during the 2008 presidential election. Focusing on the canvassing treatment, we find that persuasive appeals had two unintended consequences. First, they reduced responsiveness to a follow-up survey among infrequent voters, a substantively meaningful behavioral response that has the potential to induce bias in estimates of persuasion effects as well. Second, the persuasive appeals possibly reduced candidate support and almost certainly did not increase it. This counterintuitive finding is reinforced by multiple statistical methods and suggests that contact by a political campaign may engender a backlash.


Fig. 1


  1. The data set and replication code are posted online at https://dataverse.harvard.edu/dataverse/DJHopkins. Due to their proprietary nature, two variables employed in our analyses are omitted from the data set: the Democratic performance in a precinct and each respondent’s probability of voting for the Democratic candidate.

  2. Strategies to study persuasion include natural experiments based on the uneven mapping of television markets to swing states (Simon and Stern 1955; Huber and Arceneaux 2007) or the timing of campaign events (Ladd and Lenz 2009). Other studies use precinct-level randomization (e.g., Arceneaux 2005; Panagopoulos and Green 2008; Rogers and Middleton 2015) or discontinuities in campaigns’ targeting formulae (e.g., Gerber et al. 2011).

  3. In a related vein, Shi (2015) finds that postcards exposing voters to a dissonant argument on same-sex marriage reduce subsequent voter turnout.

  4. Experimental studies also rely on self-reported vote choice, not the actual vote cast. This is less of a concern, as pre-election public opinion surveys like this one typically provide accurate measures of vote choice (Hopkins 2009).

  5. Such support scores are commonly employed by campaigns. To generate them, data vendors fit a model to data where candidate support is observed, typically survey data. They then use the model, alongside known demographic and geographic characteristics, to estimate each voter’s probability of supporting a given candidate in a much broader sample. The specific model employed is proprietary and unknown to the researchers. The Pearson’s correlation with a separate measure of precinct-level prior Democratic support is 0.47, indicating the importance of precinct-level measures in its calculation in this data set. For more on the use of such data and scores within political science, see Ansolabehere and Hersh (2011), Ansolabehere and Hersh (2012), Rogers and Aida (2014), and Hersh (2015).

  6. This age skew reduces one empirical concern, namely that voters under the age of 26 have truncated vote histories. Only 2.1% of targeted voters were under 26 in 2008, and thus under 18 in 2000.

  7. Specifically, voters were coded as “strong Obama,” “lean Obama,” “undecided,” “lean McCain,” and “strong McCain.”

  8. Additional analyses approximate the effect of the treatment on those who actually spoke with the canvassers (the so-called Complier Average Causal Effect; see Angrist et al. 1996); we report the results in the Conclusion.

  9. For the full regression, see the first column of Table 6 in the Appendix.

  10. Similar analyses for the phone and mail treatments show no significant differences across groups.

  11. For the corresponding regression model, see the second column of Table 6 in the Appendix.

  12. Voters under the age of 26 would not have been eligible to vote in some of the prior elections, and might be disproportionately represented among the low-turnout groups. We have age data for only 39,187 individuals in the sample. The negative effects of canvassing in the zero-turnout group persist (with a larger confidence interval) when the data set is restricted to citizens known to be older than 26.

  13. The effects for phone calls are generally similar, but not statistically significant (see Table 9 in the Appendix).

  14. For example, Enos et al. (2014) find that direct mail, phone calls, and canvassing had small effects on turnout for voters with low probabilities of voting, large effects for voters with middle-to-high probabilities of voting, and smaller but still positive effects for those with the highest probabilities of voting.

  15. Results using logistic regression are highly similar.

  16. In separate, ongoing research, we use the turnout results described above as a benchmark with which to evaluate each of these methods.

  17. As Little et al. (2012) explain, “weighted estimating equations and multiple-imputation models have an advantage in that they can be used to incorporate auxiliary information about the missing data into the final analysis, and they give standard errors and p values that incorporate missing-data uncertainty” (p. 1359).

  18. But that fact also means that the “implied joint distributions may not exist theoretically” (van Buuren et al. 2006, p. 1051). Still, that important theoretical limitation does not prevent MICE from working well in practice (van Buuren et al. 2006).

  19. To examine the performance of our model for multiple imputation, we performed tests in which we deliberately deleted 500 known survey responses from the fully observed data set (n = 12,442) and then assessed the performance of our imputation model for those 500 cases where we know the correct answer. In each case, we used the full multiple imputation model to generate five imputed data sets for each new data set, and then calculated the share of deleted responses that we correctly imputed. The median out-of-sample accuracy across the resulting data sets was 74.9%, with a minimum of 73.3% and a maximum of 76.0%, well above chance alone.

  20. In fact, the associated p value is less than 0.002, meaning that the finding would remain significant even after a Bonferroni correction for multiple comparisons to account for the analyses of the phone and mail treatments.

  21. The associated 95% confidence interval spans from −3.03 to −0.60.

  22. We could add covariates that affect only this equation without affecting our discussion below. The existence of such variables is commonly necessary for empirical estimation of selection models, although it is not strictly required, as these models can be identified solely with parametric assumptions about error terms.

  23. Throughout these analyses, we drop our measure of respondents’ age, which is the only independent variable with significant missingness.

  24. Here, \(\delta\) is set to 0.0001.

  25. Siddique and Belin (2008a) report that a value of \(k=3\) works well in their substantive application, while Siddique and Belin (2008b) recommend values between 1 and 2.

  26. Still, even in light of this potential to under-estimate variance, Demirtas et al. (2007) demonstrate that the small-sample properties of the original ABB are superior to those of would-be corrections.

  27. IPW requires data that are fully observed with the exception of the missing outcome. We thus set aside 20 respondents who were missing data for covariates other than age or Obama support.


  1. Adams, W. C., & Smith, D. J. (1980). Effects of telephone canvassing on turnout and preferences: A field experiment. Public Opinion Quarterly, 44(3), 389–395.

  2. Albertson, B., & Busby, J. W. (2015). Hearts or minds? Identifying persuasive messages on climate change. Research & Politics.

  3. Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91, 444–455.

  4. Ansolabehere, S., & Hersh, E. (2011). Who really votes? In P. M. Sniderman & B. Highton (Eds.), Facing the challenge of democracy: Explorations in the analysis of public opinion and political participation. Princeton, NJ: Princeton University Press.

  5. Ansolabehere, S., & Hersh, E. (2012). Validation: What big data reveal about survey misreporting and the real electorate. Political Analysis, 20(4), 437–459.

  6. Arceneaux, K. (2005). Using cluster randomized field experiments to study voting behavior. The Annals of the American Academy of Political and Social Science, 601(1), 169–179.

  7. Arceneaux, K. (2007). I’m asking for your support: The effects of personally delivered campaign messages on voting decisions and opinion formation. Quarterly Journal of Political Science, 2(1), 43–65.

  8. Arceneaux, K., & Kolodny, R. (2009). Educating the least informed: Group endorsements in a grassroots campaign. American Journal of Political Science, 53(4), 755–770.

  9. Arceneaux, K., & Nickerson, D. W. (2009). Who is mobilized to vote? A re-analysis of 11 field experiments. American Journal of Political Science, 53(1), 1–16.

  10. Bechtel, M. M., Hainmueller, J., Hangartner, D., & Helbling, M. (2014). Reality bites: The limits of framing effects for salient and contested policy issues. Political Science Research and Methods (forthcoming).

  11. Broockman, D. E., & Green, D. P. (2014). Do online advertisements increase political candidates’ name recognition or favorability? Evidence from randomized field experiments. Political Behavior, 36, 263–289.

  12. Cardy, E. A. (2005). An experimental field study of the GOTV and persuasion effects of partisan direct mail and phone calls. The Annals of the American Academy of Political and Social Science, 601(1), 28–40.

  13. Cranmer, S. J., & Gill, J. (2013). We have to be discrete about this: A non-parametric imputation technique for missing categorical data. British Journal of Political Science, 43(2), 425–449.

  14. Das, M., Newey, W. K., & Vella, F. (2003). Nonparametric estimation of sample selection models. The Review of Economic Studies, 70(1), 33–58.

  15. Demirtas, H., Arguelles, L. M., Chung, H., & Hedeker, D. (2007). On the performance of bias-reduction techniques for variance estimation in approximate Bayesian bootstrap imputation. Computational Statistics & Data Analysis, 51(8), 4064–4068.

  16. Enos, R. D., Fowler, A., & Vavreck, L. (2014). Increasing inequality: The effect of GOTV mobilization on the composition of the electorate. The Journal of Politics, 76(1), 273–288.

  17. Enos, R. D., & Hersh, E. D. (2015). Party activists as campaign advertisers: The ground campaign as a principal-agent problem. American Political Science Review, 109(2), 252–278.

  18. Gerber, A., Karlan, D., & Bergan, D. (2009). Does the media matter? A field experiment measuring the effect of newspapers on voting behavior and political opinions. American Economic Journal: Applied Economics, 1(2), 35–52.

  19. Gerber, A., & Green, D. (2000). The effects of canvassing, telephone calls, and direct mail on voter turnout: A field experiment. American Political Science Review, 94(3), 653–663.

  20. Gerber, A. S., Kessler, D. P., & Meredith, M. (2011). The persuasive effects of direct mail: A regression discontinuity based approach. Journal of Politics, 73(1), 140–155.

  21. Gerber, A. S., & Green, D. P. (2012). Field experiments: Design, analysis, and interpretation. New York, NY: W.W. Norton and Company.

  22. Gerber, A. S., Huber, G. A., Doherty, D., Dowling, C. M., & Hill, S. J. (2013). Who wants to discuss vote choices with others? Polarization in preferences for deliberation. Public Opinion Quarterly, 77(2), 474–496.

  23. Gerber, A. S., Huber, G. A., & Washington, E. (2010). Party affiliation, partisanship, and political beliefs: A field experiment. American Political Science Review, 104(4), 720–744.

  24. Gerber, A. S., Gimpel, J. G., Green, D. P., & Shaw, D. R. (2011). How large and long-lasting are the persuasive effects of televised campaign ads? Results from a randomized field experiment. American Political Science Review, 105(1), 135–150.

  25. Glynn, A. N., & Quinn, K. M. (2010). An introduction to the augmented inverse propensity weighted estimator. Political Analysis, 18(1), 36–56.

  26. Green, D. P., & Gerber, A. S. (2008). Get out the vote: How to increase voter turnout. Washington, DC: Brookings Institution Press.

  27. Heckman, J. (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables, and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.

  28. Hersh, E. D. (2015). Hacking the electorate: How campaigns perceive voters. New York, NY: Cambridge University Press.

  29. Hersh, E. D., & Schaffner, B. F. (2013). Targeted campaign appeals and the value of ambiguity. The Journal of Politics, 75(2), 520–534.

  30. Hopkins, D. J. (2009). No more Wilder effect, never a Whitman effect: When and why polls mislead about black and female candidates. The Journal of Politics, 71(3), 769–781.

  31. Huber, G. A., & Arceneaux, K. (2007). Identifying the persuasive effects of presidential advertising. American Journal of Political Science, 51(4), 957–977.

  32. Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A, 171(2), 481–502.

  33. Issenberg, S. (2012). Obama Does It Better. Slate.

  34. King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95(1), 49–69.

  35. Ladd, J. M., & Lenz, G. S. (2009). Exploiting a rare communication shift to document the persuasive power of the news media. American Journal of Political Science, 53(2), 394–410.

  36. Little, R. J., D’Agostino, R., Cohen, M. L., Dickersin, K., Emerson, S. S., Farrar, J. T., et al. (2012). The prevention and treatment of missing data in clinical trials. New England Journal of Medicine, 367(14), 1355–1360.

  37. Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York, NY: Wiley.

  38. Matland, R. E., & Murray, G. R. (2013). An experimental test for backlash against social pressure techniques used to mobilize voters. American Politics Research, 41(3), 359–386.

  39. Michelson, M. R. (2014). Memory and voter mobilization. Polity, 46, 591–610.

  40. Moore, R. T. (2012). Multivariate continuous blocking to improve political science experiments. Political Analysis, 20(4), 460–479.

  41. Nicholson, S. P. (2012). Polarizing cues. American Journal of Political Science, 56(1), 52–66.

  42. Nickerson, D. W. (2005a). Partisan mobilization using volunteer phone banks and door hangers. The Annals of the American Academy of Political and Social Science, 601(1), 10–27.

  43. Nickerson, D. W. (2005b). Scalable protocols offer efficient design for field experiments. Political Analysis, 13, 233–252.

  44. Nickerson, D. W. (2008). Is voting contagious? Evidence from two field experiments. American Political Science Review, 102(1), 49.

  45. Nickerson, D. W., & Rogers, T. (2010). Do you have a voting plan? Implementation intentions, voter turnout, and organic plan making. Psychological Science, 21(2), 194–199.

  46. Panagopoulos, C., & Green, D. P. (2008). Field experiments testing the impact of radio advertisements on electoral competition. American Journal of Political Science, 52(1), 156–168.

  47. Rogers, T., & Nickerson, D. (2013). Can inaccurate beliefs about incumbents be changed? And can reframing change votes? HKS Faculty Research Working Paper Series RWP13-018.

  48. Rogers, T., & Middleton, J. A. (2015). Are ballot initiative outcomes influenced by the campaigns of independent groups? A precinct-randomized field experiment showing that they are. Political Behavior, 37, 567–593.

  49. Rogers, T., & Aida, M. (2014). Vote self-prediction hardly predicts who will vote, and is (misleadingly) unbiased. American Politics Research, 42(3), 503–528.

  50. Rubin, D. B. (2008). For objective causal inference, design trumps analysis. The Annals of Applied Statistics, 2, 808–840.

  51. Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care databases: An overview and some applications. Statistics in Medicine, 10(4), 585–598.

  52. Rubin, D., & Schenker, N. (1986). Multiple imputation for interval estimation for simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81(394), 366–374.

  53. Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

  54. Shi, Y. (2015). Cross-cutting messages and voter turnout: Evidence from a same-sex marriage amendment. Political Communication. (forthcoming).

  55. Siddique, J., & Belin, T. R. (2008a). Multiple imputation using an iterative hot-deck with distance-based donor selection. Statistics in Medicine, 27(1), 83–102.

  56. Siddique, J., & Belin, T. R. (2008b). Using an approximate Bayesian bootstrap to multiply impute nonignorable missing data. Computational Statistics & Data Analysis, 53(2), 405–415.

  57. Simon, H. A., & Stern, F. (1955). The effect of television upon voting behavior in Iowa in the 1952 presidential election. American Political Science Review, 49(2), 470–477.

  58. Sinclair, B. (2012). The social citizen. Chicago, IL: University of Chicago Press.

  59. Sinclair, B., McConnell, M., & Green, D. P. (2012). Detecting spillover effects: Design and analysis of multilevel experiments. American Journal of Political Science, 56(4), 1055–1069.

  60. Taber, C. S., & Lodge, M. (2006). Motivated skepticism in the evaluation of political beliefs. American Journal of Political Science, 50(3), 755–769.

  61. Van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

  62. Vavreck, L. (2007). The exaggerated effects of advertising on turnout: The dangers of self-reports. Quarterly Journal of Political Science, 2(4), 325–343.

  63. Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment. New York, NY: Wiley.

  64. Zaller, J. R. (1992). The nature and origins of mass opinion. New York, NY: Cambridge University Press.



This paper has benefitted from comments by David Broockman, Kevin Collins, Eitan Hersh, Seth Hill, Michael Kellermann, Gary King, Marc Meredith, David Nickerson, Maya Sen, and Elizabeth Stuart. For research assistance, the authors gratefully acknowledge Julia Christensen, Zoe Dobkin, Katherine Foley, Andrew Schilling, and Amelia Whitehead. David Dutwin, Alexander Horowitz, and John Ternovski provided helpful replies to various queries. Earlier versions of this manuscript were presented at the 30th Annual Summer Meeting of the Society for Political Methodology at the University of Virginia, July 18th, 2013, and at Vanderbilt University’s Center for the Study of Democratic Institutions, October 18th, 2013.

Author information



Corresponding author

Correspondence to Daniel J. Hopkins.



Persuasion Script

Good Afternoon—my name is [INSERT NAME], I’m with [ORGANIZATION NAME]. Today, we’re talking to voters about important issues in our community. I’m not asking for money, and only need a minute of your time.

As you are thinking about the upcoming election, what issue is most important to you and your family? [LEAVE OPEN ENDED—DO NOT READ LIST]

If not sure, offer the following suggestions:

  • Iraq War

  • Economy/ Jobs

  • Health Care

  • Taxes

  • Education

  • Gas Prices/Energy

  • Social Security

  • Other Issue

Yeah, I agree that issue is really important and that our economy is hurting many families in Wisconsin. Do you know anyone who has lost a job or their health care coverage in this economy?

I understand that a lot of families are struggling to make ends meet these days.

When you think about how that’s affecting your life, and the people running for president this year, have you decided between John McCain and Barack Obama, or, like a lot of voters, are you undecided? [IF UNDECIDED] Are you leaning toward either candidate right now?

  • Strong Obama

  • Lean Obama

  • Undecided

  • Lean McCain

  • Strong McCain

[If strong McCain supporter, end with:] Ok, thanks for your time this evening. [If strong Obama supporter, end with:] Great, I support Obama as well, I know he will bring our country the change we need. Thanks for your time this evening.

[ONLY MOVE TO THIS SECTION WITH LEANING OR UNDECIDED VOTERS] With our economy in crisis and job and health care losses at an all-time high, our country is in need of a change. But as companies are laying off workers and sending our jobs overseas, John McCain says that our economy is “fundamentally strong”—he just doesn’t understand the problems our country faces. McCain voted against the minimum wage 19 times. His tax plan offers 200 billion dollars in tax cuts for oil companies and big corporations, but not a dime of tax relief for more than a hundred million middle-class families. During this time of families losing their homes, McCain voted against measures to discourage predatory lenders. John McCain has never supported working families in the Senate, and there is no reason to believe he will as President.

On the other hand, Barack Obama will do more to strengthen our economy. Obama will cut taxes for the middle class and help working families achieve a decent standard of living. Obama’s tax cuts will put more money back in the pockets of working families. He’ll stand up to the banks and oil companies that have ripped off the American people and invest in alternative energy. Obama will control the rising cost of healthcare and reward companies that create jobs in the U.S.

After hearing that, how are you feeling about our presidential candidates? What are your thoughts on this?

Obama will reward companies that keep jobs in the U.S., and make sure tax breaks go to working families who need them. Barack Obama offers new ideas and a fresh approach to the challenges facing Wisconsin families. Instead of just talking about change, he has specific plans to finally fix health care and give tax breaks to middle-class families instead of companies that send jobs overseas. Obama will bring real change that will finally make a lasting improvement in the lives of all Wisconsin families.

Now that we’ve had a chance to talk, who do you think you’ll vote for in November: John McCain or Barack Obama, or are you undecided? [IF UNDECIDED] Are you leaning toward either candidate at this point?

  • Strong Obama

  • Lean Obama

  • Undecided

  • Lean McCain

  • Strong McCain

Thanks again, [INSERT VOTER’S NAME]; we appreciate your time and consideration.

Survey Questions

“Hi, I’m calling from [survey firm redacted] with a brief, one-minute opinion survey. We are not selling anything, and your responses will be completely confidential.

Now first, thinking about the election for President this November, will you be voting for Senator Barack Obama, the Democratic candidate, or Senator John McCain, the Republican candidate?

  1. Obama: Thank you. [GO TO Q2]

  2. McCain: Thank you. [GO TO Q2]

  3. VOLUNTEER ONLY Undecided/Don’t Know/Other: Thank you. [GO TO Q1]

  4.

If the election were held today and you had to decide right now, toward which candidate would you lean?

  1.

  2.

  3. VOLUNTEER ONLY Completely Undecided

  4.

Finally, for demographic purposes only, in what year were you born?” [Collect four-digit year]

Additional Tables

Table 5 Experimental conditions
Table 6 Omnibus balance tests
Table 7 Balance in random assignment
Table 8 Balance in survey response assignment
Table 9 Survey response rate differences across phone call treatment for all turnout levels
Table 10 Non-parametric selection model results

A Formal Statement of Selection Bias

Here, we formalize the problem of sample selection. Doing so enables us to group estimators based on their underlying assumptions about how fully the observed covariates can account for the patterns of missing data.

The dependent variable of interest is \(Y_i^*\), support for Barack Obama. It is a function of the treatment (denoted \(X_{1i}\)) and a vector of covariates (denoted \(X_{2i}\)) that may or may not be observed. Because the treatment is randomized, it is uncorrelated in expectation with \(X_{2i}\) and with the error terms in both equations below; in a sufficiently large sample, these correlations are negligible.

$$\begin{aligned} Y_i^* = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i \end{aligned}$$

We observe \(Y_i^*\) only for those voters who respond to the survey, as indicated by the indicator variable \(d_i\).

$$\begin{aligned} Y_i = Y_i^* d_i \end{aligned}$$

The variable indicating that \(Y_i^*\) is observable is a function of the same covariates which affect \(Y_i^*\).

$$\begin{aligned} d_i^{*} &= \gamma_0 + \gamma_1 X_{1i} + \gamma_2 X_{2i} + \eta_i \\ d_i &= 1 \text{ if } d_i^* > 0 \end{aligned}$$

We assume the \(\epsilon\) and \(\eta\) terms are random variables uncorrelated with each other and with any of the independent variables (footnote 22). (Particular \(\beta\) or \(\gamma\) coefficients may be zero for variables that affect only selection or only the outcome.)

We can re-write the equation for the observed data as

$$\begin{aligned} Y_i &= Y_i^*|_{d_i=1} \\ &= \beta_0 + \beta_1 X_{1i}|_{d_i=1} + \beta_2 X_{2i}|_{d_i=1} + \epsilon_i|_{d_i=1} \end{aligned}$$

The various statistical approaches for dealing with sample selection diverge in their assumptions about \(X_{2i}\). One common approach is to assume that \(X_{2i}\) is fully specified and observed. In such cases, we can predict the missing values—those for which \(d_i^* \le 0\)—using the observed data. Statisticians refer to this assumption as “missing at random” (Schafer 1997; King et al. 2001; Little and Rubin 2002). Under this assumption, we might then apply some form of multiple imputation, which leverages the observed covariances among the variables to impute potential values for the missing data. Given that \(X_{2i}\) is fully specified, multiple imputation can be employed to estimate missingness in an outcome variable, an independent variable, or both.
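The missing-at-random logic can be sketched in a few lines. The following is a deliberately simplified stand-in for the paper's actual MICE procedure, assuming a single fully observed covariate, a linear-probability imputation model, and stochastic draws without a posterior draw of the coefficients; all variables are simulated, not drawn from the experiment.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

x = rng.normal(size=n)                            # fully observed covariate (X2)
y = (x + rng.normal(size=n) > 0).astype(float)    # binary outcome (e.g., candidate support)
miss = rng.random(n) < 1 / (1 + np.exp(-x))       # missingness depends only on observed x: MAR

obs = ~miss
X = np.column_stack([np.ones(obs.sum()), x[obs]])
coef, *_ = np.linalg.lstsq(X, y[obs], rcond=None)  # imputation model fit on complete cases

completed = []
for _ in range(5):                                # five completed data sets
    p = np.clip(coef[0] + coef[1] * x[miss], 0.0, 1.0)
    y_imp = y.copy()
    y_imp[miss] = (rng.random(miss.sum()) < p).astype(float)  # stochastic draws, not best guesses
    completed.append(y_imp)
```

Each completed data set is then analyzed as if fully observed, with results pooled across the five data sets; the stochastic draws (rather than plugging in predicted probabilities) are what let the between-imputation spread reflect uncertainty about the missing values.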

Other approaches to sample selection are unwilling to assume that \(X_{2i}\) is fully observed—in such cases, the data are instead assumed to have non-ignorable missingness. These approaches turn to other assumptions, typically concerning the process that generates the missing data. If \(X_{2i}\) is unobserved, \(\beta _2X_{2i}\) will become part of the error term in the \(Y_i\) equation and \(\gamma _2X_{2i}\) will become part of the error term in the \(d_i\) equation. While \(X_{1i}\) (the randomized treatment) and \(X_{2i}\) are uncorrelated in the whole population, they are not necessarily uncorrelated in the sampled population. To see this, note that

$$\begin{aligned} X_{1i}|_{d_i=1} &= X_{1i}|_{\gamma_0 + \gamma_1 X_{1i} + \gamma_2 X_{2i} + \eta_i > 0} \\ X_{2i}|_{d_i=1} &= X_{2i}|_{\gamma_0 + \gamma_1 X_{1i} + \gamma_2 X_{2i} + \eta_i > 0} \end{aligned}$$

The turnout case provides an example of how this bias can manifest itself. Suppose that the unobserved variable (\(X_{2i}\)) is unmeasured civic-mindedness, and it has a positive effect on whether someone responds to a pollster (implying \(\gamma _2>0\)) as well as a positive effect on Obama support (implying \(\beta _2>0\)). This would mean that in the observed data, the treated respondents would be more civically minded on average. Naturally, this could induce bias, as the treated, observed respondents are disproportionately high in civic-mindedness compared to observed respondents in the control group. This can explain the spurious finding in the surveyed-only column of Table 3. We know from the full data set that the treatment had no overall effect on turnout, but in the sub-sample of those who answered the follow-up survey, the canvass treatment is spuriously associated with a statistically significant positive effect.
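The civic-mindedness mechanism can be reproduced in a short simulation. The coefficients and sample size below are illustrative only, chosen to mimic the qualitative pattern (\(\gamma_1 < 0\), \(\gamma_2 > 0\), \(\beta_2 > 0\), true treatment effect zero), not the experiment's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

treat = rng.integers(0, 2, n)              # randomized treatment (X1)
civic = rng.normal(size=n)                 # unobserved civic-mindedness (X2)

# True model: treatment has NO effect on support; civic-mindedness does (beta2 > 0).
support = (0.8 * civic + rng.normal(size=n) > 0).astype(float)

# Response: treatment depresses response (gamma1 < 0); civic-mindedness raises it (gamma2 > 0).
respond = (-0.5 * treat + civic + rng.normal(size=n)) > 0

# Full sample: difference in means is ~0, as randomization guarantees.
full = support[treat == 1].mean() - support[treat == 0].mean()

# Respondents only: treated respondents are more civic-minded on average,
# so a spurious positive "treatment effect" on support appears.
t1 = respond & (treat == 1)
t0 = respond & (treat == 0)
surveyed = support[t1].mean() - support[t0].mean()
```

In the full sample the treatment-control difference is essentially zero, while among respondents alone it is clearly positive, mirroring the spurious turnout result described above.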

Assuming \(X_{2i}\) is unobserved, two conditions must be met for sample selection to cause bias in randomized persuasion experiments with subsequent surveys:

  1. \(\gamma_1 \ne 0\). This is necessary to induce a correlation between the randomized treatment and some unobserved variable in the observed sample. This can be tested; in our data, we found \(\gamma_1 < 0\) for low-turnout types and \(\gamma_1 > 0\) for middle-turnout types.

  2. \(\gamma_2 \ne 0\) and \(\beta_2 \ne 0\). In other words, given our characterization of the data-generating process, the error terms in the two equations are correlated.

If \(X_{2i}\) is not fully observed, the errors in the selection and outcome equations may be correlated. Heckman (1976) models such correlated errors by assuming that the errors in the two equations are distributed as bivariate normal random variables. This allows us to derive the value of the error term in the outcome equation conditional on being observed. Non-parametric selection models such as Das et al. (2003) approximate the conditional value of the error term with a polynomial function of the covariates. In practice, this involves fitting a first-stage model that produces a propensity of being observed. Powers of this fitted propensity are then included in the outcome equation.
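The two-step logic of such selection corrections can be sketched as follows. This is a simplified stand-in, not the estimator of Das et al. (2003): it uses a linear-probability first stage and a cubic in the fitted propensity, with simulated data in which an unobserved factor enters both equations and the true treatment effect is zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

treat = rng.integers(0, 2, n).astype(float)
z = rng.normal(size=n)                  # covariate affecting response but not the outcome
u = rng.normal(size=n)                  # unobserved factor in BOTH equations (correlated errors)

y = 1.0 + 0.0 * treat + u + rng.normal(size=n)            # true treatment effect is zero
respond = (-0.5 * treat + z + u + rng.normal(size=n)) > 0
m = respond

# A naive outcome regression on respondents alone is biased upward.
Xn = np.column_stack([np.ones(m.sum()), treat[m]])
naive, *_ = np.linalg.lstsq(Xn, y[m], rcond=None)

# Step 1: fit a response-propensity model (linear-probability sketch).
X1 = np.column_stack([np.ones(n), treat, z])
g, *_ = np.linalg.lstsq(X1, respond.astype(float), rcond=None)
p_hat = X1 @ g

# Step 2: outcome regression adding powers of the fitted propensity,
# which absorb the selection term E[error | respond = 1].
X2 = np.column_stack([np.ones(m.sum()), treat[m],
                      p_hat[m], p_hat[m] ** 2, p_hat[m] ** 3])
beta, *_ = np.linalg.lstsq(X2, y[m], rcond=None)
```

Here `z` plays the role of the footnote-22 variable that affects selection but not the outcome; it is what gives the propensity independent variation, so the polynomial terms can soak up the selection-induced error correlation and pull the treatment coefficient back toward its true value.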

Additional Estimation Strategies

Approximate Bayesian Bootstrap

Since non-random attrition threatens to bias listwise deletion models, we consider another imputation model that accounts for this possibility. In particular, we use hot deck imputation, which can be useful under three conditions satisfied by this experiment: when the missingness of interest is present primarily in a single variable, when the data contain many variables that are not continuous (Cranmer and Gill 2013), and when there are many available donor observations (Siddique and Belin 2008b). Here, we employ the particular variant of hot deck imputation outlined by Siddique and Belin (2008b): an Approximate Bayesian Bootstrap (ABB) (see also Rubin and Schenker 1986, 1991; Demirtas et al. 2007; Siddique and Belin 2008a). That approach has the added advantage that it can relax the assumption of ignorability in a straightforward manner by incorporating an informative prior about the unobserved outcomes (footnote 23). These analyses focus on the 45,875 respondents who had Catalist phone match scores, although the results are similar when instead analyzing the full data set of 56,000 respondents.

Specifically, each iteration of the ABB begins by drawing a sample from the fully observed “donor” observations, which in our example number 12,439. This step allows the ABB to more accurately reflect variability from the imputation. One can draw the donor observations with equal probability in each iteration, which effectively assumes that the missingness is ignorable conditional on the observed covariates. But importantly, researchers can also take weighted draws from the donor pool, which is the equivalent of placing an informative prior on the missing outcome data (Siddique and Belin 2008b). This allows researchers to relax the ignorability assumption, and to build in additional information about the direction and size of any bias.

Irrespective of the prior, we then build a model of the outcome using the covariates for the respondents with no missing outcome data, being sure to weight the donor observations by the number of times they were drawn in each iteration of the bootstrap. The subsequent step is to predict \(\hat{Y}\) for all observations—both donor and donee—by applying that model to the covariates X. For each observation with a missing outcome—there are 33,025 in this example—we next need to draw a “donor” observation that provides an outcome. Following Siddique and Belin (2008b), we do so by estimating a distance metric for each observation i as follows: \(D_i = (|\hat{y}_0-\hat{y}_i|+\delta )^k\), where \(\delta\) is a positive number which avoids distances of zero.Footnote 24 For each missing observation, an outcome is imputed from a donor chosen with a probability inversely proportional to the distance \(D_i\). As k grows large, note that the algorithm chooses the most similar observation in the donor pool with high probability, while a k of zero is equivalent to drawing any observation with equal probability.Footnote 25
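The bootstrap-then-draw procedure above can be sketched as follows. This is an illustrative implementation of the distance rule \(D_i = (|\hat{y}_0-\hat{y}_i|+\delta )^k\) on generic arrays; the function name and defaults are our own, not the code used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def abb_impute(y_obs, yhat_obs, yhat_mis, k=3, delta=0.01, weights=None):
    """One ABB iteration: bootstrap the donor pool, then draw a donor for
    each missing unit with probability inversely proportional to the
    distance D_i = (|yhat_mis - yhat_donor| + delta) ** k.

    weights: optional donor probabilities summing to one; a non-uniform
    vector is the equivalent of an informative prior on the missing data.
    delta avoids distances of exactly zero.
    """
    y_obs, yhat_obs = np.asarray(y_obs), np.asarray(yhat_obs)
    n_don = len(y_obs)
    # Bootstrap step: resample donors (weighted draws relax ignorability)
    idx = rng.choice(n_don, size=n_don, replace=True, p=weights)
    y_boot, yhat_boot = y_obs[idx], yhat_obs[idx]
    imputed = np.empty(len(yhat_mis))
    for j, yh in enumerate(np.asarray(yhat_mis)):
        d = (np.abs(yh - yhat_boot) + delta) ** k   # distance to each donor
        p = (1.0 / d) / (1.0 / d).sum()             # inverse-distance draw
        imputed[j] = y_boot[rng.choice(n_don, p=p)]
    return imputed
```

As \(k\) grows, the draw concentrates on the nearest donors; \(k=0\) reduces to an equal-probability draw from the bootstrapped pool.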

Unlike a single-shot hot deck imputation, this approach does account for imputation uncertainty—and here, we fit our standard logistic regression model to 5 separately imputed data sets and then combine the answers using the appropriate rules (Rubin and Schenker 1986; King et al. 2001). Yet there is an important potential limitation to this technique. While running the algorithm multiple times will address the uncertainty stemming from the imputation of missing observations, it will not address the uncertainty stemming from small donor pools—and the reweighting in the non-ignorable ABB has the potential to exacerbate this concern (Cranmer et al. 2013).Footnote 26
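The combining step applies Rubin's rules for multiply imputed data; a minimal sketch (the function name and sample inputs are illustrative):

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine point estimates and their variances from m imputed data sets
    using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    w_bar = variances.mean()              # within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    t = w_bar + (1 + 1 / m) * b           # total variance
    return q_bar, t

# Example with m = 5 hypothetical imputed-data estimates
est, tvar = rubin_combine([-1.6, -1.7, -1.65, -1.6, -1.7], [0.7] * 5)
```

The between-imputation term \(b\) is what distinguishes this from single-shot imputation: it carries the uncertainty stemming from the imputation itself into the total variance.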

We first run the Approximate Bayesian Bootstrap assuming ignorability (that is, with no informative prior) and setting \(k=3\). Table 11 shows that, as we reported in the manuscript, such a model estimates the average treatment effect of canvassing to be −1.65 percentage points, with a corresponding 95% confidence interval from −3.29 to −0.01. That estimate is similar to those recovered using listwise deletion. We also report additional results after adding an informative prior which reduces the share of respondents who back Obama from 57.5% in the observed group to 54.0% in the unobserved group. We chose the magnitude of the decline (3.5 percentage points) to approximate the largest decline in survey response observed across any of the turnout groups. In other words, in light of the differential attrition identified above, 3.5 percentage points is a large but still plausible difference between the observed and unobserved populations conditional on observed covariates. Here, the estimated treatment effect becomes −1.73 percentage points, with a 95% confidence interval from −3.34 to −0.05, essentially unchanged from the result with no prior. The table then presents various combinations of the prior and the \(k\) parameter, with little difference across the specifications, except that reducing \(k\) below two (i.e., reducing the penalty for matching less similar observations) appears to increase the uncertainty regarding the estimated treatment effect. We also report results using all observations with, again, similar results.

Table 11 Overview of all results

Inverse Propensity Weighting

Inverse propensity weighting (IPW) is an alternative approach to dealing with attrition that uses some of the same building blocks as multiple imputation: it leverages information in the relationships among observed covariates to reweight the observed data such that they approximate the full data set (Glynn et al. 2010).

Specifically, we first use logistic regression on the full sampleFootnote 27 to estimate a model of survey response. We employ the same model specification as above, except that we drop our measure of age because it has substantial missingness. From the model, we generate a predicted probability of survey response for each respondent; these estimates range from 0.13 to 0.35. For the 12,439 fully observed respondents, we then calculate the average treatment effect of canvassing, weighting each respondent by the inverse of his or her predicted probability of responding to the survey. The resulting estimated treatment effect of canvassing is −1.79 percentage points, with a 95% confidence interval from −3.52 to −0.05 percentage points.

Heckman Selection

Heckman selection models assume that the errors in the selection equation and outcome equation are distributed bivariate normally. With this assumption, the expected value of the error in the outcome equation conditional on selection can be represented with an inverse Mills’ ratio. There is considerable disagreement in the literature about the appropriateness of this assumption. Some find it implausible, given that the key assumption is about the joint distribution of unobserved quantities. Others find the approach more plausible than assuming away the correlation of errors across selection and outcome equations as is done in other selection models.
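Under this bivariate-normality assumption, the correction term has a standard closed form, stated here for reference:

```latex
E[\varepsilon_i \mid \text{observed}]
  = \rho \, \sigma_\varepsilon \, \lambda(Z_i \hat{\gamma}),
\qquad
\lambda(z) = \frac{\phi(z)}{\Phi(z)},
```

where \(\phi\) and \(\Phi\) are the standard normal density and distribution functions and \(\lambda\) is the inverse Mills' ratio; the second step adds \(\lambda(Z_i\hat{\gamma})\) from the selection equation as a regressor in the outcome equation.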

Table 12 shows results from several specifications of a Heckman selection model. In the first column, no additional controls are included. In the second column, the controls listed at the bottom of the table are included. In the third column, the sample is limited to those who voted in two or fewer previous elections in the data set. The results are qualitatively similar to those from the non-parametric selection model. The \(\rho\) parameter is statistically significant (or nearly so), indicating a modest correlation between the errors in the two equations, which is a necessary but not sufficient condition for selection bias. In this case, because the estimates are similar to those from methods that assume no correlation of errors, there does not appear to be meaningful selection bias.

Table 12 Heckman selection model results


Cite this article

Bailey, M.A., Hopkins, D.J. & Rogers, T. Unresponsive and Unpersuaded: The Unintended Consequences of a Voter Persuasion Effort. Polit Behav 38, 713–746 (2016). https://doi.org/10.1007/s11109-016-9338-8


Keywords

  • Field experiment
  • Political campaigns
  • Political persuasion
  • Non-random attrition
  • Survey response