OLS and 2SLS estimates
In this section, we estimate the effect of COVID-19 incidence on voting behavior using OLS and 2SLS. We focus on COVID-19 cases in the main analysis. We note again that our analysis and choice of control variables were fully detailed in our pre-analysis plan. Table 2 contains OLS estimates of Eq. 1 (columns 1–3). The sample size is 2689 observations (i.e., counties).Footnote 26 The dependent variable is the difference between Donald Trump’s vote share in 2020 and in 2016. A positive value indicates that Trump received a larger share of votes in 2020 than in 2016. We report standard errors clustered at the state level. The variable of interest is the cumulative number of COVID-19 cases per 10,000 inhabitants.
What clearly emerges is that COVID-19 cases are negatively related to votes for Trump during the 2020 presidential election in comparison to the 2016 election. In column 1, we include state fixed effects and our set of demographic and socioeconomic controls. We find that a county with 100 more COVID cases per 10,000 people (as compared to others in the same state) reduced its Trump vote share from 2016 to 2020 by an additional 0.12 percentage points on average. The point estimate is statistically significant at about the 10% level.
In column 2, we add to the model our indicator of social distancing, i.e., time spent at workplaces in April 2020. Column 3 is our most extensive model specification. We saturate our model with all the previous controls and state fixed effects. In addition, we add to the model the unemployment change from before to during COVID-19. The magnitude of the estimates and statistical significance remain the same.
In column 4, we add to the model a quadratic term of COVID-19 cases. The quadratic term of COVID-19 cases is positive and statistically significant at the 5% level, but very small in magnitude (6.10e-08). In contrast, the magnitude and sign of the coefficient of our variable of interest, cumulative COVID-19 cases per 10,000, remain the same. This result suggests that the negative effect of additional COVID-19 cases on Trump’s 2020 vote share becomes slightly smaller as the number of cases increases.
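The implication of the quadratic term can be made explicit with a short worked calculation. The symbols β1 and β2 are our shorthand for the linear and quadratic coefficients in column 4, and we assume that the linear coefficient is approximately −0.0012, as in column 1, since the text reports that it remains the same. This is an illustrative sketch rather than a result reported in Table 2.

```latex
% Marginal effect of one more case per 10,000 in the quadratic specification
\frac{\partial\, \Delta \text{Vote}_c}{\partial\, \text{COVID}_c}
  = \beta_1 + 2\beta_2\, \text{COVID}_c
  \approx -0.0012 + 2 \times (6.10 \times 10^{-8}) \times \text{COVID}_c

% Example: at 500 cases per 10,000
-0.0012 + 2 \times (6.10 \times 10^{-8}) \times 500 \approx -0.00114

% The marginal effect would reach zero only at
\text{COVID}_c^{*} = -\frac{\beta_1}{2\beta_2}
  \approx \frac{0.0012}{1.22 \times 10^{-7}} \approx 9{,}800 \text{ cases per 10,000}
```

Under these assumed values, the turning point corresponds to nearly the entire population being infected, so the marginal effect stays negative over any plausible range of cases.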
Of note, the coefficient of unemployment change (August 2019 to August 2020) is small and statistically insignificant.Footnote 27 Our results thus suggest that job losses during the pandemic are not significantly related to voting behavior and that increases in the unemployment rate do not seem to be a major factor behind the negative effect of COVID-19 on the share of votes for Trump. A possible explanation is that job losses were triggered by policies, e.g., lockdowns, implemented by the states, which Trump opposed or at the very least for which Trump cannot be held directly responsible.
The coefficients for some of the other control variables are worth discussing (not shown for space considerations). We find that the share of women is strongly negatively correlated with the change in vote share for Trump. Similarly, Trump seems to have lost vote share in counties with a high share of adults aged 25–54.
Our OLS results provide suggestive evidence that the pandemic affected the 2020 presidential election. The main concern with our OLS estimates is that omitted variables could be related to both COVID-19 incidence and differential voting behavior in the 2016 and 2020 presidential elections. We now turn to our instrumental variable strategy.
In Table 2 (columns 5–7), we present the first stage (panel A) and the two-stage estimates (panel B) of specification (2) in which we instrument COVID-19 incidence in the first stage by the share of employment in meat-processing factories. We control for our usual set of fixed effects and control variables. As shown in Fig. 3, we find that the share of employment in meat-processing factories is strongly positively correlated with COVID-19 incidence. The coefficient is always significant and the F-statistics indicate no concern of a weak instrument.
Our second-stage estimates are presented in the bottom panel (columns 5–7). We find that counties with more COVID-19 cases substantially decreased their vote share for Trump in 2020. The 2SLS estimates are larger than the OLS estimates and suggest that a county with 100 more COVID cases per 10,000 people (as compared to others in the same state) reduced its Trump vote share from 2016 to 2020 by an additional 1.2 percentage points on average.Footnote 28 The point estimates are statistically significant at the 1% level and robust to the inclusion of our large set of controls and the share of manufacturing employment as well as the share of employment in food manufacturing.
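To illustrate the logic of the instrumental variable approach, the following minimal sketch compares OLS and a just-identified IV estimator on simulated data. It is not our estimation code and does not use our data: the data-generating process, the variable names, and all parameter values are hypothetical, and controls, state fixed effects, and clustered standard errors are omitted. With one instrument and one endogenous regressor, the 2SLS slope equals the ratio cov(z, y) / cov(z, x), which is what the sketch computes.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope; equals 2SLS with one instrument z for x."""
    return cov(z, y) / cov(z, x)


def simulate(n=20000, true_effect=-0.012, seed=0):
    """Simulate hypothetical counties with an unobserved confounder u.

    z: instrument (stands in for the meat-processing employment share)
    x: endogenous regressor (stands in for COVID-19 cases)
    y: outcome (stands in for the change in vote share)
    All coefficients below are arbitrary illustrative assumptions.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)  # confounder: affects both x and y
        xi = 2.0 * zi + 1.5 * ui + rng.gauss(0.0, 1.0)
        yi = true_effect * xi + 0.05 * ui + rng.gauss(0.0, 0.1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print("first-stage slope (x on z):", ols_slope(z, x))
print("OLS slope (y on x):", ols_slope(x, y))
print("IV slope:", iv_slope(z, x, y))
```

In this simulated setting, the confounder pushes the OLS slope toward zero, while the IV slope stays close to the assumed true effect because the instrument is unrelated to the confounder by construction. Whether the same mechanism explains the gap between our OLS and 2SLS estimates cannot be inferred from this sketch.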
So far, our analysis has underscored an important finding: the COVID-19 pandemic cost Trump votes. But is this effect large enough to have changed the outcome of the 2020 presidential election? To answer this question, we conduct a simple counterfactual exercise to determine the magnitude of the effect by exploring how the composition of votes in a number of closely contested states would have differed if there had been fewer COVID-19 cases. The computation of the counterfactual is based on the coefficient estimate in column 1 of Table 2. For each county, we compute the fraction of total votes that Trump would have received if the number of COVID-19 cases had been X% smaller as −0.0012 × COVID_c × X%, i.e., the product of the point estimate of the effect of COVID_c on Trump’s vote share from the OLS estimates, each county’s measured COVID-19 cases, and the scaling factor X%. We next multiply this product by the number of total votes in a county to calculate the number of additional votes that Trump would have received in the counterfactual scenario. We then aggregate these county-level votes into state totals. To allow for the margin of error in our counterfactual calculations, we also use the lower and upper bounds of the 90% confidence interval around our point estimate (i.e., 0.0012). We report these bounds in parentheses.
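The steps of this counterfactual computation can be sketched as follows. This is a minimal illustration, not the code used to produce Table 3. The function name, the field names, and the three example counties are hypothetical. We also make two assumptions explicit: the coefficient is entered in absolute value, so that fewer cases translate into additional Trump votes, and it is treated as percentage points per case per 10,000 (consistent with the interpretation given for column 1), so the product is divided by 100 before being multiplied by total votes.

```python
from collections import defaultdict


def counterfactual_state_gains(counties, reduction, coef_pp=0.0012):
    """Aggregate counterfactual additional Trump votes by state.

    counties:  iterable of dicts with keys 'state', 'cases_per_10k',
               and 'total_votes' (hypothetical field names).
    reduction: share of cases removed in the scenario, e.g. 0.10 for 10%.
    coef_pp:   absolute value of the OLS coefficient, assumed to be in
               percentage points of vote share per case per 10,000.
    """
    gains = defaultdict(float)
    for county in counties:
        # Vote share gained, in percentage points, if cases were lower.
        share_gain_pp = coef_pp * county["cases_per_10k"] * reduction
        # Convert percentage points to a fraction, then to votes.
        gains[county["state"]] += share_gain_pp / 100.0 * county["total_votes"]
    return dict(gains)


# Made-up counties for illustration only; not taken from our data.
example_counties = [
    {"state": "X", "cases_per_10k": 300, "total_votes": 100_000},
    {"state": "X", "cases_per_10k": 500, "total_votes": 50_000},
    {"state": "Y", "cases_per_10k": 200, "total_votes": 200_000},
]

for scenario in (0.05, 0.10, 0.20):
    print(scenario, counterfactual_state_gains(example_counties, scenario))
```

Replacing `coef_pp` with the lower or upper bound of the confidence interval gives the corresponding bounds on the state totals.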
Table 3 presents the results of this counterfactual analysis. Column 1 shows the actual vote margin in favor of Biden in the 2020 election for a set of closely contested states. The three subsequent columns show counterfactual outcomes had COVID-19 cases been 5%, 10%, or 20% fewer. Since we find that COVID-19 incidence decreased Trump’s vote share, the counterfactual analyses for fewer COVID-19 cases correspondingly increase Trump’s counterfactual vote totals. The results in Table 3 show that, ceteris paribus, Trump would have won Michigan in a counterfactual scenario with 20% fewer cases. Trump would have won Pennsylvania with 10% fewer COVID-19 cases. He would have won Arizona, Georgia, and Wisconsin with 5% fewer COVID-19 cases. Under this last counterfactual, Trump would have been reelected. Even if we consider the lower bound calculations, which are very conservative, Trump would have kept the presidency with 20% fewer cases.
We investigate heterogeneous effects of COVID-19 on voting in Table 4. In columns 1 and 2, we first investigate whether the effect of COVID-19 incidence differs between states that implemented a stay-at-home order and states that did not implement such a policy during the pandemic. Data on stay-at-home orders come from Raifman et al. (2020) and only include directives/orders for the entire state, i.e., they do not include guidance or recommendations. Of note, all states without a stay-at-home order are states that Trump won in 2016. Our 2SLS estimates suggest that COVID-19 had a larger effect in states that did not implement a stay-at-home order during the pandemic. This result seems to suggest that if Trump had taken the pandemic more seriously and had placed more emphasis on health and safety issues, he would have lost less electorally and would have had a higher chance of being re-elected.
In columns 3 and 4, we document the relationship between COVID-19 incidence and the differential in vote for Trump in 2020 and 2016, for Trump’s and his opponent Hillary Clinton’s states separately. We define states as Trump’s or Clinton’s using the electoral votes for the 2016 US presidential election.Footnote 29 We find that the negative effect of COVID-19 cases on Trump’s vote is driven by those states that he won in the 2016 presidential election. The magnitude of the coefficient is about 50% larger than the magnitude of the coefficient in the entire sample.Footnote 30 In contrast, the coefficient of COVID-19 cases is small, positive, and not significant in those states that Clinton won in the 2016 presidential election (column 4).Footnote 31
Columns 5 and 6 restrict the sample to swing and non-swing states.Footnote 32 Our results indicate that the negative effect of COVID-19 cases on Trump’s vote is almost twice as large in swing states as it is in non-swing states.
Columns 7 and 8 restrict the sample to urban and rural counties, respectively. We define a county as “urban” (“rural”) if over (below) 50% of its population was living in an urban area in 2010 (US Census). Our results show that urban counties drive the negative effect of COVID-19 cases on Trump’s vote. Indeed, the effect is negative and significant in the urban sample, whereas it is smaller and statistically insignificant in rural counties.
In Table 5, we investigate heterogeneity by county demographic characteristics. We find that the negative effect of COVID-19 cases is stronger for counties below the median percentage of residents aged 65 than for counties above the median percentage of residents aged 65. Our estimates also indicate that the negative effect of COVID-19 cases is stronger in more racially diverse counties (i.e., those with white population shares below the median). Furthermore, our findings show that the negative effect of COVID-19 cases on Trump’s vote is driven by less educated counties (i.e., those with a below-median share of residents with college degrees), which may help explain Biden’s victory in the Rust Belt.
We now check whether our results are robust to the use of COVID-19 deaths instead of cases. Table 6 shows our estimates. We do not find any evidence that COVID-19 deaths are related to changes in voting behavior from the 2016 to the 2020 presidential election with our OLS model. The estimates are all statistically insignificant. For our 2SLS estimates, our first stage is weaker than for cases, with F-statistics ranging from 2 in the most parsimonious model to 6 in our model with the full set of controls. The 2SLS estimates are all negative and of similar magnitude to our 2SLS estimates for cases, but less precise, with only the estimate in column 6 being statistically significant at conventional levels.
The fact that our 2SLS estimates are of about the same magnitude for cases and deaths suggests that our conclusions are similar when using deaths instead of cases. Nonetheless, two differences are worth mentioning. First, our instrumental variable is only weakly related to COVID-19 deaths.Footnote 33 The probability that a COVID-19 infection results in death rises dramatically with age, and we expect that this and other factors such as healthcare coverage may contribute to the divergence in estimated effects. Second, it is plausible that voters are less likely to know, or to be aware of, someone who has died of COVID-19 than someone who has tested positive for COVID-19.
One of the defining outcomes of the 2020 presidential election was the record-high turnout. Both presidential candidates would have won any previous election, given their number of votes at the national level. We use differences in total votes between the 2016 and 2020 presidential elections as a rough proxy of turnout. We run the same model specification as in Eqs. 1 and 2. We show the results in Table 7.
We find no evidence that COVID-19 cases affected voter mobilization in the OLS estimates.Footnote 34 In contrast, there is some evidence that the incidence of COVID-19 boosted turnout in the 2SLS estimates. These conflicting results on the effect of the pandemic on mobilization are also found in previous studies that explore this topic in different electoral contexts (Giommoni and Loumeau 2020; Fernández-Navia et al. 2020).Footnote 35
We provide many robustness checks for our 2SLS results in Baccini et al. (2020). For instance, we add to the models well-known predictors of voting behavior or COVID-19 incidence. We show that our results are robust to controlling for the China shock variable (Autor et al. 2020) and two variables capturing Chinese tariffs and protection by US tariffs at the county level from Kim and Margalit (2021). The rationale for including these variables has been explained in the previous section.
We also show that our estimates are robust to the inclusion of weather and air pollution controls (i.e., precipitation and PM2.5 for the first months of the pandemic),Footnote 36 the share of employment in nursing care facilities,Footnote 37 county-to-county (in and out) migration, and the duration (in days) of the following statewide non-pharmaceutical interventions: stay-at-home orders, mandatory face mask policies, day care closures, freezes on evictions, and mandated quarantine for out-of-state individuals.Footnote 38 Overall, the inclusion of one or all of these control variables has no effect on the magnitude and significance of our 2SLS estimates.
We also check whether our OLS and 2SLS point estimates vary if we change the date at which we calculate the cumulative number of COVID-19 cases. As stated in our pre-analysis plan, we rely on October 22nd for our main analysis. In a set of robustness checks, we instead rely on July 1st, August 1st, September 1st, and October 1st. The estimates for the OLS are all larger and more significant than for our baseline, i.e., cases as of October 22nd, suggesting that we are very conservative in estimating the relationship between COVID-19 cases and the differential in votes for Trump. For the 2SLS, the point estimates all range from −0.011 to −0.013 and are statistically significant at the 1% level. Similarly, changing the start period to April or May, i.e., excluding cases that occurred early on in the pandemic, has no impact on our conclusions.
Last, we show that our conclusions are robust to using a different geographical level for the analysis. In Baccini et al. (2020), we replicate our main analysis using commuting zones as our unit of analysis. Commuting zones are significantly larger than counties, which provides the advantage that the distribution of employment in the meat-processing industry is less limited geographically, since many commuting zones have at least one meat-processing factory.Footnote 39