Detecting Dividing Lines in Turnout: Spatial Dependence and Heterogeneity in the 2012 US Presidential Election

US voters have been moving further and further apart, most notably in terms of partisanship. This trend has led to a strong geographic concentration of voters' preferences. We look at how turnout shows a similar pattern by jointly addressing two features of the data: spatial autocorrelation and heterogeneity of the observed units. Results obtained through a spatial lag regression tree procedure for the 2012 US presidential elections allow us to identify twelve groups of counties with similar characteristics. We find that (i) close counties behave similarly in terms of turnout; (ii) across various groups of counties, some variables differ in statistical significance (or lack it altogether, such as household income and unemployment), and often in sign (such as the shares of adherents to congregations, Blacks, and Hispanics, and the urban population). These results are useful for targeting geographically based groups in get-out-the-vote operations.


Introduction
In November 2020, Joe Biden won the presidential election in Georgia, the first Democratic victory there since 1992, and in January 2021 the two Democratic candidates for the US Senate won the runoffs in the same state. Pundits credit these outcomes to two concurring forces: the shift from Republican to Democratic in the suburbs of Atlanta and the increased turnout of the Black population. 1 The former is the local incarnation of a wide liberal shift of well-educated voters. The latter has been nurtured by former state representative Stacey Abrams and her New Georgia Project over the last decade. 2 These recent examples show how turnout has become a decisive and divisive issue in US politics. Over the last two decades, the US electorate has become increasingly polarized in socio-demographic, economic, and ideological terms (among others, Boxell 2020). This ongoing polarization is reflected in the growing and deepening divisions between Republicans and Democrats on fundamental political values, such as government, abortion, race, immigration, national security, and environmental protection. According to the Pew Research Center (2017), the average partisan gap increased from 15 percentage points in 1994 to 36 in 2017. One of the most significant consequences of this tendency is the entrenchment of political partisanship in different areas, with cities becoming more liberal and rural areas more conservative (Storper 2018; Durkan 2021). Several causes have been identified for this increase, such as political activism, election policies, in-group bias, and media bubbles (see, among others, Iyengar et al. 2019). 3 This premise contributes to defining the research question this paper aims to investigate: Do socioeconomic and demographic dividing lines (i.e., heterogeneity) also emerge in the decision to vote? 4
We focus on the geographical partition of the electorate into several homogeneous subgroups scattered around the country by analyzing non-linearities in the effects of the determinants of voter turnout (i.e., the effects producing a sort of "polarization" of voters along the determinants of electoral behavior) in combination with the spatial dependence of voter turnout (i.e., the "neighborhood effect"). We investigate these two issues by relying on recent work by Wagner and Zeileis (2019), who combine the regression tree of Zeileis et al. (2008) with a spatial autoregressive parameter. Integrating these two methodologies makes it possible to deal with the non-homogeneity of regression coefficients, thereby capturing the role of some critical social junctures, and to cope with the potentially distortionary effects of spatial dependence.
Electoral turnout at the county level in the 2012 US presidential elections provides the empirical setting for our research question. The results confirm a complex multiregime picture of the determinants of electoral participation. Choosing 2012 rather than a more recent election allows us to show that this process had already been in the making for quite some time before becoming more clearly visible.
By answering our research question, we contribute to the extant literature in two ways. First, when dealing with fine geographical data, two possible concerns may plague the relationship between voter turnout and its covariates. On the one hand, the relationship between turnout and its covariates may not be the same for all the units that are observed, raising the possibility of non-linearity. On the other hand, despite the large set of covariates one may consider, there may still be omitted factors with a geographical or spatial component that influence electoral participation (Moretti 2012). Research on voter turnout in US presidential elections has increasingly underlined the role of geographical influences, hinting at voting behavior as the result of a multidimensional process that occurs in space and crucially reflects, and is mediated by, the social and geographical environment where individuals are located and interact (Agnew 1987; Pattie and Johnston 2000). However, incorporating heterogeneity has received little attention in current research on electoral participation.
Second, considering heterogeneity in voter turnout in US presidential elections may help to better understand how local disparities translate into the election outcomes and can be used by the competitors to propose targeted electoral policies to win the elections, especially in closely contested states or electoral districts.
The paper is structured as follows. The "Literature Review" section reviews the existing literature, the "Empirical Strategy, Methodology, and Data" section describes the empirical strategy and data, the "Results" section illustrates the results, and the "Conclusions" section concludes.

Literature Review
Elections are a central feature of democracy, and scholars have long tried to identify and explain the variation in voter turnout. Decades of theoretical and empirical research have shown turnout to be mainly influenced by socio-economic, cultural, and demographic characteristics and by the politico-institutional environment, at both the individual and the aggregate level (see Cancela and Geys 2016).
Complementing this approach and drawing on the tradition of political geography, a growing number of studies have been paying attention to the role of geographical influences on electoral turnout (among others, Agnew 1987; Pattie and Johnston 2000). Voters do not cast their ballots regardless of where they live. Their voting behavior is rather the outcome of a complex process that occurs in space and is influenced and mediated by their respective environments.
From this perspective, neighborhood processes and sociogeographical and household interactions (Cutts 2008, 2012; Galster 2012) produce particular political traditions, practices, and outcomes (Gimpel et al. 2003), and promote a shared outlook, which translates into widely held attitudes about the value of political and electoral participation. The decision about whether to vote and how to exercise one's franchise is influenced by local information exchanges. In sum, "people who talk together, vote together" (Pattie and Johnston 2000).
Several scholars have analyzed the determinants of voting in the US (among others Kahane 2009; Cann and Cole 2011). Within this strand of literature, an emerging line of research uses spatial econometrics techniques to reveal the influence of local contests and interactions on turnout. A number of studies have also explored the geography of US electoral behavior through a spatial econometric methodology to probe the existence of spatial dependence. Kim et al. (2003) test the hypothesis of reward-punishment and issue-priority voting behavior in the presidential elections from 1988 to 2000 using a spatial lag model in a Bayesian framework and find significant spatial dependence. Tam Cho and Rudolph (2008) investigate the spatial structure of political participation through a spatial lag model of political participation across thirty-two cities and eighteen states, covering every region of the nation. Their findings show that spatial proximity influences voter turnout. Moreover, the spatial structure of electoral participation is consistent with a contagion process that occurs irrespective of involvement in social networks. Lacombe et al. (2014) focus on the potential role that spillovers may exert in explaining voter turnout in the 2004 presidential election. Exploiting advances in Bayesian computation, they compare the normal a-spatial linear model, the spatial lag model, the spatial Durbin model, and the spatial Durbin error model, and find that the latter is the most appropriate empirical model. Their results show the existence of direct and indirect effects from the set of explanatory variables on voter turnout. Furthermore, several variables traditionally shown to affect voter turnout (i.e., per capita income and the county unemployment rate) are not associated with turnout at the county level.
3 The appropriate definition of polarization as well as the extent of its increase are highly debated in the literature, and conflicting views have emerged in the past (see Glaeser and Ward 2006; Abramowitz and Saunders 2008; Fiorina and Abrams 2008; Prior 2013). Recent data, however, paint a richer and more nuanced picture of political polarization than previous discussions have suggested (see Boxell et al. 2017).
4 To avoid confusion, we stress that we do not deal with partisan polarization.
These contributions, although they reveal important insights into voting behavior by recognizing that space and context matter, assume that the relationship between voter turnout and the explanatory variables is linear. This assumption, which is already restrictive when analyzing local voters and politics more generally (see Calvo and Escolar 2003), is even more limiting when focusing on the American electorate, which is becoming more heterogeneous and polarized in terms of demographic, socio-economic, and cultural characteristics (Glaeser and Ward 2006; Boxell et al. 2017). 5 In an increasingly polarized landscape, incorporating heterogeneity in presidential election models could capture a crucial feature for the understanding of voting behavior. 6

Empirical Strategy, Methodology, and Data
To address the research question, our empirical strategy consists of applying a spatial autoregressive regression tree methodology (Wagner and Zeileis 2019). 7 Such a methodology applies a regression tree approach after filtering out spatial autocorrelation in the dependent variable. The regression tree approach, initiated by Morgan and Sonquist (1963), is a recursive algorithm that checks for parameter instability, i.e., non-linearities, in each covariate, and splits the sample accordingly. In detail, in the first step, the algorithm splits the sample into two at the threshold on the selected covariate that minimizes the global residual sum of squares. After splitting, the algorithm proceeds recursively to further split the resulting subsamples by the same method, creating more child nodes and continuing until no further non-linearities are found or until a stopping rule becomes binding. This enables multiple regimes to be endogenously identified from a set of control variables. However, when using geo-referenced data such as ours, spatial autocorrelation in the residuals may occur, leading the regression tree, and more generally any regression analysis, to biased and/or inefficient results (Anselin 1988).
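The split search described above can be illustrated with a minimal sketch. It assumes a single regressor `x`, a single partitioning variable `z`, and a simple linear fit in each segment; the function names and the toy data are hypothetical and are not taken from the paper or from any library.

```python
def ols_rss(x, y):
    """Residual sum of squares of a simple linear fit y = a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def best_split(z, x, y, min_size=3):
    """Return (threshold, rss): the cut on z minimising the summed RSS
    of the two segments z <= threshold and z > threshold."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    best = (None, float("inf"))
    for k in range(min_size, len(z) - min_size + 1):
        left, right = order[:k], order[k:]
        if z[left[-1]] == z[right[0]]:
            continue  # no cut is possible between tied values of z
        rss = (ols_rss([x[i] for i in left], [y[i] for i in left])
               + ols_rss([x[i] for i in right], [y[i] for i in right]))
        if rss < best[1]:
            best = ((z[left[-1]] + z[right[0]]) / 2, rss)
    return best


# Toy data (made up): the slope of y on x flips sign once z exceeds 4.
z = list(range(10))
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, -1, -2, -3, -4, -5]
threshold, rss = best_split(z, x, y)
print(threshold)  # prints 4.5
```

A full regression tree would apply `best_split` recursively to each resulting subsample and would choose the partitioning variable through a parameter stability test rather than by exhaustive search alone.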
To jointly identify the determinants of voter turnout, we start from the spatial lag (SAR) model estimated for the full sample, following Anselin (1988):

$$Y = \rho W Y + X\beta + \varepsilon \qquad (1)$$

where $Y$ is the vector of the dependent variable, $W$ is an $N \times N$ spatial weight matrix, $N$ is the number of observations, $X$ is the matrix of independent variables, $\beta$ is the vector of coefficients, and $\varepsilon$ is the vector of i.i.d. residuals. Finally, the parameter $\rho$ is the spatial lag coefficient, which measures the strength of the spatial dependence and varies between the reciprocals of the minimum and maximum eigenvalues of $W$, typically around $-1$ and $1$. The initial SAR model is estimated via maximum likelihood.
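To make the structure of the spatial lag model concrete, the following sketch generates data that satisfy it by fixed-point iteration. It is an illustration only: the four-unit ring of neighbors, the values of `xb` and `rho`, and the function names are assumptions, and the iteration relies on a row-standardized weight matrix with |rho| < 1 so that it converges.

```python
def spatial_lag(W, y):
    """Return Wy: for each unit, the weighted sum of the other units' values."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]


def simulate_sar(W, xb, eps, rho, n_iter=200):
    """Solve y = rho * W y + xb + eps by fixed-point iteration.

    Assumes W is row-standardised and |rho| < 1, so each pass shrinks the
    remaining error by at least a factor |rho|.
    """
    base = [a + b for a, b in zip(xb, eps)]
    y = base[:]
    for _ in range(n_iter):
        wy = spatial_lag(W, y)
        y = [rho * lag + b for lag, b in zip(wy, base)]
    return y


# Hypothetical ring of four units: each unit has two neighbours with weight 0.5.
W = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]
xb = [1.0, 2.0, 3.0, 4.0]   # made-up values of X * beta
eps = [0.0, 0.0, 0.0, 0.0]  # no noise, to keep the check exact
y = simulate_sar(W, xb, eps, rho=0.5)

# The simulated y satisfies the model equation up to numerical error.
residual = [yi - 0.5 * lag - b for yi, lag, b in zip(y, spatial_lag(W, y), xb)]
print(max(abs(r) for r in residual))
```

Because each unit's outcome feeds into its neighbors' outcomes, a positive `rho` amplifies the covariate signal: with a constant `xb` of 1 and `rho = 0.5`, every unit ends up at 1 / (1 - 0.5) = 2.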
After Eq. (1) is estimated, the spatial lag model can be rewritten as follows:

$$Y^* = (I - \hat{\rho} W) Y = X\beta + \varepsilon \qquad (2)$$

where $I$ is an identity matrix and $Y^* = (I - \hat{\rho} W) Y$ is the spatially filtered dependent variable, i.e., with the effect of autocorrelation taken out (Anselin and Bera 1998). Subsequently, Eq. (2) is estimated through ordinary least squares (OLS), which allows us to apply a standard (aspatial) regression tree approach. In our case, the algorithm is set to stop when it finds no additional significant non-linearities at the 5% level or when the sample is smaller than 100 counties.
As is widely known in the spatial econometric literature, a benchmark generalized model is the spatial Durbin model (SDM). However, we did not apply such a model for the following reasons. First, we wanted to rely as much as possible on the framework proposed by Wagner and Zeileis (2019), which does not include the spatial lag of the independent variables. Second, the Wagner and Zeileis (2019) methodology aims at filtering spatial dependence in the dependent variable, not at preserving geographic proximity between units. As a consequence, given that even contiguous counties could belong to different nodes, we find it difficult to justify theoretically the inclusion of spatial lags based on the average values of contiguous counties. Finally, the inclusion of the spatial lags of the independent variables doubles the variables for which non-linearities can be found, complicating the interpretation of the outcomes of the model.
The algorithm aimed at identifying non-linearities in the covariates is based on Zeileis et al. (2008). According to the authors, the parametric model in Eq. (2) can be written as $\mathcal{M}(Y^*, \beta)$ with observations $Y^* \in \mathcal{Y}$ and a $k$-dimensional vector of parameters $\beta \in \Theta$. Given $n$ observations $Y^*_i$ $(i = 1, \dots, n)$, the model can be fitted by minimizing some objective function $\Psi(Y^*, \beta)$, yielding the parameter estimate

$$\hat{\beta} = \underset{\beta \in \Theta}{\arg\min} \sum_{i=1}^{n} \Psi(Y^*_i, \beta) \qquad (3)$$

In our case, as the estimates are performed via OLS, $\Psi$ is the error sum of squares, so the observations $Y^*$ are normally distributed with mean $\mu$ and covariance matrix $\Sigma$: $Y^* \sim N(\mu, \Sigma)$, with the combined parameter vector $\beta = (\mu, \Sigma)$.
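The filtering step in Eq. (2), followed by an aspatial OLS fit, can be sketched as follows. The three-unit weight matrix, the outcome values, and the value of `rho_hat` are invented for illustration; in the procedure described above `rho_hat` would come from the maximum likelihood estimate of the full-sample SAR model.

```python
def spatial_filter(W, y, rho_hat):
    """Return y* = (I - rho_hat * W) y, i.e. y minus rho_hat times its spatial lag."""
    return [y_i - rho_hat * sum(w * y_j for w, y_j in zip(row, y))
            for row, y_i in zip(W, y)]


def ols_simple(x, y):
    """Intercept and slope of a one-regressor OLS fit y = a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


# Hypothetical row-standardised weights for three units on a line.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
y = [10.0, 20.0, 30.0]  # made-up outcome values
x = [1.0, 2.0, 3.0]     # made-up covariate values
rho_hat = 0.4           # assumed to come from the first-step SAR estimate

y_star = spatial_filter(W, y, rho_hat)  # approximately [2.0, 12.0, 22.0]
intercept, slope = ols_simple(x, y_star)
print(y_star, intercept, slope)
```

Once the dependent variable has been filtered, any standard regression tree routine can be run on `y_star` without further reference to `W`.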
Given that, in our specific case, it is unreasonable to assume that a single global model $\mathcal{M}(Y^*, \beta)$ fits all $n$ observations, we partition the observations with respect to covariates such that a well-fitting model can be found locally in each cell of the partition. To achieve this aim, Zeileis et al. (2008) propose a recursive partitioning approach based on $\ell$ partitioning variables $Z_j \in \mathcal{Z}_j$ $(j = 1, \dots, \ell)$, in our case $Z_j \equiv X_j$, to adaptively find a good approximation of this partition.
More formally, Zeileis et al. (2008) assume that a partition $\{\mathcal{B}_b\}$ $(b = 1, \dots, B)$ of the space of the partitioning variables exists, with $B$ segments, such that in each segment $\mathcal{B}_b$ a model with a segment-specific parameter $\beta_b$ holds. Given the correct partition $\{\mathcal{B}_b\}$, the estimation of the parameters $\{\beta_b\}$ that minimize the global objective function can easily be achieved by computing the locally optimal parameter estimates $\hat{\beta}_b$ in each segment $\mathcal{B}_b$. However, if there are more partitioning variables ($\ell > 1$) and $\{\mathcal{B}_b\}$ is unknown, minimization of the global objective function

$$\sum_{b=1}^{B} \sum_{i \in I_b} \Psi(Y^*_i, \beta_b) \rightarrow \min \qquad (4)$$

where $I_b$ indexes the observations in segment $\mathcal{B}_b$, requires a greedy forward search where the objective function can at least be optimized locally in each step. In particular, the algorithm has the following steps:

1. Fit the model once to all observations in the current node by estimating $\hat{\beta}$ via minimization of the objective function. This means, recalling Eq. (4), that

$$\sum_{i=1}^{n} \psi(Y^*_i, \hat{\beta}) = 0 \qquad (5)$$

where $\psi(Y^*, \beta) = \partial \Psi(Y^*, \beta) / \partial \beta$ is the score function or estimating function corresponding to $\Psi(Y^*, \beta)$.
2. Assess whether the parameter estimates are stable with respect to every ordering $Z_1, \dots, Z_\ell$. If there is some overall instability, select the variable $Z_j$ associated with the highest parameter instability, otherwise stop. To achieve this aim, it is necessary to check whether the scores fluctuate randomly around their mean 0 or exhibit systematic deviations from 0 over $Z_j$. These deviations can be captured by the empirical fluctuation process:

$$W_j(t) = \hat{J}^{-1/2} n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} \hat{\psi}_{\sigma(Z_{ij})} \quad (0 \le t \le 1) \qquad (6)$$

where $\hat{\psi}_i = \psi(Y^*_i, \hat{\beta})$ and $\sigma(Z_{ij})$ is the ordering permutation that gives the antirank of the observation $Z_{ij}$ in the vector $Z_j = (Z_{1j}, \dots, Z_{nj})^T$. Thus, $W_j(t)$ is simply the partial sum process of the scores ordered by the variable $Z_j$, scaled by the number of observations $n$ and a suitable estimate $\hat{J}$ of the covariance matrix $COV(\psi(Y^*, \hat{\beta}))$. This empirical fluctuation process is governed by a functional central limit theorem (Zeileis and Hornik 2007) under the null hypothesis of parameter stability: it converges to a Brownian bridge $W^0$. A test statistic can be derived by applying a scalar functional $\lambda(\cdot)$ capturing the fluctuation in the empirical process to the fluctuation process, $\lambda(W_j(\cdot))$, and the corresponding limiting distribution is just the same functional (or its asymptotic counterpart) applied to the limiting process, $\lambda(W^0(\cdot))$.
3. Compute the split point(s) that locally optimize $\Psi$, either for a fixed or for an adaptively chosen number of splits.
4. Split the fitted model with respect to variable $Z_{j^*}$ into a segmented model with $B$ segments and repeat the procedure.
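Step 2 of the procedure can be illustrated with a deliberately simplified sketch. It assumes an intercept-only model, so that each score is a scalar residual, and it uses the maximum absolute value of the process as the functional; this is a simpler stand-in for the functionals used by Zeileis et al. (2008), not their exact test. Function names and data are hypothetical.

```python
import math


def fluctuation_process(scores, z):
    """Scaled partial sums of scalar scores, ordered by the partitioning variable z."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: z[i])
    j_hat = sum(s * s for s in scores) / n  # variance estimate of the scores
    scale = math.sqrt(j_hat * n)
    path, running = [], 0.0
    for i in order:
        running += scores[i]
        path.append(running / scale)
    return path


def max_abs_statistic(path):
    """Largest absolute excursion of the process from zero."""
    return max(abs(w) for w in path)


# Made-up outcome with a level shift halfway through the sample.
y = [0.0] * 10 + [1.0] * 10
mean_y = sum(y) / len(y)
scores = [yi - mean_y for yi in y]  # scores of an intercept-only least squares fit

z_shift = list(range(20))                                 # ordering aligned with the shift
z_mixed = [2 * i if i < 10 else 2 * (i - 10) + 1 for i in range(20)]  # ordering that interleaves regimes

print(max_abs_statistic(fluctuation_process(scores, z_shift)))
print(max_abs_statistic(fluctuation_process(scores, z_mixed)))
```

For a scalar process, the supremum of the absolute Brownian bridge follows the Kolmogorov distribution, whose 5% critical value is about 1.358. Under that assumption, the ordering aligned with the shift signals instability while the interleaved ordering does not, so the first variable would be selected for splitting.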
Drawing on the literature on the determinants of turnout and considering that local parameters are estimated for each node ℓ, we estimate the following regression:

$$\mathrm{Turnout}^*_c = \alpha_\ell + \beta_\ell\, \mathrm{EDU}_c + \gamma_\ell\, \mathrm{ECO}_c + \delta_\ell\, \mathrm{SOCIODEM}_c + \varepsilon_c \qquad (7)$$

where $c$ refers to an individual county, for a total of 3046, 8 the coefficients are node-specific, and $\mathrm{Turnout}^*_c = (I - \hat{\rho} W)\mathrm{Turnout}_c$ is the spatially filtered turnout, where $\hat{\rho}$ is obtained as described in Eq. (1). The spatial weight matrix $W$ was chosen by comparing nine different weighting schemes: k-nearest neighbors of order 5 to 10, a Queen contiguity matrix, an inverse distance matrix, and an inverse distance matrix with a cut-off at 200 km. We estimated our regression tree for each of them and selected the model with the lowest Akaike Information Criterion (AIC), which corresponds to a $W$ based on the 9 nearest neighbors. As is customary, $W$ has been row-standardized.
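A k-nearest-neighbor, row-standardized weight matrix of the kind selected above can be sketched in a few lines. The sketch assumes planar coordinates and Euclidean distance, which is a simplification for county centroids (a real application would more likely use great-circle distances or a dedicated spatial library); the coordinates and the function name are hypothetical.

```python
import math


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights from planar (x, y) coordinates.

    Each unit gives weight 1/k to its k closest other units and 0 to all others,
    so every row sums to one. Ties are broken by the index of the neighbour.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = 1.0 / k
    return W


# Made-up centroids: four units close together and one remote unit.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
W = knn_weights(coords, k=2)
for row in W:
    print(row)
```

Unlike contiguity matrices, k-nearest-neighbor weights are generally not symmetric: in this toy example the remote unit counts unit 2 among its neighbors, but unit 2 does not count the remote unit among its own.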
The variables can be described as follows 9 :
1) Turnout, the dependent variable, is measured as a share of the county voting-eligible population (VEP), i.e., the population that is registered and legally empowered to cast its vote in presidential elections. Using VEP instead of VAP (voting age population) to measure turnout corrects for the number of ineligible felons and non-citizen residents. Nevertheless, the use of these two different types of data is controversial, as both measurement methods have their biases (Holbrook and Heidbreder 2010). We therefore also estimate the models using the citizen voting age population (CVAP). The results are presented in the Appendix (Table A3 and Fig. A2).
2) The vectors EDU and ECO include variables aiming to capture the role of education (Less than graduate and University education) and the economic system (Unemployment and Household income), respectively.
3) The vector SOCIODEM contains several socio-demographic variables (Adherents, Urban population, Hispanics, Blacks, and Veterans). Blacks and Hispanics indicate the percentage of a county's population in the respective racial category.
The description, the summary statistics, and the source of all the data are set out in Tables 1 and 2. Table A1 in the Appendix presents the correlation matrix of the variables considered. Figure 1 shows that the distribution of turnout across US counties is not random in space, as indicated by the Moran's I index. 10

Results
Table 3 details the results of our model for the eleven terminal nodes identified by the regression tree. The spatial lag coefficient is positive and statistically significant (therefore the "neighborhood effect" is in play). 11 Moran's I on residuals is positive and significant in both the standard (Table A2 and Fig. A1) and the spatial lag regression tree model (Table 3). However, the very low value (0.04) of the index in the latter model indicates that results biased by spatial autocorrelation are unlikely. The spatial lag regression tree is also preferred to the standard regression tree model in terms of the AIC. 12 We find that political participation is geographically clustered. Although this is in line with Tobler's First Law of Geography (Tobler 1970), it is likely the result of several factors, such as labor and capital mobility, common culture, social involvement, and social networking (Lacombe et al. 2014).
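The Moran's I statistic used above to gauge spatial autocorrelation can be computed directly from a weight matrix and a vector of values. The sketch below uses the standard definition of the index; the four-unit line of neighbors and the data are invented, and no significance test is included.

```python
def morans_i(W, x):
    """Moran's I of the values x under the spatial weight matrix W."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical binary contiguity for four units on a line: 0-1-2-3.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, 0.0, 0.0]    # similar values sit next to each other
alternating = [1.0, 0.0, 1.0, 0.0]  # every neighbour pair is dissimilar

print(morans_i(W, clustered))    # positive: spatial clustering
print(morans_i(W, alternating))  # negative: spatial dispersion
```

Applied to regression residuals, a value close to zero, such as the 0.04 reported for the spatial lag regression tree, suggests that little spatial structure is left unexplained.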
The most significant non-linearities are found for median household income, the root or father node (the cut is at $10,557), followed by the Black population (the cut is at 2%) and subsequently by the urban population (91.1%) and the population with less than graduate education (43.7%). This suggests that income dominates the shares of the Black, less than graduate, and urban populations as the main identifier of multiple regimes. The economic conditions of voters, as well as race and ethnicity, are among the most important social and cultural divisions within the American electorate, and their impact on voter turnout is a persistent finding in the literature (Fraga 2016; McCarty et al. 2016). Nonetheless, within the eleven subsets of counties, the results underscore the different signs and significance of the determinants of turnout (Fig. 2).
Node 3 includes 371 counties with median household income lower than $10,557 and a Black population lower than 2.4%. In this node Hispanics, household income and Veterans increase voter turnout while Blacks, the urban population and the percentage of the unemployed have a negative impact on voter turnout.
Node 5 (103 counties) includes a subset of counties with the same income characteristics as before, a Black population higher than 2.4%, and a percentage of people with less than graduate education below 49.7%. In this subgroup, the higher the Black population and household income, and the fewer the graduates and the unemployed, the higher the voter turnout. In this node, a larger share of the population living in urban areas reduces voter turnout.
Fig. 1 Voter turnout in the 2012 US presidential elections across counties. Source: our elaboration based on uselectionatlas.org
10 The index measures the correlation of turnout in a county with the average of the neighboring counties. Where it shows a positive (negative) and significant sign, similar (dissimilar) counties in terms of turnout are located close to each other.
11 As in Wagner and Zeileis (2019), we do not compute direct, indirect, and total effects, as is customary in the literature. In fact, as highlighted in the previous section, the initial estimation of the spatial lag model aims at filtering spatial dependence in the dependent variable in such a way that a standard regression tree can be estimated. In this way, non-linearities in the independent variables can be accounted for while minimizing the risk of residual spatial autocorrelation that may affect the results.
12 AIC is used as a measure of goodness of fit as it is computed for the overall model, while R-squared is not computed for the overall model, but for each node.
Node 6 (346 counties) differs from the previous node in having a percentage of people with less than graduate education higher than 49.7%. In this subset, the results show that the higher the shares of people affiliated to a group, Blacks, and Veterans, the higher the voter turnout, while the shares of Hispanics, the urban population, and people with less than graduate or with university education are negatively associated with voter turnout.
Node 9 (144 counties) encompasses counties with median household income higher than $10,557, a Black population lower than 2%, and a percentage of people with less than graduate education below 43.7%. In this terminal group, voter turnout rises as the share of Hispanics, household income, and the number of Veterans increase.
Node 11 (138 counties) includes a subset of counties with the same income characteristics as before, a Black population lower than 2%, a percentage of people with less than graduate education above 43.7%, and a percentage of people affiliated to a group below 30.7%. In this node, median household income and Veterans increase turnout, while the shares of Blacks, the urban population, and people with less than graduate or university education do not.
Node 13 (255 counties) includes counties with median household income higher than $10,557, a Black population lower than 2%, a percentage of people with less than graduate education above 43.7%, a percentage of people affiliated to a group higher than 30.7%, and Veterans lower than 10.9%. In this node, the higher the share of Blacks and the median household income, the higher the voter turnout. The results also indicate that the urban population, less than graduate education, university education, and unemployment negatively affect turnout.
Node 14 (664 counties) has the same characteristics as node 13, but the percentage of Veterans is higher than 10.9%. In this subset, people affiliated to a group as well as Veterans increase voter turnout. Household income also has a positive impact, while Blacks, people living in urban areas, people with less than graduate education, and the unemployed decrease turnout.
Node 18 (394 counties) takes in counties with median household income higher than $10,557, a Black population above 2%, a percentage of urban population lower than 91.1%, people with less than graduate education lower than 3.7%, and a Black population lower than 15.3%. It is worth noting that in this (and the following) nodes the Black population enters again as a cut, showing a second non-linearity in this variable. In this terminal group, Adherents, Veterans, the urban population, and the median household income increase voter turnout. The results also show that Hispanics negatively affect voter turnout.
Node 19 (100 counties) has the same characteristics as node 18, but with a Black population higher than 15.3%. In this subset of counties, membership of a group and the median household income increase voter turnout, while Hispanics and Veterans decrease participation.
Node 20 (370 counties), in addition to the first three cuts as before, has people with less than graduate education above 3.7%. In this subset, Hispanics, the population with less than graduate education, and the population living in urban areas reduce turnout. Members of a group and Blacks are more likely to turn out to vote. Household income and voting age also have a positive impact on turnout.
Finally, node 21 (161 counties) consists of counties that, after the first two cuts seen above, have a percentage of urban population above 91.1%. In this terminal group, members of a group and the population with less than graduate education reduce voter turnout.
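The sequence of cuts described for the eleven terminal nodes can be summarized as a small decision function. This is a reading aid, not the authors' code: the thresholds are copied as printed in the text above, the dictionary keys are hypothetical names, and sending values exactly equal to a cut to the lower branch is an assumption.

```python
def assign_node(county):
    """Map a county (dict of covariates, shares in percent, income in dollars)
    to the terminal node number reported in the text."""
    if county["income"] <= 10557:
        if county["black"] <= 2.4:
            return 3
        return 5 if county["less_than_grad"] <= 49.7 else 6
    # Median household income above the root cut.
    if county["black"] <= 2.0:
        if county["less_than_grad"] <= 43.7:
            return 9
        if county["adherents"] <= 30.7:
            return 11
        return 13 if county["veterans"] <= 10.9 else 14
    # Black population above 2%.
    if county["urban"] > 91.1:
        return 21
    if county["less_than_grad"] > 3.7:
        return 20
    return 18 if county["black"] <= 15.3 else 19


# Made-up county: higher income, few Black residents, many people without a
# degree, many adherents, many Veterans.
example = {
    "income": 12000, "black": 1.0, "less_than_grad": 55.0,
    "adherents": 45.0, "veterans": 12.0, "urban": 40.0,
}
print(assign_node(example))  # prints 14
```

Written this way, the node numbers follow the depth-first numbering of the tree, with inner nodes (1, 2, 4, 7, 8, and so on) skipped because only terminal nodes carry a local regression.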
Overall, our findings show that throughout the subgroups the effects of the explanatory variables on voter turnout are not uniform and vary over contexts. Indeed, the magnitude and the sign of the coefficients of the variables included in the analysis partly differ from subgroup to subgroup.
In Fig. 3, shades of the same color represent differences within similar counties, whereas different colors denote differences between groups of counties. The first way to interpret the figure (i.e., differences within similar colors) emphasizes the variables involved in the estimations and, in particular, the most important non-linearities shared by the counties (median household income and Black population). From this perspective, the grayscale counties are those with household income below $10,557, while the green and blue counties are those with household income above $10,557. The bifurcation generating the latter two groups occurs at a Black population share below (green scale) or above (blue scale) 2%.
The second way (i.e., dissimilarities among colors) deals with the outcome of the splitting process. Specifically, it looks at the geographical patterns of terminal nodes across the US. From this point of view, the map shows a clustering of different subgroups in several areas of the country, which in turn is a sign of polarization. The group of counties with household income below $10,557 is mainly located in the south-eastern part of the country, except for the coast, where there are richer counties with a Black population higher than 2%. The area stretching from Kansas to Michigan and from Northern California to the State of Washington includes counties with household income above $10,557 and a Black population below 2%. A few interesting subsets of counties emerge. The counties located on the Southern coasts and in the South-East of the country belong to node 3, while there are large swathes of green (node 14) in most of the central states. Texas and Florida are each divided into two groups: node 3 and node 20 (very far apart in the figure) for the Lone Star State, and node 18 and node 21 for the Sunshine State. Node 3 also recurs in the Appalachians, South Texas, and the Midwest, and includes mostly rural, predominantly White counties with a small population.

Conclusions
Over the past few decades, and more dramatically in recent years, American society has become more polarized in demographic, socio-economic, cultural, and political terms. An increasingly divided landscape hints at the heterogeneity of the electorate as a crucial factor for understanding trends in voting behavior. This paper tackles this issue. We put forward a combined methodology to deal with heterogeneity in regression coefficients and spatial dependence in the analysis of turnout in the 2012 US presidential elections, in order to group the behavior of voters into several geographically located clusters. We find that turnout in a given county is associated with turnout in the surrounding counties. As pointed out by LeSage and Dominguez (2012), one of the possible sources of spillover effects might be commuting, or social involvement and social networking (Tam Cho and Rudolph 2008). Furthermore, for different groups of US counties obtained through a spatial lag regression tree procedure, some variables have different statistical significance (or lack of it), and sometimes different signs, revealing multiple regimes of voting behavior mainly driven by household income, i.e., the root node. This non-homogeneity in regression coefficients, which in the spatial lag regression tree are unbiased by spatial dependence, is obscured by traditional methods that extrapolate a single average relationship between the variables.
Analyzing voting mobilization and identifying spatially clustered, homogeneous socioeconomic and demographic subgroups of counties has some implications for political campaigns. It helps campaigns target specific groups of voters, defined by their socio-demographic characteristics, in order to mobilize them, especially in swing states or districts, which are crucial in winning elections.
The methodology we use allows only cross-section data to be analyzed. Further research may compare different election years to trace turnout behavior over time and improve campaign goals as well as policy design and implementation proposals accordingly. In addition, party behavior and different types of elections may be studied, enriching the dataset with political characteristics (such as incumbency) that have not been considered here.