Spatial links in the analysis of voter turnout in European Parliamentary elections

This paper investigates turnout in European Parliamentary elections by analyzing the four EP elections from 1999 to 2014 in 155 regions of the EU-12. We use a number of econometric techniques that account for spatial dependence while also dealing with heteroskedasticity and endogeneity. The results confirm the role of spatial spillovers and indicate a significant role for GDP per capita, unemployment, age, institutional and electoral variables in influencing turnout. Finally, we disentangle the direct and indirect effects of the regional variables on turnout.


Introduction
Political participation is the lifeforce of democracy. The existing research on this topic has established some robust patterns. A large body of theoretical and empirical literature has identified a set of individual- as well as aggregate-level variables explaining variations in electoral turnout within a single country and between countries (see Smets and van Ham 2013; Cancela and Geys 2016).
In recent years, scholars have increasingly emphasized the role of geographical influences on voting. They highlight that voting behavior is not fully explained by individual or aggregate characteristics. Rather, it is the result of a complex and multidimensional process that occurs in space and crucially reflects, and is mediated by, the social and geographical environment where individuals are located and interact (Agnew 1987; Pattie and Johnston 2000). These location-specific factors are the outcome of a common background and shared values and may influence voting behavior through a variety of possible mechanisms, such as social networks and political discussion (Books and Prysby 1991; Huckfeldt and Sprague 1995) or shared economic experiences (Pattie and Johnston 1995). The general contribution of this literature is that localized, rather than general, analyses of what affects levels of electoral participation may be more successful in accounting for variations in turnout. Within this field, an emerging stream of research incorporates spatial econometric techniques to uncover the influence of neighbors on voter turnout. Most of this research focuses on the US (Tam Cho and Rudolph 2008; Wing and Walker 2010; Foley and Demšar 2013; Lacombe et al. 2014). Other studies investigate the Italian context (Shin and Agnew 2011), while Mansley and Demšar (2015) focus on London mayoral elections.
This paper contributes to this literature by exploring, through spatial modeling, the role of spillover effects from one region to another in explaining variations in turnout in European Parliamentary (henceforth EP) elections. We focus on the four EP elections between 1999 and 2014 in the EU-12, for 155 regions, by integrating traditional predictors with regional spatial and contextual conditions that enable and constrain voter decisions. To the best of our knowledge, this issue has not yet been considered in relation to the EP elections. Empirical studies of the determinants of voter turnout in EP elections are mainly cross-national analyses based on aggregate data (Mattila 2003) or a mix of individual and aggregate data (Holbot 2012). Recently, Fiorino et al. (2019) have explored sub-national variations in participation in EP elections using a multilevel modeling approach that allows them to analyze both regional and national data. Nevertheless, despite the large set of covariates these previous studies consider, their methodological approach does not account for potential unobservable geographical or spatial factors that may affect electoral participation in EP elections. Figure 1 shows the quartiles of turnout in the 1999, 2004, 2009, and 2014 elections, respectively, across 155 European regions. The figure shows marked changes in the countries' and regions' levels of turnout over the four EP elections considered. It also shows that there are well-defined clusters of regions characterized by high and low turnout, respectively, indicating that regions that behave similarly are usually close to each other. Furthermore, ample evidence supports the existence of economic interdependence among (cross-border) neighboring European regions, as well as the presence of spillover effects across national borders (see Ertur et al. 2006). Recall that socio-economic factors (defined at the regional level) are important contextual drivers of voter turnout.
All these arguments suggest that a more complete discussion of the determinants of voter turnout in the EP elections requires exploring spillover effects on turnout in bordering regions.
The paper is organized as follows. Section 2 describes the spatial features of turnout; Sect. 3 introduces the model specification and the data; Sect. 4 presents the results and provides a discussion. Conclusions are drawn in Sect. 5.


Spatial properties of turnout
To analyze spatial dependence, the best-known indicator is Moran's I (MI) (Moran 1950). MI relates the value of a given variable to the values of the same variable in neighboring areas, namely its spatial lag. The intuition is that socio-economic phenomena might not be isolated in space and what is happening in a certain location may be correlated with what is happening in neighboring locations. We base the calculation of this measure of spatial autocorrelation on a row-standardized queen spatial weight matrix W, where islands are connected to the closest region. Under the queen contiguity scheme, a region is connected to every region with which it shares at least one border point, so that neighboring relationships are retained even between regions of two different countries. This allows for spatial spillovers across regions belonging to the same and/or different countries. When the spatial matrix is standardized by row, MI varies between −1 and 1. A positive coefficient points to positive spatial autocorrelation, i.e. clusters of similar values can be identified. A negative coefficient points to negative spatial autocorrelation, i.e. dissimilar values located next to each other on the map. A value close to zero indicates a random spatial pattern.
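The statistic described above can be illustrated with a minimal sketch. The four-region adjacency matrix, the turnout values and the function names `row_standardize` and `morans_i` are hypothetical and serve only to show the mechanics; they are not the data or code used in the paper.

```python
import numpy as np

def row_standardize(adjacency):
    """Divide each row of a binary contiguity matrix by its row sum."""
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every region needs at least one neighbor")
    return adjacency / row_sums

def morans_i(values, weights):
    """Moran's I = (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    s0 = weights.sum()  # equals n when W is row-standardized
    return (n / s0) * (z @ weights @ z) / (z @ z)

# Hypothetical toy map: four regions on a line (1-2-3-4).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
W = row_standardize(adjacency)

# Similar values next to each other give a positive MI.
clustered = morans_i([0.30, 0.35, 0.60, 0.65], W)
# Alternating high/low values give a negative MI.
alternating = morans_i([0.30, 0.65, 0.30, 0.65], W)
print(clustered, alternating)
```

In this toy case the clustered pattern yields a positive value and the alternating pattern a negative one, matching the interpretation of the sign given above.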
One advantage of this statistic is that it can be visualized in a so-called Moran scatterplot, with the spatial lag of the (standardized) variable on the vertical axis and the original (standardized) variable on the horizontal axis. Thus, each point in the scatterplot represents a combination of a location's value and its corresponding values in the surrounding regions, i.e. the spatial lag. The x- and y-axes divide the scatterplot into 4 quadrants (anticlockwise from top right): in the first and third (high-high, HH, and low-low, LL, respectively) a location that exhibits a high (low) value of the variable is surrounded by locations also with a high (low) value for the variable. In the second and fourth (low-high, LH, and high-low, HL, respectively) a location with a low (high) value of the variable is surrounded by locations with a high (low) value for the variable. A concentration of points in the first and third quadrants means positive spatial dependence (nearby locations have similar values), while the concentration of points in the second and fourth quadrants reveals the presence of negative spatial dependence (i.e. nearby locations have dissimilar values).
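The quadrant logic of the Moran scatterplot can be sketched as follows. The weight matrix and turnout values are hypothetical, and `moran_quadrants` is an illustrative helper rather than a routine from the paper. With a row-standardized W, the slope of the spatial lag on the standardized variable coincides with MI.

```python
import numpy as np

def moran_quadrants(values, weights):
    """Return standardized values, their spatial lag and quadrant labels."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    lag = weights @ z  # average standardized value among neighbors
    labels = []
    for own, neigh in zip(z, lag):
        if own >= 0 and neigh >= 0:
            labels.append("HH")  # first quadrant
        elif own < 0 and neigh >= 0:
            labels.append("LH")  # second quadrant
        elif own < 0 and neigh < 0:
            labels.append("LL")  # third quadrant
        else:
            labels.append("HL")  # fourth quadrant
    return z, lag, labels

# Hypothetical row-standardized weights for four regions on a line.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
turnout = [0.30, 0.35, 0.60, 0.65]  # made-up turnout rates

z, lag, labels = moran_quadrants(turnout, W)
slope = (z @ lag) / (z @ z)  # slope of the scatterplot's fitted line
print(labels, slope)
```

Points labeled HH or LL contribute to a positive slope, while LH and HL points pull it down.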
The Moran scatterplots (Fig. 2), based on a row-standardized queen spatial weight matrix, show positive slopes, and the concentration of points representing the regions in the first and third quadrants suggests that areas with high (low) turnout are clustered in space. MI, reported at the top of each scatterplot, is quite stable above 0.60. This means that there is spatial persistence over time, with well-defined clusters of regions characterized by high and low turnout, respectively.
Nevertheless, a closer look at the points in the Moran scatterplots reveals that their distribution is much more widespread in 1999 than in 2014, with a group of Belgian regions steadily located in the upper part of the first quadrant. Despite the fairly stable value of MI, this indicates a changing pattern in EU turnout. Such a pattern, without altering the ranking of EU regions in terms of turnout, signals greater similarity over time.

Model specification, variables and methodology
The analysis includes 12 EU countries (Portugal, Spain, France, Belgium, Luxembourg, the Netherlands, UK, Germany, Italy, Greece, Austria, and Sweden) for 155 regions. The data for turnout are from the European Election Database and national sources and cover four elections: 1999, 2004, 2009, and 2014.

Fig. 2 Moran scatterplots

The time dimension of the data is used via a panel structure. In addition, we assume that the turnout in EP elections is affected by several regional and national factors, including the average level of turnout in the neighboring regions, and we estimate the spatial lag regression of Eq. (1), where Turnout is an n × 1 vector of the dependent variable, that is, the ratio of votes to registered citizens in each region at election year t; t is the election year (1999, 2004, 2009, 2014) and $u_t$ denotes the i.i.d. residuals. W is the n × n queen spatial weight matrix, and $W_t$ is a space-time spatial weight matrix defined as $I_t \otimes W$, where $\otimes$ represents the Kronecker product and $I_t$ the identity matrix of size t. The n × 1 vector $W_t\,Turnout$ defines the spatially weighted linear combination of the turnout in neighboring regions, and the coefficient ρ measures the strength of the relation between turnout in a region (the dependent variable) and turnout in its neighbors.

Following Gimpel et al. (2004), we estimate pooled data from the four election years, including dummy variables $A_t$ for each year (using 1999 as the baseline). Equation (1) is estimated with a standard maximum likelihood (ML) approach (Anselin 1988) and with a heteroskedasticity-consistent ML (Cribari-Neto 2004). Furthermore, we perform robustness checks using alternative methods that account for the endogeneity of the spatially lagged dependent variable, namely spatial two-stage least squares (S2SLS) and spatial two-stage least squares with a heteroskedasticity and autocorrelation-consistent (HAC) estimator of the variance-covariance matrix (S2SLSHAC), as well as generalized spatial two-stage least squares (GS2SLS) (Kelejian and Prucha 1999). Finally, we estimate the model with generalized spatial two-stage least squares accounting for heteroskedasticity in the error term (GS2SLSHET), using spatial lags of exogenous variables as instruments (Kelejian and Prucha 2010). Following Anselin and Rey (1991), the spatial lag model is selected through Lagrange Multiplier (LM) tests on the OLS estimates.

The vector SOCEC includes several regional economic as well as socio-economic variables at the aggregate level. GDP per capita (the logarithm of GDP per capita in PPS; source: Eurostat Regional Database) is the usual indicator of economic development (Shachar and Nalebuff 1999). Unemployment measures the percentage of long-term unemployed in total unemployment (source: Eurostat Regional Database) to control for labor market conditions (Verba et al. 1995; Rosenstone 1982).
The vector also includes Density (the log of the number of inhabitants per km²; source: Eurostat Regional Database for population and Cambridge Econometrics for regional areas) (Oliver 2000) and the Dependency ratio (the percentage of people aged over 65 years and youngsters between 20 and 24; source: Eurostat Regio Database) (Franklin and Holbot 2011). Participation in EP elections, and more generally attitudes towards the EU, may be based on the experiences of voters with the EU, for example, through funds that they receive from Brussels. Objective 1 regions is a dummy variable equal to 1 if regions are below 75% of EU GDP per capita and thus receive the majority of EU Structural Funds, and 0 otherwise (Mattila 2003; Flickinger and Studlar 2007; Fiorino et al. 2019). Finally, the variable Education measures the share of people aged 15-64 with upper secondary and post-secondary non-tertiary education (source: Eurostat Regio Database, levels 3 and 4). Multiple explanations have been proposed for the link between education and turnout. Among other effects, education affects voting because it enhances political interest and political knowledge, encourages a sense of civic responsibility and increases political efficacy (among others, Nie et al. 1996; Gallego 2010).
The vector POLINST includes a set of politico-institutional variables. Institutional quality captures good governance at the regional level, that is, high impartiality and quality of public service delivery, along with low corruption (the source of data for 2014 is Teorell et al. 2020; for 2004 it is Crescenzi et al. 2016). A perception of bad quality of political and institutional systems may compromise citizens' satisfaction, lower generalized trust and lead many individuals to abstain from elections (Rothstein and Teorell 2008; Sundstrom and Stockemer 2015). Herfindahl gov and Herfindahl opp measure, respectively, the sum of the squared seat shares of all government or opposition parties. These variables are meant to capture the fractionalization of the government and the opposition, respectively. They are defined at the national level (source: Beck et al. 2001). Compulsory voting and Week vote are linked to the national electoral systems and have been widely explored in the literature (among others, Franklin 2002). While compulsory voting leads to higher turnout rates, elections on weekdays decrease electoral participation since people follow their daily routines. The sources of both these variables are the national electoral agencies. Table 1 reports the descriptive statistics.

Week vote 0.1548 0.3620 0.0000 1.0000

Table 2 Linear spatial lag model results
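As a reading aid, the spatial lag specification described in this section can be written compactly. The following is only a sketch consistent with the definitions given above; the coefficient symbols $\beta$, $\gamma$ and $\delta$ are assumed labels, not notation taken from the paper.

```latex
\[
\mathit{Turnout} = \rho\, W_t\, \mathit{Turnout}
  + \mathit{SOCEC}\,\beta
  + \mathit{POLINST}\,\gamma
  + A_t\,\delta
  + u_t ,
\qquad W_t = I_t \otimes W
\]
```

Here $\rho$ is the spatial autoregressive parameter, SOCEC and POLINST collect the socio-economic and politico-institutional variables, and $A_t$ denotes the election-year dummies with 1999 as the baseline.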

Results and discussion
Our results are displayed in Table 2, together with the OLS regression. The LM tests for autocorrelation applied to the OLS residuals clearly show that spatial dependence in the form of a spatial lag is present. The Akaike Information Criterion (AIC) computed for the spatial lag ML model and for OLS confirms the choice of the former, whose AIC is equal to −643.59 against −582.24 for OLS. When the spatial structure is included in the model, the value of the dependent variable in one spatial unit is affected by the independent variables in nearby units. In this case, the assumptions of uncorrelated error terms and independent observations are violated. As a result, parameter estimates that ignore this dependence are biased and inefficient. Furthermore, as shown by the Breusch-Pagan test, the presence of heteroskedasticity in the OLS and spatial lag models is a source of additional inefficiency in the standard error estimation. This calls for the estimation of heteroskedasticity-robust models, reported in columns 3, 5, and 7.
However, the spatial lag models estimated using different estimators show consistent results, highlighting the robustness of our specification even when heteroskedasticity is accounted for. Although the standard errors increase, overall, the statistical significance of the parameters does not change.
The presence of a positive and significant spatial autoregressive parameter ρ signals the existence of global externalities due to how the spatial multiplier is defined, i.e. as $(I - \rho W)^{-1} = I + \rho W + \rho^{2} W^{2} + \cdots + \rho^{N} W^{N}$, where I is the N × N identity matrix. Thus, turnout is determined by each region's own factors as well as those of immediate neighbors ($\rho W$), second-order neighbors ($\rho^{2} W^{2}$) and so forth.
Indeed, a shock in region i is transmitted to its neighbors through the parameter ρ attached to turnout in neighboring regions and, in turn, is transmitted back to region i, reinitiating the process until the effect becomes negligible for N tending towards infinity (LeSage and Fischer 2008).
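The series expansion of the spatial multiplier and the feedback loop just described can be checked numerically. The sketch below assumes a hypothetical ring of five regions, each with two neighbors, and an illustrative value of ρ; neither is taken from the paper.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights for n regions arranged in a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def spatial_multiplier(rho, W):
    """Exact multiplier (I - rho W)^{-1}."""
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) - rho * W)

def truncated_series(rho, W, order):
    """I + rho W + rho^2 W^2 + ... + rho^order W^order."""
    n = W.shape[0]
    total = np.eye(n)
    term = np.eye(n)
    for _ in range(order):
        term = rho * (term @ W)  # next power in the expansion
        total += term
    return total

W = ring_weights(5)
rho = 0.09  # illustrative value, close to the magnitude discussed in the text

exact = spatial_multiplier(rho, W)
approx = truncated_series(rho, W, order=8)
max_gap = np.abs(exact - approx).max()

# Diagonal entries above 1 reflect feedback: a shock in region i
# travels to its neighbors and partly returns to region i.
feedback = exact[0, 0] - 1.0
print(max_gap, feedback)
```

With a small ρ the higher-order terms die out quickly, which is why a handful of terms already reproduces the exact inverse closely.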
The spatial autoregressive parameter lies between 0.0851 and 0.0888, corresponding, in scalar terms, to a spatial multiplier of around 1.09. This implies that around 90% of the total effect on turnout is explained by direct effects, i.e. the impact of a change in a variable in the same region i, while the remainder, the so-called indirect or spillover effect, is given by the effect arising from changes in a variable in the neighbors. Note that OLS imposes ρ = 0; thus, if there is spatial dependence in the dependent variable, the regression coefficients are upward biased. We argue that one of the possible sources of spillover effects might be commuting (LeSage and Dominguez 2012).
We discuss the results for regional variables when we comment on direct, indirect and total effects. Here we comment on the national variables. Compulsory voting has the expected positive effect on turnout, whereas Week vote has the expected negative effect since voting becomes more costly. In both cases, the coefficients are very consistent and significant at the highest level. The fractionalization of the government and that of the opposition have opposite effects on turnout. The significance of the former is lower than that of the latter.
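The scalar arithmetic behind these figures can be reproduced in a few lines. The sketch assumes a row-standardized W, for which the scalar multiplier is 1 / (1 − ρ), and approximates the direct share by the reciprocal of the multiplier; this approximation ignores the small feedback component and is an assumption of the sketch, not a formula stated in the paper.

```python
def scalar_multiplier(rho):
    """Scalar counterpart of (I - rho W)^{-1} for a row-standardized W."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho)

estimates = (0.0851, 0.0888)  # range of rho reported in the text
multipliers = [scalar_multiplier(rho) for rho in estimates]

# Rough share of the total effect that stays in the region itself
# (assumption: direct share ~ 1 / multiplier = 1 - rho).
direct_shares = [1.0 / m for m in multipliers]

for rho, m, share in zip(estimates, multipliers, direct_shares):
    print(f"rho={rho:.4f}  multiplier={m:.4f}  direct share={share:.3f}")
```

Both estimates give a multiplier between 1.09 and 1.10 and a direct share slightly above 0.91, in line with the rounded figures quoted above.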

Direct, indirect and total effects for the regional variables are shown in Table 3 to offer a richer interpretation of our results in terms of spillover effects. Indeed, a change in an explanatory variable in a single observation (region) may affect the region itself (direct effect) and potentially all the other regions (indirect effect) (LeSage and Pace 2009). The positive and significant GDP per capita suggests that an improvement in economic conditions promotes the political engagement of citizens (Radcliff 1992). This is valid for a change in the region of study, but also if a shock occurs in the neighborhood because of social and economic interconnections between regions. Dissatisfaction with labor market conditions is a salient issue for the decision of whether or not to vote. The results show that Unemployment is positively related to voting in EP elections, which provides evidence in favor of the mobilization approach (Rosenstone 1982).
The Dependency ratio is also positively correlated with the turnout: the oldest segment of the population is more likely to vote than the youngest. This confirms that the age structure of the population helps to explain variations in the electoral behavior of voters.
Objective 1 regions have significantly negative coefficients. Although marginalized areas receive funds from the EU, they appear less interested in European politics than other areas. The responsiveness and effectiveness of institutions at the regional level enhance the willingness of people to engage in voting.

Table 3 Estimates of direct, indirect and total effects based on the estimates of ML models in Table 2
*Significant at 1%, **significant at 5%, ***significant at 10%. In brackets, direct, indirect, and total effects and related t-statistics computed using 1000 draws from the estimated variance-covariance matrix of parameters (the spatial multiplier $(I - \rho W)^{-1}$ is calculated at every draw)

There is no significant relationship between population density and turnout. The often-found result that in relatively low-density areas relationships are closer and lead to 'social pressure' to vote (Overbye 1995) is possibly undone by the perceived distance of the EP. Education also does not affect voter turnout, a result that is not new among previous studies on European democracies (among others, Lijphart 1997; Norris 2002; Teorell et al. 2007), but that is at odds with several works on political behavior (e.g., Burden 2009; Gallego 2010).
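The direct, indirect and total effects discussed in this section follow the summary measures of LeSage and Pace (2009): the average diagonal element, the average off-diagonal row sum, and the average row sum of $(I - \rho W)^{-1}\beta$. The sketch below uses a hypothetical five-region ring, an illustrative coefficient and a made-up parameter covariance matrix; it mirrors the simulation idea of recomputing the multiplier at every draw but is not the authors' code.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights for n regions arranged in a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def impact_measures(rho, beta, W):
    """Average direct, indirect and total effects of one regressor."""
    n = W.shape[0]
    # S[i, j] is the effect on region i of a unit change in region j.
    S = np.linalg.inv(np.eye(n) - rho * W) * beta
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total

def simulated_t_stats(rho, beta, cov, W, draws=1000, seed=0):
    """Mean / std of the effects over draws of (rho, beta)."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal([rho, beta], cov, size=draws)
    # The multiplier is recomputed for every draw.
    effects = np.array([impact_measures(r, b, W) for r, b in samples])
    return effects.mean(axis=0) / effects.std(axis=0, ddof=1)

W = ring_weights(5)
rho, beta = 0.09, 0.5                  # illustrative values only
cov = np.diag([0.02 ** 2, 0.1 ** 2])   # hypothetical covariance matrix

direct, indirect, total = impact_measures(rho, beta, W)
t_stats = simulated_t_stats(rho, beta, cov, W)
print(direct, indirect, total, t_stats)
```

With a row-standardized W the total effect equals β / (1 − ρ), and the direct effect slightly exceeds β because of the feedback passing through neighboring regions.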

Conclusions
This paper analyzes the geographical features of voter turnout in the four EP elections held from 1999 to 2014 in 155 European regions. We find some spillover effects that are presumably due to commuting, cultural traits, and values that neighboring communities share. Our results show how changes in some determinants in a region influence voter turnout in the same region (direct effects) as well as turnout in the neighboring ones (indirect effects), indicating that there may be additional benefits to turnout that spill over the border of a region. Indirect effects are smaller than direct effects, but failing to account for them results in a partial understanding of the interrelation between the explanatory variables and turnout.
Funding Open Access funding provided by Università degli Studi di Verona.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.