Our analysis is restricted to the study area of Northern Italy (Fig. 1), which encompasses the sub-regions of Valle d’Aosta, Piemonte, Liguria, Lombardia, Emilia-Romagna, Veneto, Friuli-Venezia Giulia and Trentino-Alto Adige/Südtirol. Official territorial data on COVID-19 mortality in Italy are available only at the rather aggregate regional or provincial level, corresponding to levels 2 and 3, respectively, of the European Nomenclature of Territorial Units for Statistics (NUTS) (see Footnote 5). In addition, these official data refer only to the deaths of patients who tested positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). They do not include (potential) patients who had no COVID-19 diagnosis because they were not tested and died at home or elsewhere. Hence, the officially reported deaths are likely underestimated. Because testing policies vary among regions in Italy, the induced measurement error is also non-randomly distributed among the provinces. Ciminelli and Garcia-Mandicó (2020) compare the official COVID-19 fatality rates with historical death data and report that deaths were higher than official fatalities throughout the period of the COVID-19 epidemic.
Working under the assumption that COVID-19 deaths are underestimated in Italy, we use total deaths from the official registries and conduct the analysis at the level of the municipality, the smallest administrative unit, to obtain a more granular representation of the spatial dimension of the phenomenon. Since we are interested in excess deaths, we take the difference between the number of deaths in the period January 1 to April 30, 2020, and the average number of deaths in the same period of the previous 5 years (ExDeaths), and we use this metric as the dependent variable in our statistical model. Figure 2 displays the geographical distribution of excess deaths among the 4041 municipalities for which data are available.
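The construction of ExDeaths can be illustrated with a minimal sketch. The function name and the death counts below are hypothetical and serve only to show the arithmetic for a single municipality; they are not taken from the registry data.

```python
from statistics import mean

def excess_deaths(deaths_2020, deaths_previous_years):
    """Deaths in the 2020 window minus the mean of the same window in the five previous years."""
    if len(deaths_previous_years) != 5:
        raise ValueError("expected counts for the five previous years")
    return deaths_2020 - mean(deaths_previous_years)

# Hypothetical counts for one municipality, January 1 to April 30 of each year.
baseline_2015_2019 = [40, 44, 38, 42, 41]   # assumed values, average of 41
ex_deaths = excess_deaths(63, baseline_2015_2019)
print(ex_deaths)
```

Applying the same function to every municipality would yield one ExDeaths value per spatial unit, which is the quantity mapped and modelled in the text.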
The variable is assumed to follow a Negative Binomial distribution, a generalization of the Poisson distribution that avoids the restrictive mean–variance equality of the latter, and is modelled as follows:
$$\begin{aligned} & {\text{ExDeaths}}_{i} \sim NB\left( {\mu_{i} ,\theta } \right) \\ & \log (\mu_{i} ) = \alpha + \beta PM_{i} + \delta^{\prime }X_{i} + \varepsilon_{i} \\ \end{aligned}$$
(1)
where \(\theta\) is the overdispersion parameter to be estimated and \(\mu_{i}\) is the municipality-specific expectation conditional on the value of the covariates. Among the covariates, PM is the concentration of fine particulate matter in municipality i and \(\beta\) is the associated parameter, which we expect to be positive and statistically different from zero. X is a vector of control variables that adjusts for potential confounding effects and includes the (log of) total population as the offset, while \(\varepsilon\) is a normally distributed error term.
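To make the role of \(\theta\) concrete, the following sketch evaluates a Negative Binomial probability mass function in the common mean-dispersion form, where the variance is \(\mu + \mu^{2}/\theta\), and checks the first two moments numerically. This parameterization, the function names and all parameter values are assumptions chosen for illustration; the control variables and the error term are omitted, and the distribution is evaluated on non-negative integer counts.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-probability of count y for an NB with mean mu and variance mu + mu**2 / theta."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )

def expected_count(alpha, beta, pm, population):
    """mu_i from the log link, with log(population) entering as the offset."""
    return math.exp(alpha + beta * pm + math.log(population))

# Hypothetical parameter values, not estimates from the paper.
mu = expected_count(alpha=-7.0, beta=0.05, pm=20.0, population=5000)
theta = 2.0

# Numerical check of the moments over a range wide enough to cover the tail.
probs = [math.exp(nb_logpmf(y, mu, theta)) for y in range(2000)]
nb_mean = sum(y * p for y, p in enumerate(probs))
nb_var = sum((y - nb_mean) ** 2 * p for y, p in enumerate(probs))
print(nb_mean, nb_var, mu + mu ** 2 / theta)
```

As \(\theta\) grows, the term \(\mu^{2}/\theta\) vanishes and the variance approaches the mean, which is the Poisson case that the text describes as restrictive.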
Our main source of PM2.5 data is the European Environment Agency’s (EEA) air monitoring database, which is supplied to the EEA by the Institute for Environmental Protection and Research (ISPRA). ISPRA conducts ground-level measurements of PM2.5 air concentrations (µg/m³) at 268 monitoring sites throughout Italy. Specifically, we use the EEA’s E1a and E2a datasets, which are primary validated assessment data and primary up-to-date assessment data reported by the European Member States, respectively. Although the measurements come in both hourly and daily averaging formats, we work with daily values and use them to obtain yearly aggregates for the years 2015, 2016, 2017, 2018, and 2019. However, because model (1) does not include a time component, we further average over a six-year period to obtain a metric of long-term (chronic) PM2.5 concentration levels throughout the different spatial units of Northern Italy. A reference period of 6 years is long enough to account for long-term exposure but not so long as to be affected by the mobility of people among municipalities, and it is in line with existing literature assessing long-term effects of PM exposure (Yorifuji et al. 2019). Since the air monitoring stations provide only partial spatial coverage for municipality-level PM2.5 concentration data, we impute missing observations using a spatial interpolation model. Specifically, we fill in the gaps using a mean-stationary ordinary kriging (see Bivand et al. 2013, p 209) defined through an exponential covariance function with nugget, partial sill and range parameters estimated through (restricted) maximum likelihood methods. Figure 3 displays the resulting PM2.5 concentration data (see Footnote 6).
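The interpolation step can be sketched as follows. This is a minimal illustration of ordinary kriging with an exponential covariance function, not the implementation used for the paper: the covariance parameters are fixed at hypothetical values rather than estimated by (restricted) maximum likelihood, the station coordinates and concentrations are invented, and all function names are assumptions.

```python
import math

def exp_cov(h, nugget, psill, range_):
    """Exponential covariance; the nugget contributes only at zero distance."""
    return psill * math.exp(-h / range_) + (nugget if h == 0.0 else 0.0)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ok_weights(sites, target, nugget, psill, range_):
    """Kriging weights from the covariance system plus the sum-to-one constraint."""
    n = len(sites)
    a = [[exp_cov(math.dist(p, q), nugget, psill, range_) for q in sites] + [1.0]
         for p in sites]
    a.append([1.0] * n + [0.0])   # Lagrange row enforcing unbiasedness
    b = [exp_cov(math.dist(p, target), nugget, psill, range_) for p in sites] + [1.0]
    return solve(a, b)[:n]

def ok_predict(sites, values, target, nugget, psill, range_):
    """Predicted value at target as a weighted sum of the observed values."""
    w = ok_weights(sites, target, nugget, psill, range_)
    return sum(wi * v for wi, v in zip(w, values))

# Hypothetical stations (coordinates in km) and long-term PM2.5 averages (µg/m³).
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
values = [22.0, 18.0, 25.0, 15.0]
pars = dict(nugget=1.0, psill=9.0, range_=15.0)   # assumed, not estimated

weights = ok_weights(sites, (5.0, 5.0), **pars)
centre = ok_predict(sites, values, (5.0, 5.0), **pars)
at_station = ok_predict(sites, values, (0.0, 0.0), **pars)
print(weights, centre, at_station)
```

The weights sum to one by construction, which is what makes the predictor unbiased under a constant but unknown mean. In this formulation the predictor also reproduces the observed value at a station location.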
Comparing Figs. 2 and 3 reveals a visible spatial coincidence between higher levels of excess mortality and higher levels of PM2.5, in particular in Lombardia, which is notably the region with both the highest particulate concentration and the highest excess mortality.
The hypothesis that PM2.5 concentration affected COVID-19 deaths, that is \(\hat{\beta } > 0\), is tested across several possible specifications. In model (2) we include regional effects \(\left( {\lambda_{j} } \right)\). These effects are expected to capture aspects of the management of the outbreak that may have systematically influenced COVID-19 mortality and that are common to all the municipalities in the same region. Italy has a national health system that ensures equal access to healthcare for all citizens. The system is managed by regions at the local level and, in the specific case of this pandemic, regions were responsible for defining the testing and contact-tracing protocols and for implementing the measures necessary to contain the outbreak, including those to protect healthcare workers. In model (3), we include LLS-specific effects \(\left( {e_{k} } \right)\). LLS are spatial clusters of contiguous municipalities that are related by commuting flows and share a common specialization in a specific sector of manufacturing production; they correspond to the conceptualization of Marshallian districts (Becattini 2002). The number of LLS clusters per region and the total number of municipalities belonging to clusters are reported in Table 1, along with the minimum, maximum, and average cluster size.
Table 1 Number of LLS spatial clusters in each region

The use of LLS captures the interlinkages among neighbouring municipalities that may have favoured the geographical spreading of coronavirus around specific hotspots. Mortality data are then expected to vary among municipalities in different LLS, but differences are expected to be non-systematic in this case. In model (4) we include both the regional fixed effects and the LLS random effects.
$$\begin{aligned} & ExDeaths_{i} \sim NB\left( {\mu_{ij} ,\theta } \right) \\ & \log (\mu_{ij} ) = \alpha + \beta PM_{ij} + \delta^{\prime }X_{ij} + \lambda_{j} + \varepsilon_{ij} \\ \end{aligned}$$
(2)
$$\begin{aligned} & ExDeaths_{i} \sim NB\left( {\mu_{ik} ,\theta } \right) \\ & \log (\mu_{ik} ) = \alpha + \beta PM_{ik} + \delta^{\prime }X_{ik} + u_{ik} \\ & u_{ik} = \varepsilon_{ik} + e_{k} \\ \end{aligned}$$
(3)
$$\begin{aligned} & ExDeaths_{i} \sim NB\left( {\mu_{ijk} ,\theta } \right) \\ & \log (\mu_{ijk} ) = \alpha + \beta PM_{ijk} + \delta^{\prime }X_{ijk} + \lambda_{j} + u_{ijk} \\ & u_{ijk} = \varepsilon_{ijk} + e_{k} \\ \end{aligned}$$
(4)
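A short sketch may help to read model (4). It computes the linear predictor for a municipality in region j and LLS k, and shows that, with every other term held fixed, a one-unit difference in PM multiplies the expected count by \(\exp(\beta)\). All coefficient values, effect sizes and cluster labels are hypothetical, the LLS effects are drawn from a normal distribution purely for illustration, and the idiosyncratic error \(\varepsilon_{ijk}\) is omitted.

```python
import math
import random

def log_mu(pm, controls, region, lls, params):
    """Linear predictor of model (4), without the idiosyncratic error term."""
    xb = sum(d * x for d, x in zip(params["delta"], controls))
    return (params["alpha"] + params["beta"] * pm + xb
            + params["lambda"][region]   # regional fixed effect
            + params["e"][lls])          # LLS random effect

rng = random.Random(0)
params = {
    "alpha": -6.0,                                      # assumed intercept
    "beta": 0.08,                                       # assumed PM coefficient
    "delta": [0.3, -0.1],                               # assumed control coefficients
    "lambda": {"Lombardia": 0.4, "Veneto": 0.0},        # assumed regional effects
    "e": {k: rng.gauss(0.0, 0.2) for k in ("lls_1", "lls_2")},
}
controls = [1.2, 0.5]

# Two otherwise identical municipalities that differ by 1 µg/m³ of PM2.5.
low = log_mu(18.0, controls, "Lombardia", "lls_1", params)
high = log_mu(19.0, controls, "Lombardia", "lls_1", params)
rate_ratio = math.exp(high - low)
print(rate_ratio)
```

Models (2) and (3) correspond to dropping the LLS term or the regional term, respectively, from the same function.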
Control variables were chosen to avoid potential spatial confounding and in light of the emerging literature on the impact of PM on COVID-19-related deaths (Cole et al. 2020; Wu et al. 2020). Population density and per-capita income account for the level of urbanisation. The most densely populated and wealthy municipalities are among the most polluted because of the spatial concentration of manufacturing and service activities, but they are also the places where contagion could have been easier, with a potential impact on mortality. In addition to population density, the share of municipality area occupied by industrial sites and the average size of manufacturing firms are included in the regression because they are related to pollutant concentration and possibly to mortality. National measures to stop the spread of the viral infection (lockdown) involved the service sector to the largest extent, while many manufacturing activities, being considered necessary, were left open. In the absence of social distancing and individual protection measures, the geographical concentration of these activities in a municipality, with their complex logistics and transport interconnections, and the size of plants may have influenced mortality. Average temperature, for which an association with COVID-19 deaths has also been found (Ma et al. 2020), is also included in the regression (see Footnote 7). Moreover, COVID-19 incidence has proven to be higher among men than among women, and among people aged 65 or more. Hence these two variables are considered in the model, even though they are not necessarily connected with the average PM2.5 exposure in a municipality. Underlying socioeconomic conditions can also play a role in COVID-19-related mortality (Goutte et al. 2020). Brandt et al. (2020) and Mukherji (n.d.) have shown that, in the US, COVID-19 is more threatening for ethnic minorities, and we believe that the share of migrants, identified as non-EU citizens, can control for the influence of this aspect on the observed excess mortality. Mukherji (2020) and Goutte et al. (2020) also find that places where a higher share of the population has a low level of education record more deaths. In our paper, given the lack of updated data on education at the municipal level, we proxy it with the percentage of university students in the total population. The distance from the closest airport is a proxy for the functional and relational linkage between a municipality and a place of highly frequent national and international connections, and hence a potential source of coronavirus spreading. Finally, we consider the number of hospital beds as a proxy for the supply of health services, to account for the fact that many people died at home without being diagnosed with coronavirus because of the shortage of beds in public structures. The full details of the variables in the model, including sources and summary statistics, are presented in Table 2.
Table 2 Description of model variables and summary sample statistics

Having accounted for the confounding effect due to the omission of relevant information from the empirical specification, we rule out the other potential sources of endogeneity considered in similar papers. In particular, we rule out endogeneity due to measurement error in the outcome variable and in the main independent variable. Concerning the outcome variable, the relationship of deaths and cases with fine PM could be spurious: more cases could be registered and more individuals tested in highly polluted areas, because people there are more likely to show COVID-19 symptoms owing to the chronic inflammation induced by PM. A high death toll among people diagnosed with COVID-19 would be a natural consequence of that. In contrast, the number of excess deaths used in this paper is not affected by testing problems, since it considers all the potential COVID-19 deaths. Concerning the PM variable, measurement errors are likely to occur when using satellite data or modelled data. We preferred to use PM2.5 levels observed at monitoring stations to avoid such measurement error. Some caution is needed in the spatial interpolation because the method chosen to fill in the missing data may underestimate the value in locations farther from the monitoring stations. With this concern in mind, we test the robustness of our results using PM2.5 data obtained from different interpolation approaches.