
1 Introduction

Because they guarantee constant and increasingly detailed observation, satellite data have long been used to monitor and study the spatial patterns and spread of epidemics, in particular with respect to the variables believed to determine or favor the emergence and development of diseases, such as environmental conditions, the distribution of causative agents and the socio-demographic characteristics of human populations [8, 18, 20, 43]. The combination of field-derived data, statistical variables and satellite data is a fundamental element for building epidemiological risk maps and predictive models [1, 25]. Moreover, the current pandemic emergency has highlighted the need to develop and implement such tools not only in the so-called developing countries, where they have mostly been applied so far, but also in more advanced countries, which have proved fragile in the face of such calamities and unprepared in their prevention and response measures, although such a danger was to some extent predictable [24, 42].

In the context of epidemic cartography, this contribution deals in operational terms with an issue that is not always adequately considered when satellite data are used to create maps and spatial models of epidemics, i.e. the preliminary verification of the level of spatial correlation between remotely sensed environmental variables and epidemics [19, 37]. More precisely, we evaluate the contribution of exposure to the pollutant nitrogen dioxide (NO2) to the spatial spread of the virus and to the severity of the current COVID-19 infection, in order to confirm the operational validity of using this environmental variable in the related epidemic cartography. Improving data quality, not only in technical terms but also in terms of scientific relevance and robustness, is in fact one of the most important ways in which health information technology can make further significant and useful progress in monitoring and managing epidemics [20, 39].

As is known, Northern Italy was the area most affected by the first wave of the COVID-19 epidemic. The great speed and intensity with which COVID-19 spread in these regions prompted the hypothesis, in some preliminary studies, that high levels of pollution may play a role in viral transmission and in determining the severity of the infection [4, 6, 28, 35, 38, 40]. Northern Italy is in fact considered one of the most heavily polluted areas in Europe in terms of smog and air pollution [2, 26, 33], because it is characterized by a high concentration of densely populated urban areas as well as by a strong presence of industrial activities. In addition, the closed geomorphological conformation of the Po Valley, combined with low ventilation, hinders the recirculation and release of pollutants and causes them to stagnate [16].

2 Materials and Methods

The analysis covers the regions of North-Western Italy (Piemonte, Liguria and Valle d’Aosta) during the first wave of the epidemic.

2.1 Environmental Data

Pollution data on average concentrations of nitrogen dioxide (NO2), expressed in µmol/m2, were obtained from the Sentinel-5 Precursor (S5P) satellite, managed by the European Space Agency (ESA) and the European Commission under the Copernicus program [9].

The periods analyzed refer to one baseline period identified before the spread of the epidemic in Italy (February 1 - 24) and to the following weeks (February 24 - March 8, March 8 - 22, March 22 - April 5 and April 5 - 19).

For this study, we considered the tropospheric vertical column of NO2 reported by ESA Sentinel-5P and made available through high-resolution offline (OFFL) image processing of nitrogen dioxide concentrations, obtainable approximately 5 days after detection time [10]. The satellite data therefore come from the Sentinel-5P OFFL NO2 dataset [15] of the Google Earth Engine API platform [17], accessed through Area of Interest (AOI) tools and a simple Python script. For each of the periods considered, we obtained a single satellite image defined by the mean of the NO2 concentrations expressed in µmol/m2. The satellite images were downloaded in raster (GeoTIFF) format, georeferenced according to the World Geodetic System (WGS-1984). A population-weighted average of the NO2 values, based on population data for the year 2020, was then computed for each province and region using the QGIS software.

The population data used were obtained from the Gridded Population of the World - Fourth Version (GPWv4) dataset provided by the Center for International Earth Science Information Network (CIESIN), which models the distribution of the global human population consistently with national censuses and population registers for the years 2000, 2005, 2010, 2015 and 2020 on grid cells of about 1 km [3].

In addition, to obtain a better understanding of NO2 concentrations during the period considered, we retrieved vertical air flows (omega) at 850 mb (about 1.5 km above sea level), which define the atmospheric capacity to disperse the gas. In regions where positive omega is observed, the atmosphere forces the pollutant NO2 to remain close to the surface, resulting in increased exposure of the population to the risk factor. Conversely, in regions with negative omega, atmospheric conditions allow the gas to disperse further away and at higher altitudes. In these regions there is therefore a lower exposure of the population to air pollution and the associated health risks [30]. Data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA [29].
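The population-weighted averaging of NO2 described above was carried out in QGIS; its arithmetic can be illustrated with a minimal Python sketch. The function name and the grid-cell values are hypothetical and serve only to show the calculation: each cell's NO2 value is weighted by the population living in that cell.

```python
def population_weighted_mean(no2_values, population):
    """Population-weighted mean of NO2 over the grid cells of one area.

    no2_values and population are equal-length sequences, one entry per
    grid cell. Cells with a missing value (None) are skipped.
    """
    if len(no2_values) != len(population):
        raise ValueError("inputs must have the same length")
    weighted_sum = 0.0
    total_population = 0.0
    for value, people in zip(no2_values, population):
        if value is None or people is None:
            continue  # no-data cell
        weighted_sum += value * people
        total_population += people
    if total_population == 0:
        raise ValueError("total population is zero")
    return weighted_sum / total_population


# Hypothetical cells: the uninhabited cell does not affect the result.
example = population_weighted_mean([120.0, 100.0, 80.0], [3000, 1000, 0])
print(example)
```

Compared with a simple spatial mean, this weighting gives more importance to the concentrations measured where people actually live, which is what matters when NO2 is treated as an exposure.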

2.2 Epidemiological Data

The trend data on the total number of positive cases of SARS-CoV-2 infection at regional and provincial level, corresponding respectively to levels 2 and 3 of the Nomenclature of territorial units for statistics (NUTS), were available on the website of the Civil Protection Department [7]. From the number of positive cases of COVID-19, prevalence rates per 100,000 inhabitants were obtained using the most up-to-date population data, as of January 1, 2019, available from the Italian National Institute of Statistics [22]. Prevalence rates were calculated as the ratio of the number of SARS-CoV-2 positive subjects to the total number of individuals in the population during the lockdown (as of March 8, March 22, April 5 and April 19, 2020).
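As a minimal illustration of the prevalence calculation described above, the following Python sketch computes a rate per 100,000 inhabitants. The function name and the figures are hypothetical and are not taken from the study data.

```python
def prevalence_per_100k(positive_cases, population):
    """Cumulative positive cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return positive_cases / population * 100_000


# Hypothetical province: 250 cumulative cases among 500,000 residents.
rate = prevalence_per_100k(250, 500_000)
print(rate)
```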

Finally, in this research an analysis of excess mortality (in percentage values) was carried out, in order to indirectly evaluate the effect of COVID-19 epidemic on total deaths observed during the study period.

The excess mortality data define the percentage change in deaths at the provincial level recorded in 2020 compared to the average of the previous five years (2015–2019). Positive values indicate an increase in deaths compared to the reference period. Deaths data were available from the European Statistical Office dataset [11], aggregated weekly at provincial level (NUTS 3).
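The definition above can be written as a short Python sketch. The function name and the death counts are hypothetical; the baseline is taken as the simple mean of the deaths recorded in the same weeks of the five reference years.

```python
def excess_mortality_pct(deaths_2020, baseline_deaths):
    """Percentage change of 2020 deaths relative to the baseline mean.

    baseline_deaths holds the deaths recorded in the same weeks of each
    reference year (2015-2019 in the definition above).
    """
    baseline = sum(baseline_deaths) / len(baseline_deaths)
    return (deaths_2020 - baseline) / baseline * 100


# Hypothetical province: baseline mean of 1000 deaths, 1500 deaths in 2020.
change = excess_mortality_pct(1500, [1000, 980, 1020, 990, 1010])
print(change)
```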

For this analysis, we chose the period between the 10th and the 16th week (March 2 – April 19, 2020) because the first COVID-19 deaths in these regions occurred on March 3 and 5 in Liguria and Piemonte, respectively, while Valle d’Aosta recorded its first deaths only later, on March 11 [7].

2.3 Statistical Analysis

First, an exploratory and descriptive analysis of the dataset used was carried out, which made it possible to investigate the distribution of each variable and the presence of any anomalous values identifiable as outliers.

Second, the relationships between the average levels of NO2 prior to the onset of the Italian epidemic (February 1–24) and the prevalence rates of SARS-CoV-2 infection on March 8, March 22, April 5 and April 19, 2020 were examined using Spearman’s correlation coefficient (ρ). This non-parametric index measures the relationship based not on the values of the two variables but on their ordinal positions (ranks). It is therefore much less affected by outliers than Pearson’s linear coefficient. Like the latter, Spearman’s coefficient takes values between −1 and +1: the closer the index is to zero, the weaker the relationship; the closer it is to −1 or +1, the stronger the negative or positive relationship.
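To make the rank-based nature of the coefficient concrete, the sketch below computes Spearman’s ρ in plain Python as the Pearson correlation of the ranks, with tied values given their average rank. The function names and the sample values are illustrative assumptions, not the study data or the code used for the analysis (which was carried out in R).

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical NO2 levels and prevalence rates; the last rate is an outlier.
no2 = [60, 75, 90, 119, 118]
rates = [10, 30, 20, 45, 400]
print(spearman_rho(no2, rates))
```

Replacing the extreme value 400 with 46 leaves the ranks, and therefore ρ, unchanged, whereas Pearson’s coefficient on the raw values would change. This is the robustness to outliers referred to above.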

Spearman’s correlation coefficient was also useful for investigating the association between the average levels of NO2 before February 24 and excess mortality data calculated for the period March 2 - April 19, 2020.

Once the relationships between the variables under consideration had been established, further exploratory analyses were carried out with Poisson regression models. Since evidence of overdispersion was observed, we applied quasi-Poisson multivariate models. These are a generalization of Poisson regression that takes the overdispersion of the data into account by adjusting the variance according to a specific dispersion parameter [36].
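The role of the dispersion parameter can be illustrated with a small sketch. It assumes the common Pearson-based estimator, in which the dispersion is the Pearson chi-square statistic divided by the residual degrees of freedom, and the standard errors of the Poisson fit are multiplied by the square root of that value. The function names and the numbers are hypothetical; the source does not state which estimator was used.

```python
from math import sqrt


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    Under a Poisson model the variance equals the mean, so values well
    above 1 suggest overdispersion.
    """
    residual_df = len(observed) - n_params
    if residual_df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / residual_df


def quasi_poisson_se(poisson_se, dispersion):
    """Scale a Poisson standard error by the square root of the dispersion."""
    return poisson_se * sqrt(dispersion)


# Hypothetical counts and fitted means from a model with two parameters.
phi = pearson_dispersion([12, 30, 8, 50], [10.0, 25.0, 10.0, 55.0], n_params=2)
print(phi, quasi_poisson_se(0.02, phi))
```

The coefficient estimates are the same as in the Poisson model; only their standard errors, and hence the confidence intervals, become wider when the dispersion exceeds 1.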

Within the models, some possible confounding factors were considered, such as the percentage of the population over the age of 65 and the ratio of females to males, as the incidence of COVID-19 has proven to be higher among men and people 65 years of age or older [21]. Another possible confounding factor taken into account was population density (population/km2): one would expect the most densely populated provinces to be among the most polluted, owing to the spatial concentration of social and economic activities, but also to be the places where the contagion could have spread more easily, with a potentially greater impact on the exposed population. These data were available from the Italian National Institute of Statistics (ISTAT), updated to January 1, 2019 [22].

The estimated coefficients, obtained from quasi-Poisson multivariate regression models, define the size of the variation in the dependent variable (prevalence rates or excess mortality) for a unit increase of the independent variable (defined as a 10 µmol/m2 increase in the average concentration of NO2 before February 24).
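Because the quasi-Poisson model uses a logarithmic link, a coefficient estimated on the log scale is converted into a percentage change of the outcome by exponentiation. The sketch below shows this conversion together with an approximate 95% confidence interval based on the normal approximation; the function names, the coefficient and the standard error are hypothetical, and the interval method is an assumption rather than the one documented in the source.

```python
from math import exp, log


def percent_change(beta):
    """Percentage change in the outcome per unit increase of the predictor."""
    return (exp(beta) - 1) * 100


def percent_change_ci(beta, se, z=1.96):
    """Approximate confidence interval for the percentage change."""
    return percent_change(beta - z * se), percent_change(beta + z * se)


# Hypothetical coefficient for a 10 µmol/m2 increase in NO2 (one unit here).
beta = log(1.047)
lower, upper = percent_change_ci(beta, se=0.01)
print(percent_change(beta), lower, upper)
```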

The data for each variable were collected and organized in tabular form in Microsoft Excel and then processed for statistical analysis using RStudio.

3 Results

3.1 Analysis of Tropospheric NO2 Concentrations

Figure 1 shows the geographical distribution at provincial level of the population-weighted average concentrations of tropospheric NO2 (µmol/m2) for the five periods analyzed, corresponding to before and after the spread of the Italian epidemic.

The average concentrations of NO2 were high in the first period, before the outbreak (February 1 to 24). They were particularly high in the Metropolitan City of Torino (119 µmol/m2) and in the province of Novara (118 µmol/m2). In the following weeks, during the spread of the epidemic, pollutant concentrations fell drastically, to less than 90 µmol/m2 in all the provinces analyzed. This reduction is attributable to the containment measures implemented by the government against the spread of COVID-19, which led to a sharp reduction in transportation-related emissions as well as a decrease in industrial activities and electricity production.

Fig. 1. Study area of North-Western Italy showing the average concentrations of tropospheric NO2 (µmol/m2) weighted on the population for the five periods considered in the analysis.

Key to abbreviations/provinces: AL = Alessandria; AO = Aosta; AT = Asti; BI = Biella; CN = Cuneo; GE = Genova; IM = Imperia; NO = Novara; SP = La Spezia; SV = Savona; TO = Torino; VB = Verbano-Cusio-Ossola; VC = Vercelli.

Throughout the period considered, these concentrations of nitrogen dioxide were also accompanied by downward vertical air flows (positive omega between 0 and 0.02 Pa/s), which prevented the dispersion of the pollutant and increased the exposure of the population and the associated risk.

3.2 Relationship Between NO2 Pollution and Prevalence Rates

Table 1 shows the total number of SARS-CoV-2 positive cases and the prevalence rates (per 100,000) calculated for the four dates considered in the study, i.e. the date on which the lockdown was established (March 8) and the following weeks (March 22, April 5 and April 19). The table also reports population data as of January 1, 2019, with the values of the confounding factors used in the multivariate analysis (females/males, % of population over 65 years old and population density).

Table 1. Total number of SARS-CoV-2 positive cases and prevalence rates (per 100,000) on March 8, March 22, April 5 and April 19, and population data as of January 1, 2019.

The preliminary exploratory analysis of prevalence rates showed very high values, numerically distant from the rest of the data collected, which were identifiable as outliers and corresponded to the Valle d’Aosta region. The latter was therefore excluded from the subsequent correlation analysis.

The variables have a positive monotonic relationship for the periods March 22, April 5 and April 19, as evidenced by the regression lines in the scatter plots in Fig. 2.

Fig. 2. Scatterplot of correlations between mean NO2 levels before February 24 and prevalence rates (TP).

Spearman’s coefficients showed positive correlations between NO2 concentrations before February 24 and prevalence rates for March 22, April 5 and April 19 (ρ = 0.65, p-value < 0.05; ρ = 0.21 and ρ = 0.40, p-value > 0.05, respectively). The weak negative correlation found for March 8 (ρ = −0.07, p-value > 0.05) may depend on the slow accumulation of positive cases of SARS-CoV-2 infection in the studied territories up to that day.

The estimated prevalence rate ratios from the quasi-Poisson regression models are summarized on a logarithmic scale in Table 2, together with the corresponding standard errors (se), for the four dates considered (March 8 and 22, April 5 and 19).

An increase of 10 µmol/m2 in the concentration of NO2 is associated with an increase of between 9.5% and 22% (95% CI: −2.6 to 55) in the prevalence rates in the territories analyzed during the first wave of COVID-19.

Table 2. Estimates of prevalence rate ratio and the corresponding standard error (se) of quasi-Poisson regression models over the four periods considered.

3.3 Relationship Between NO2 Pollution and Excess of Mortality

The excess mortality for the period March 2 – April 19, 2020 is mapped in Fig. 3 for all the provinces considered. The map classes are based on the mean (µ) and the standard deviation (σ) of the values, and are defined as follows: lower (x < µ − σ), low (µ − σ ≤ x < µ), high (µ ≤ x < µ + σ) and higher (x ≥ µ + σ).
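The class breaks defined above can be reproduced with a short Python sketch. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify whether the population or the sample formula was applied.

```python
from statistics import mean, pstdev


def classify_by_mean_sd(values):
    """Assign each value to one of the four classes defined by µ and σ."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    labels = []
    for x in values:
        if x < mu - sigma:
            labels.append("lower")
        elif x < mu:
            labels.append("low")
        elif x < mu + sigma:
            labels.append("high")
        else:
            labels.append("higher")
    return labels


# Hypothetical excess mortality percentages for five provinces.
print(classify_by_mean_sd([10, 20, 30, 40, 50]))
```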

Excess mortality is evident in all three regions, with marked increases in deaths in the provinces of Alessandria and Vercelli (103% each) and Biella (101%). The least affected provinces appear to be Cuneo and Savona, which nevertheless recorded increases in mortality of between 43% and 47%.

Here too, the statistical analysis returned a positive correlation between NO2 pollution before February 24 and excess mortality for the period March 2 – April 19, 2020 (ρ = 0.44, p-value > 0.05), as also evidenced by the regression line of the scatterplot in Fig. 4.

The estimated rate ratios (RR) returned by the quasi-Poisson multivariate regression model are shown in Table 3 with the corresponding standard errors (se). An increase of 10 µmol/m2 in the concentration of NO2 is associated with an estimated increase of 4.7% (95% CI: 1.8 to 7.9) in excess mortality over the period March 2 to April 19, 2020.

Fig. 3. Excess mortality recorded in the provinces of North-West Italy.

Fig. 4. Scatterplot of correlation between NO2 levels before February 24 and excess mortality.

Table 3. Estimated rate ratios and corresponding standard errors (se) of the quasi-Poisson regression model for excess mortality data.

4 Conclusion

The processing of satellite information showed high levels of nitrogen dioxide (µmol/m2) in the pre-epidemic period and a drastic reduction in pollution in the following weeks. Across all the provinces considered, the reduction averaged 43%, following the national containment and mitigation measures implemented by the government to deal with the spread of the SARS-CoV-2 virus.

The statistical analysis carried out in this research provided evidence of a relationship between exposure to nitrogen dioxide (NO2) and the COVID-19 epidemic. The relationships turn out to be positive but not significant, as also reflected in the wide confidence intervals (95% CI), because the dataset considered is small and the statistical analysis was carried out on aggregate data, which do not allow all the possible confounding factors that influenced the epidemic to be considered. With reference to the estimates obtained from the quasi-Poisson multivariate regression models and the confounding factors, no effect related to the ratio of females to males is observed. Provinces with a higher share of the population aged 65 and over and with a higher population density, on the other hand, were the most affected during the epidemic, as expected.

Results from Spearman’s correlation coefficients (ρ) and quasi-Poisson multivariate regression models highlighted positive relationships between NO2 pollution and the spatial spread of the virus, as well as a positive association between the same concentrations of NO2 and the severity of SARS-CoV-2 infection in 12 of the 13 provinces of North-Western Italy analyzed, excluding Valle d’Aosta. These results are consistent with the emerging literature on the subject [5, 12, 13, 14, 23, 27, 30, 32, 44], while biological plausibility gives greater robustness to the positive association observed between the average concentrations of NO2 and the data on excess mortality. In fact, there is clear evidence that pre-existing diseases can contribute to more clinically severe forms of COVID-19 and to increased mortality from the disease [21, 31, 34, 41]. On the other hand, biological plausibility is weaker in supporting a potential positive association between the pollutant nitrogen dioxide and the spatial spread of the virus.

This research could be improved in several ways: by validating the concentrations obtained from satellite information against those collected by ground monitoring stations; by extending the analysis to other pollutants, such as atmospheric particulate matter (PM2.5 and PM10) or tropospheric ozone (O3), to investigate their reduction during the lockdown period and to assess their possible contribution to the COVID-19 epidemic; by carrying out analyses at more detailed scales, referring to individual urban areas or to areas defined on mobility data (e.g. local labour systems - SLL); and finally, by conducting studies with individual-level data that consider the individual risk factors that influenced SARS-CoV-2 infection. This would allow the regression models to be adjusted for all potential confounding factors, so that more robust statistical and biological validity can be achieved than that obtained here.

In conclusion, the relationships obtained in this research support the hypothesis of an important contribution of chronic exposure to nitrogen dioxide air pollution to the spatial spread and lethality of the SARS-CoV-2 virus. However, a correlation study at the aggregate level and at the regional and provincial scale cannot identify a real causal link between an exposure and an outcome; it only suggests a potential association. The present work has therefore addressed only a small part of this complex problem, and further analyses are needed to better clarify the role of air pollution during the COVID-19 pandemic. Such analyses may be useful for activating prevention plans for future health emergencies and for encouraging and promoting sustainable environmental policies.