1 Introduction

In December 2019, many cases of pneumonia were reported in Wuhan, China. The subsequent investigation revealed the cause to be a novel coronavirus, which was later called 2019 novel coronavirus (SARS-Cov-2) or more commonly, the COVID-19 (Barcelo 2020). The virus that originated in China then spread across the world, becoming a global pandemic and affecting millions worldwide. According to Worldometers (https://www.worldometers.info/), the death count (as of Aug 24, 2020) due to the COVID-19 has surpassed 810,000, a number that probably underestimates the actual count, due to reporting issues (Lachmann et al. 2020; Cohen 2020; Bendix 2020). Though no vaccine is available yet (primarily due to the long time required for testing), studies continue in full flow (Lawton 2020; Mamedov et al. 2020). Apart from working to develop a vaccine, many researchers have investigated other factors that might impact coronavirus fatality. There are studies to evaluate the non-pharmaceutical interventions (NPI) such as social distancing, response of relevant agencies during the disease spread (Masrur et al. 2020; Li et al. 2020; Lai et al. 2020). Some of the other factors being studied include arctic oscillation (Sanchez-Lorenzo et al. 2020), air quality and wealth (Antonietti et al. 2020; Nichol et al. 2020), particulate matter pollution (Setti et al. 2020), diabetes (Means 2020), wind speeds (Coccia 2020), and ultraviolet radiation (Carleton et al. 2020), to name a few.

One of the environmental factors that received wide attention was nitrogen dioxide (NO2) reported by Ogen (2020a). That paper found that NO2 has a strong association with a high fatality rate of COVID-19 cases. The four countries selected in that study were France, Germany, Italy and Spain. However, Chudnovsky (2020) pointed out that Ogen’s study did not consider data of other factors that determine COVID-19 fatality and suggested that population density considerations would be important for a comprehensive analysis. Meanwhile, a response published by Ogen (2020b) agreed that the work did not prove any causality between NO2, and the fatality caused by COVID-19; rather the study was meant to establish an idea and would need further exploration. An association analysis relating NO2 and population density to COVID-19 fatality has been done by at least one other study, by Travaglio et al. (2020). They found that both these variables are significantly related to COVID-19 fatality. Following up on these investigations, this study brings NO2, population density and aerosol optical depth (AOD) into the analysis, which attempts to see if adding AOD to the model, along with NO2 and population density, improves its predictive skill.

Fifty-four administrative regions from three European countries (Germany, Italy and Spain) are investigated in this study. France was not included, since the total number of positive coronavirus cases per region in France was not available in time for our purposes; only the number of hospitalized cases per region was recorded. The European Environment Agency recorded around 76,200 premature deaths due to bad air quality in Italy, 73,900 in Germany and 33,300 in Spain for the year 2016 (EEA 2019). The population in these countries is thus exposed to air pollution which gives rise to adverse health impacts and getting worsened when infected by the coronavirus. In this paper, several tools such as AOD maps and regression models were used to explore the possible relationship between NO2, population densities, AOD, and the fatality rate of COVID-19 in the selected regions. This study is divided into two main parts—(1) pre-exposure period and (2) complete exposure period. During the pre-exposure period, NO2 and AOD were extracted during a time before the coronavirus fatalities were at their peak, that is, Dec 1, 2019–Feb 29, 2020. The hypothesis here is that pre-exposure may worsen the health conditions that are critical to the chances of survival after getting infected. During the complete exposure period, the NO2 and AOD data within the same time frame as the coronavirus fatalities were also included (Mar 1, 2020–Jul 1, 2020), along with the data from the pre-exposure period. This is based on the understanding that NO2 and AOD would be impacted during COVID-19 timelines due to human efforts such as lockdowns. Figure 1 is a diagram that displays some of the factors expected to impact the transmission of the virus, including weather conditions, human activities, and population density. There are several relationships open to be explored, such as the environment’s influence on the concentrations of NO2 and AOD, air pollutant’s relationship to the fatality rate, as well as human activities’ impact on the emissions of air pollutant. This study tests the hypothesis that aerosols are an important factor in coronavirus transmission rate (Asadi et al. 2020; Guo et al. 2020), which in turn would impact the coronavirus fatalities.

Fig. 1
figure 1

A diagram to show the environmental factors in the pandemic

2 Materials and Methods

2.1 Data

2.1.1 The COVID-19 Fatality Data and Population Density Data

The COVID-19 fatality data were taken until Jul 1, 2020. For Italy, the COVID-19 fatality numbers per region were taken from the GitHub account (https://github.com/pcm-dpc/COVID-19) of the Department of Civil Protection, Italy. For Germany, data published by the Robert Koch Institute (https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/Gesamt.html) was used. For Spain, data maintained by the New York Times (https://www.nytimes.com/interactive/2020/world/europe/spain-coronavirus-cases.html) was used, who in turn retrieved it from the Spanish Ministry of Health. The death rate is calculated as the total number of coronavirus related deaths per region divided by the total number of coronavirus positive cases. For the regression models, the data for population density corresponding to each region was derived from the “citypopulation” program (https://www.citypopulation.de/). The “WorldPop Project Population Data” (www.worldpop.org) dataset retrieved by the Google Earth Engine platform (Gorelick et al. 2017) was used to show the population density distribution for the selected countries. The WorldPop dataset depicts the estimated number of people residing in each 1 km × 1 km grid cell.

2.1.2 NO2 Data

NO2 data were obtained from the onboard sensor, TROPOMI, on Sentinel-5 Precursor satellite version Near Real-Time (NRTI) (Veefkind et al. 2012). The data were collected and processed, from Dec 1, 2019 to Jul 1, 2020.

2.1.3 AOD data

The daily AOD data (Level 2, 1-km resolution) were obtained from MODIS Terra and Aqua using Multi-angle Implementation of Atmospheric Correction (MAIAC) (Lyapustin and Wang 2018). The green band (Optical_Depth_055) was used to extract the values and the cloudy pixels were masked for better quality. Like NO2, the data were collected and processed (e.g. filtered to remove the low-quality observations) from Dec 1, 2019 to Jul 1, 2020.

2.2 Method

To assess the significance of AOD in predicting the COVID-19 fatality rate, a linear regression model was fit to the data, followed by use of the stepAIC() function from the MASS package (Venables and Ripley 2002) in R. The stepAIC() function performs stepwise model selection using AIC (Akaike information criterion) as the criterion for selection. Along with improving the model performance, this function helps to simplify a model through feature selection. This function also penalizes the model if more variables are added (Tripathi2019) and thus, is a good technique to determine whether an unrestricted model (having NO2, Population Density and AOD as independent variables) is better than a restricted model (having only NO2 and Population Density as independent variables). This helps to evaluate the significance of the third variable; AOD in this instance. The direction of the stepwise model selection is backward.

A beta regression model (Cribari-Neto and Zeileis 2009) was also fit to the data, given that the fatality rate lay in the standard unit interval of (0,1). The beta regression model provides good flexibility, which is delivered by the assumed beta law. The beta density assumes several different shapes that depend on the combination of parameter values, including left- and right-skewed, or the flat shape of uniform density. To assess the performance of the nested beta regression models (restricted model without AOD and unrestricted model including AOD), Wald test and Likelihood Ratio test were performed from the “lmtest” package (Zeileis and Hothorn 2002) in R. The Wald test statistic is given by,

$$W_{\text{T}} = I_{\text{n}} \left( {\hat{\theta }} \right) \left[ {\hat{\theta } - \theta_{0} } \right]^{2} ,$$

where \(\hat{\theta }\) is the maximum likelihood estimator (MLE) and In\((\hat{\theta })\) is the expected Fisher information (evaluated at the MLE) (Stephanie 2016a). The likelihood ratio test statistic is given by,

$${\text{LRT}} = \log_{\text{e}} \left( {\frac{{L_{\text{s}} \left( {\hat{\theta }} \right)}}{{L_{\text{g}} \left( {\hat{\theta }} \right)}}} \right),$$

where Ls(\(\hat{\theta }\)) and Lg(\(\hat{\theta }\)) are log-likelihood functions for the two models (Stephanie 2016b). These are statistical tests to evaluate the significance of variables and compare the goodness of fit between two nested (generalized) linear models.

3 Results

3.1 Distribution of Population in the Three Countries

The boxplots shown in Fig. 2 display the distributions of population density of Italy, Spain and Germany, with the y-axis showing the population per 1 km2 pixel. The histogram shares the same y axis with boxplot, with its x axis representing the number of pixels (equal to the areas in km2) having the same population values. Among these countries, Germany and Italy have similar median population densities (~ 30 people per 1 km2), which are much higher than Spain (< 10 people per 1 km2). Italy has a wider interquartile range (the IQR between 25th and 75th percentiles is shown between the upper and lower boundary of the box) than Germany and Spain have, indicating a more uniform distribution of population density in Italy. The cutoff outlier values defined for Germany, Italy and Spain are approximately 1000, 10,000 and 400 people per 1 km2, respectively. However, both Italy and Spain have a wider range of entire population density (from near 1 to over 10,000 people per 1 km2) than Germany. Therefore, it is concluded that: (1) Germany is mainly constituted with medium populated areas (10–1000 people per 1 km2), with limited areas of both densely (> 1000 people per 1 km2) and sparsely (near 1 person per 1 km2) populated areas, yet having no overcrowded area (> 10,000 people per 1 km2); (2) Italy has a more evenly distributed population, having both sparsely (near 1 person per 1 km2) and densely (> 1000 people per 1 km2) populated areas, as well as some overcrowded areas (> 10,000 people per 1 km2); (3) Spain has a skewed distribution with many areas having a limited population (near 1 person per 1 km2), while also having overcrowded areas (> 10,000 people per 1 km2).

Fig. 2
figure 2

Boxplot and histograms depicting the population densities of Italy, Spain and Germany (pixels of population less than 1 person per km2 are not included)

3.2 Spatial Distributions of AOD during Pre-Exposure Period

Figure 3 displays spatial distributions of AOD in the atmosphere for Italy, Germany and Spain. The time frame is from Dec 1, 2019 to Feb 29, 2020, which is the pre-exposure period.

Fig. 3
figure 3

Spatial Distributions of AOD extracted from MODIS for all regions in aa Italy; b Germany; c Spain; for the pre-exposure period (Dec 1, 2019–Feb 29, 2020). The scale factor is 0.001. The actual AOD values displayed in the figure are in the range of 0.04–0.16. For Spain (c), the Canary Islands have been moved to fit in the figure

By observing the maps alone, we can see Germany has higher values of AOD compared to Italy and Spain. Northern Italy, Southern Spain as well as the Canary and Balearic Islands of Spain all have relatively high AOD values.

3.3 Regression Analysis During Pre-Exposure Period for NO2 and AOD

For the linear and beta regression models, data from 54 administrative regions of Italy (20 regions), Germany (16 regions) and Spain (18 regions) were used. Individual countries, and countries coupled with each other, were also evaluated to compare the results. For each combination, two models were generated: one, with NO2 and Population Density as independent variables, and another with NO2, Population Density, and AOD as independent variables. For a specific combination, a decrease in the AIC value or an increase in the adjusted R2 value determines the significance of the added variable (AOD). The stepAIC() function also helps in identifying the most important variables with the best (lowest) AIC value. Table 1 compares AIC, adjusted R2, and the variables which are considered most important by the stepAIC() function after the stepwise model selection.

Table 1 Results of linear regression models (pre-exposure)

From the results, we can observe that AOD is a significant variable in 3 of the 7 models while in the others it is not. Some of the models have low adjusted R2 values, but for the models where AOD is found to be a significant predictor, the adjusted R2 values are relatively high (> 0.4). In those cases, there is an increase in the adjusted R2 value when compared to its restricted (NO2 + PD) counterpart. This increase in adjusted R2 value for the unrestricted model (NO2 + PD + AOD) suggests that the addition of AOD makes a significant contribution to that model. When Italy, Germany and Spain are included together (Italy + Germany + Spain), population density appears to be the most significant variable. The stepAIC() function dropped NO2 and AOD from the model as it did not consider them to be significant. For Germany alone, the models performed poorly, with neither of the 3 independent variables considered to be more significant than the intercept.

Table 2 compares the AIC values of the beta regression models by including and excluding AOD, along with NO2 and population density. It also shows the results of the Likelihood Ratio (LR) test and the Wald test performed on the two models to evaluate their performance. The significance level was set at 0.1. A p value of less than 0.1 indicates that the unrestricted model (including AOD) is statistically better than the restricted one.

Table 2 Results of beta regression models (pre-exposure)

Again, the AOD appears to have some significance in three models. The AIC values improve, and the p-value suggests a significance too for Italy, Spain and Spain + Germany models. Similar tests were performed by excluding NO2 and population density. When investigating population density, 4 out of 7 model combinations had a p value less than 0.1 indicating its statistical significance while NO2 displayed significance in 3 out of 7 models, as did AOD. For pre-exposure, the results indicate a greater significance for population density in the COVID-19 fatality rate than when the significance of AOD and NO2 was investigated in these countries. The results of these investigations are attached in the Supplementary Materials section (Tables S1, S2, S3, S4).

3.4 Spatial Distributions of AOD during Complete Exposure Period

Studies have shown that concentrations of both NO2 and AOD changed dramatically during the peak of the pandemic in countries like China and India. During this period, NO2 had a consistent decline across countries but AOD was impacted differently in different countries. China experienced an increase in AOD during the lockdown period while India recorded a decrease (Ghosh et al. 2020; Nichol et al. 2020; Ranjan et al. 2020). In view of this, NO2 and AOD data from Mar 1, 2020 to Jul 1, 2020 were also included in our analysis. This is the period when the coronavirus case-count had reached its peak. The AOD distributions during pre-exposure were displayed in Fig. 3. Figure 4 displays the spatial distributions of AOD in the atmosphere for the remainder period, which is Mar 1, 2020–Jul 1, 2020 in Italy, Germany and Spain.

Fig. 4
figure 4

Spatial Distributions of AOD extracted from MODIS for all regions in a Italy; b Germany; c Spain; from Mar 1, 2020 to Jul 1, 2020. The scale factor is 0.001. The actual AOD values displayed in the figure are in the range of 0.04–0.16. For Spain (c), the Canary Islands have been moved to fit in the figure

There is a stark contrast between the AOD maps in Fig. 4 and the corresponding maps for the pre-exposure period (Fig. 3), especially over Italy and Spain, where the difference between the two periods appear to be larger. In Germany also there appears to be a slight increase in the AOD values from the pre-exposure period. This behavior appears similar to what was observed in China during the period when coronavirus cases were at their peak (Nichol et al. 2020).

3.5 Regression Analysis During Complete Exposure for NO2 and AOD

NO2 and AOD data are divided into two periods and included in the regression analysis. Data from Mar 1, 2020 to Jul 1, 2020 (NO2_2, AOD_2) is also included along with data from the pre-exposure period Dec 1, 2019–Feb 29, 2020 (NO2_1, AOD_1). Table 3 displays the AIC, adjusted R2 values and the significant variables (from stepAIC()) for the linear regression models.

Table 3 Results of linear regression models (complete exposure)

Multiple models selected AOD to be the significant variable and this is asserted in the increase of adjusted R2 values for those models. AOD showed its significance in 4 out of 7 model combinations. In Italy and Italy + Germany, the higher values of AOD during the current period (AOD_2) were considered significant while for Spain and Spain + Germany, the lower values of AOD in the previous period (AOD_1) was considered significant in determining the coronavirus fatality rate. Table 4 displays the AIC results from the beta regression models along with the p value from LR and the Wald tests for the same period as Table 3. The significance level was kept at 0.1. Thus, a p value less than 0.1 indicates that the unrestricted model (including AOD) performs statistically better than the restricted model.

Table 4 Results of beta regression models (complete exposure)

Complementing the linear regression results, some beta regression results also display stronger significance when AOD is added to a model. The p values from 4 out of 7 model combinations show significant contributions from AOD. Even the lower AIC values for those models suggest the same.

4 Discussion

The regression models showed results regarding the significance of AOD in statistically explaining COVID-19 fatality rates, along with NO2 and population density. The statistical tests gave understandably mixed results in determining the significance of AOD, as environmental factors alone would not be expected to determine the fatality rate of a virus. The results from the complete exposure period (Dec 1, 2019–Jul 1, 2020) displayed a stronger significance of AOD than the results from pre-exposure (Dec 1, 2019–Feb 29, 2020). From the complete exposure period, 4 out of 7 models selected AOD as a significant variable with a subsequent increase in the adjusted R2 values too, indicating a better model when AOD was added as an independent variable to NO2 and population density. The AOD maps from the period (Mar 1, 2020–Jul 1, 2020) in Fig. 4 clearly show higher values of AOD than the maps from the pre-exposure period (Fig. 3). The results from the regression models during the complete exposure period are closely aligned with an article by Zhang et al. (2020) that discussed the indoor and outdoor airborne transmission of the coronavirus. Their study utilized space borne AOD measurements and was conducted for China, Italy and the United States of America (USA). It concluded that airborne transmissions do play a very important role in the spread of the virus. Though the AOD maps in Figs. 3 and 4 show that Germany has a decent prevalence of AOD, the models did not pick AOD as a significant variable there. In fact, the models did not pick NO2 and population density either when the regions in Germany alone were evaluated. It is known that these variables (NO2, Population Density and AOD) alone would not fully determine the fatality rate of the coronavirus. Other important factors work in synergy to increase the potency of the coronavirus. In some regions, the synergy might have been achieved due to these other factors, and thus led to the results showing the significance of AOD in models involving Italy and Spain. During the complete exposure analysis, the 4 models out of 7 selected AOD to be a significant variable. This would be the most important result from this study; namely, that when the AOD values increased, the models displayed an increased significance of AOD in determining the coronavirus fatality rate. The models were not a great fit, which could be due to a relatively small number of samples, the omission of other confounding factors, and some anomalies being missed using mean values (over 3 months) of NO2 and AOD per region. However, the models still gave valuable information regarding the significance of the variables. Adding data from other countries in Europe, along with the addition of other important factors could help in generating a better model. Also, looking at the higher AOD values contributing to better model performance, it is suggested that a similar study could be explored in regions with very high AOD values like the Middle East and North Africa (MENA).

5 Conclusion

This study not only evaluated previous studies which showed that NO2 and population density are significant variables in determining COVID-19 fatality rates but also showed that AOD, used in conjunction with NO2 and population density, could help to improve the performance of the regression model to estimate COVID-19 fatality rate in regions with high AOD. The results from this study open the door for further scientific studies, using advanced modeling techniques and more accurate data, to determine different environmental characteristics affecting the COVID-19 fatality rate.