1 Introduction

SARS-CoV-2 first emerged in Wuhan, China, and the disease spread rapidly to more than 200 countries within six months. The pandemic then swept across countries and became a leading cause of death. As of October 2022, 6.56 million cumulative deaths had occurred worldwide, and India alone had witnessed more than half a million deaths. The majority of deaths from COVID-19 infection are due to suppurative pulmonary infection, with a few cases of respiratory failure due to diffuse alveolar damage [1]. Pulmonary infection and alveolar damage are directly associated with PM2.5 [2]: because of the small diameter of PM2.5, inhaled virus-laden PM can transport the virus deep into the alveolar and tracheobronchial regions. Exposure to air pollutants might therefore raise the chance of mortality. Earlier studies [3, 4] stated that the association between air pollution exposure and increased COVID-19 incidence/mortality was mostly unknown. Later, a few studies [5,6,7,8,9] reported a possible link between air pollution and COVID-19 incidence/mortality. In India, a large number of studies [10,11,12,13,14] reported a reduction in mean air pollution concentrations during lockdown, although pollution levels arguably rose again post-lockdown [15,16,17,18]. The statistical evidence suggests that most of the deaths occurred in the second wave of COVID-19, in which post-lockdown impacts are visible. Hence, there is a great need to quantify the effects of pollutants on COVID-19 cases and mortality. Still, it is challenging to state that air pollution in India is really escalating COVID-19 incidence/mortality. The cause of mortality is highly dependent on various aspects [19] of human physiology. External factors such as weather and socioeconomic burdens play an equally important role in mortality.
According to the WHO, air pollution causes more than 4 million deaths every year across the globe from chronic pulmonary and heart disorders. The Asia–Pacific region alone accounts for 2.3 million deaths because of high pollution and dense population. The major cities in India have hazardous Air Quality Index (AQI) levels, leading to deaths across the country from various lung diseases as well as COVID-19. According to previous studies [20], there were 1.6 million deaths in India due to air pollution in 20, contributing to 17% of total deaths. Most of these deaths are due to PM2.5, with an average of 0.98 million, and deaths due to household pollutants are 0.61 million. This suggests that all major pollutants might contribute significantly to COVID-19 deaths, as they do for other respiratory diseases. Recent studies such as [21, 22] quantified the increase in deaths with the increase in pollutant concentration. Along with the pollutants, meteorological parameters also affect COVID-19 spread and mortality. In general, a decrease in temperature results in an increase in infection. Relative humidity also has a negative correlation, but wind speed [23,24,25] has a positive correlation with virus transmission. Meteorological variables always play an essential role in virus spread, and mortality may also be affected by weather conditions. In most developed countries, cooling/heating systems are affordable, especially in hospitals where patients are severely affected by a virus. In developing countries, however, providing such amenities in hospitals is difficult, so any extreme weather condition may severely affect the patient. The common statistical models in the literature for estimating relative risk are linear, log-linear, GLM (generalized linear model), and GAM (generalized additive model). In recent times, GLM and GAM models have been widely used in epidemiology.
GAM models accommodate nonlinear terms by including splines in the model. They are widely used in epidemiology to estimate the relative risk of different diseases, and details of GAM uses are given in [26]. Here, we attempted to estimate relative risk using the GAM algorithm with a single-pollutant model and meteorological parameters as confounding variables. The effects of the lockdown and post-lockdown periods (the surge of air pollution in different parts of India) are addressed by taking temporal data on pollution and COVID-19 from the start of the 1st wave to the end of the 2nd wave.

2 Materials and methods

2.1 Study area and data

To study the effect of pollution exposure on COVID-19 cases and deaths (Fig. 1a and b), we selected the 20 major cities across India listed in Table 1. Urban areas are more vulnerable to pollution extremes, and population density also has a major effect on COVID-19 spread. Highly air-polluted cities with daily pollution measurements were chosen for the analysis. Table 1 shows the temporal mean of the maximum pollutant concentration of each city for the entire study period. The pollution and meteorological data were collected from [27]. The data sets consist of daily maximum, minimum, and median values of the pollution and meteorological variables; the maximum pollutant concentrations were used for the analysis. The data were pre-processed because some days were missing, and the missing values were filled by shape-preserving piecewise cubic spline interpolation. Our hypothesis is that, even though pollution levels decreased significantly during lockdown, the post-lockdown surge in pollutant concentration is positively associated with escalated COVID-19 incidence/mortality. This surge occurs especially during peak traffic periods and holidays, so we analyzed the relative risk using the maximum pollutant concentration observed in a day. For this purpose, we present key statistics of the pollutants. Every pollutant had a higher concentration in the post-lockdown period (Fig. 2a), where the shaded area is the lockdown period and the non-shaded area is the post-lockdown period. A surge in pollutant concentration is clearly visible in the post-lockdown period (Fig. 2b): the concentrations of PM2.5, PM10, SO2, and O3 rose by 20%, 24%, 12%, and 19%, respectively.
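As an illustration of the gap-filling step, the following is a minimal sketch of shape-preserving piecewise cubic Hermite interpolation (the Fritsch-Carlson scheme) on a daily series. It is not the authors' code: the function names `pchip_slopes` and `fill_missing` are hypothetical, the end-point slopes use a simple one-sided secant (a simplification of what library implementations do), and the sample values are made up.

```python
def pchip_slopes(x, y):
    """Fritsch-Carlson node slopes; end points use one-sided secants (a simplification)."""
    n = len(x)
    h = [x[k + 1] - x[k] for k in range(n - 1)]
    delta = [(y[k + 1] - y[k]) / h[k] for k in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:  # same sign: weighted harmonic mean
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
        # otherwise d[k] stays 0 so that no new local extremum is created
    return d


def fill_missing(values):
    """Fill None entries that lie between observed days; leading/trailing gaps stay None."""
    known = [(float(i), float(v)) for i, v in enumerate(values) if v is not None]
    filled = list(values)
    if len(known) < 2:
        return filled
    x = [p[0] for p in known]
    y = [p[1] for p in known]
    d = pchip_slopes(x, y)
    seg = 0
    for i, v in enumerate(values):
        if v is not None or i < x[0] or i > x[-1]:
            continue
        while x[seg + 1] < i:  # advance to the interval that contains day i
            seg += 1
        h = x[seg + 1] - x[seg]
        t = (i - x[seg]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        filled[i] = h00 * y[seg] + h10 * h * d[seg] + h01 * y[seg + 1] + h11 * h * d[seg + 1]
    return filled


# Made-up daily maximum concentrations with two missing days.
print(fill_missing([40.0, 44.0, None, None, 60.0, 62.0]))
```

Because the slopes are limited in this way, the filled values stay between the neighbouring observations, which avoids the overshoot an ordinary cubic spline can produce around sharp pollution peaks.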

Fig. 1 Location of 20 cities and cumulative COVID-19 cases and deaths in each city as of 01-Nov-2021. a COVID-19 cases, b COVID-19 deaths

Table 1 The temporal mean daily observed pollutants and COVID-19 statistics for each city

Fig. 2 Temporal plots of different pollutants in the study period and their mean magnitudes in lockdown and post-lockdown. a logarithm of daily maximum pollutant concentration, b mean concentration of daily maximum pollutant concentration (Lockdown vs Post-lockdown)

2.2 Methodology

In the present analysis, the generalized additive model (GAM) was applied. The dependent variable is the daily COVID-19 incidence/deaths, the independent variable is the pollutant concentration, and the meteorological variables are included as smooth confounding terms with 20 splines of degree 3. Past studies such as [28, 29] considered meteorological variables such as temperature (Ta), relative humidity (RH), and wind speed as confounding variables with splines to represent seasonality or trends, so we used the same approach for the meteorological confounders. Here, we include the daily observed maximum temperature (Ta) and the daily observed maximum relative humidity (RH). Non-meteorological confounders such as demographic, social, and economic factors have been analyzed previously [30], but they are not considered in the present study because daily-level data are unavailable or inconsistent. The model equation is as follows.

Model 1:

$$ {\text{log}}\left( {{\text{deaths}}_{{{\text{i}},{\text{t}}}} } \right) = {\upbeta }_{0} + {\upbeta }_{1} \overline{{{\text{X}}_{{{\text{i}},{\text{t}}}} }} + {\text{s}}\left( {{\text{T}}_{{\text{a}}} ,{\text{d}}} \right) + {\text{s}}\left( {{\text{RH}},{\text{d}}} \right) + {\text{dow}} $$
(1)

where \(\overline{X}_{i,t}\) is the moving average of the pollutant concentration in the ith city on the tth day; a 14-day moving average was used in the analysis. s(·) is the smoothing spline with a degree of 3, and dow is the day of the week. Ta and RH are the air temperature and relative humidity, respectively.

The above models were implemented for each pollutant with a lag ranging from 0 to 21 days.
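The exposure term can be sketched as follows. This is only an illustration: the text does not say whether the 14-day average is trailing or centred, so a trailing window is assumed here, and the helper names `moving_average` and `lagged` are hypothetical.

```python
def moving_average(series, window=14):
    """Trailing mean over the current day and the previous window-1 days.

    Returns None for days on which a full window is not yet available.
    """
    out = []
    running = 0.0
    for t, value in enumerate(series):
        running += value
        if t >= window:
            running -= series[t - window]  # drop the day that left the window
        out.append(running / window if t >= window - 1 else None)
    return out


def lagged(series, lag):
    """Shift a series so that position t holds the value observed on day t - lag."""
    return ([None] * lag + list(series))[:len(series)]


# Made-up daily maximum concentrations for one city.
pm25 = [80.0 + (day % 7) * 5.0 for day in range(40)]
exposure = moving_average(pm25, window=14)

# One exposure series per lag from 0 to 21 days, each to be fitted separately.
exposure_by_lag = {lag: lagged(exposure, lag) for lag in range(0, 22)}
print(exposure_by_lag[3][16:20])
```

Days with a `None` exposure would simply be dropped before fitting, so longer lags leave slightly fewer observations per city.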

The relative risk is estimated as follows

$$ RR = exp\left( {{\upbeta }_{1} } \right) $$
(2)

The 95% confidence interval is estimated as follows

$$ {\text{CI}} = {\text{exp}}\left( {{\upbeta }_{1} \pm 1.96*{\text{SE}}} \right) $$
(3)

where SE is the standard error of \({\upbeta }_{1}\). Before applying the model, the impact of each pollutant on COVID-19 was estimated by applying linear regression to each pollutant and COVID-19 cases/deaths, and the Pearson correlation between pollutants and COVID-19 incidence/deaths was computed.
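Equations (2) and (3) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the coefficient and standard error are made-up numbers, and the optional `delta` argument (which rescales the coefficient to a larger concentration increment) is an assumption about how a "per 10 units" figure could be obtained, not something stated in the text.

```python
import math


def relative_risk(beta, se, delta=1.0, z=1.96):
    """RR and 95% CI for a delta-unit increase in pollutant concentration.

    With delta=1 this is exactly RR = exp(beta) and CI = exp(beta +/- 1.96*SE).
    """
    rr = math.exp(beta * delta)
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return rr, lower, upper


def percent_increase(rr):
    """Express a relative risk as a percentage change in the outcome."""
    return (rr - 1.0) * 100.0


# Made-up coefficient and standard error, for illustration only.
rr, lower, upper = relative_risk(beta=0.02, se=0.001)
print(round(rr, 4), round(lower, 4), round(upper, 4), round(percent_increase(rr), 2))
```

An RR above 1 with a lower bound that also exceeds 1 indicates a positive association at the 95% level.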

2.3 Model selection and validation

The choice of the basis dimension ‘k’ is difficult and carries risk. If the basis dimension is increased, the fit becomes too wiggly (in general, overfitting); at the same time, a lower number leads to a nearly linear fit of the GAM model. So, to choose the best possible ‘k’, there should be a trade-off between ‘k’ and \(\uplambda \) (the smoothing parameter). A higher \(\uplambda \) reduces overfitting and smooths the curve. The condition used to trade off ‘k’ and \(\uplambda \) is based on the edof (effective degrees of freedom). Though the choice of k is arbitrary [31, 32], the condition for the best possible fit is edof < k − 1. We iterated the model to satisfy this condition and chose k = 20 and \(\uplambda \) = 10; the estimated edof of every model is far less than k − 1. Although tuning the smoothing parameter \(\uplambda \) reduces overfitting, a thorough check is still needed in GAM models. Overfitting was assessed by dividing the pooled data into training and testing sets: if the accuracy of the model on the training and testing sets is similar, the model is fitted correctly. This was checked through sensitivity analysis and by maximizing \(\uplambda \). Another concern in GAM models is multi-collinearity. Because all major pollutants are highly correlated with one another, we chose a single pollutant at a time and fitted the model with splines of temperature and relative humidity and with the day of the week. Still, one should be aware of collinearity among the confounding variables. For this purpose, we examined multi-collinearity with the variance inflation factor (VIF). The details of the VIF are given in Table 2. The VIF of every model is less than 10, which indicates that the models behave well with the above confounding variables. With the above-iterated parameters, the model residuals follow the normal distribution assumption. Residual check plots were generated for all four pollutant models.
The diagnostic plots are given in the supplementary material. The histograms for COVID-19 cases and COVID-19 deaths show that the residuals are normally distributed, with mean, median, and mode around zero. The probability plots (similar to Q-Q plots) for COVID-19 cases and deaths show that the theoretical quantiles and observed values have a linear relation (slope of 45°), with a small deviation near zero that is within the acceptable range.
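The multi-collinearity check can be sketched with the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix. The code below is a minimal, dependency-free illustration, not the authors' implementation: the function names are hypothetical and the predictor values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse of the correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Made-up predictor values, for illustration only.
predictors = {
    "pollutant": [55.0, 61.0, 70.0, 64.0, 80.0, 75.0],
    "temperature": [30.0, 31.0, 29.0, 33.0, 35.0, 32.0],
    "humidity": [60.0, 58.0, 65.0, 50.0, 45.0, 55.0],
}
print(vif(predictors))
```

A value close to 1 means a predictor is nearly uncorrelated with the others; the threshold of 10 used in the text is a common rule of thumb.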

Table 2 The VIF factor of each pollutant model

3 Results

In the present study, 20 major air-polluted cities were analyzed over 553 consecutive days between April 2020 and November 2021. The pollutant concentrations along with COVID-19 statistics are shown in Table 1. The GAM model was applied to non-cumulative cases and non-cumulative deaths. Before estimating the relative risk, the post-lockdown surge of pollutants was analyzed and quantified. To estimate the relative risk, the pooled data were first used for correlation analysis. Even with such a large data set, every major pollutant shows a positive correlation. The uncorrelated pollutants (R < 0.1), CO and NO2, were omitted from the analysis. Unlike some past studies with fewer cities and shorter durations, this multiple-city analysis showed that pollution can exacerbate severe COVID-19 conditions and possibly increase the death rate. Although the correlations in the present study are weak compared with those of short-duration studies, they indicate a significant impact on both incidence and mortality.

In the correlation analysis (Fig. 3), the blue line indicates the single-variable regression line, and the shaded area is the confidence interval. The R value is the Pearson correlation coefficient, which indicates how strong the relationship is between the daily maximum pollutant concentration and COVID-19 incidence and deaths, and the p value represents the level of significance of the coefficient. All pollutants (Fig. 3a–d) have an upward regression line, suggesting a positive relationship. In addition, all R values are positive, and the p values are smaller than the commonly accepted threshold of 0.01, confirming a significant positive association. For COVID-19 cases, PM2.5 and PM10 (Fig. 3a and b) have the lowest R values in the regression analysis, while SO2 and O3 (Fig. 3c and d) showed the highest R values, with an upward slope in COVID-19 incidence. For deaths, PM2.5 and PM10 (Fig. 4a and b) had minimal effects, while SO2 and O3 (Fig. 4c and d) showed the steepest upward slope. All pollutant concentrations are in µg/m3.
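The single-variable regression and Pearson coefficient described here can be sketched as below, regressing the logarithm of daily counts on the daily maximum concentration as in the figure captions. The function name and the data are hypothetical, and the p value is omitted because it needs a t-distribution routine that is outside the scope of this sketch.

```python
import math


def pearson_and_line(x, y):
    """Pearson R plus slope and intercept of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# Made-up pooled observations: daily maximum concentration and daily new cases.
concentration = [35.0, 48.0, 52.0, 60.0, 75.0, 90.0]
daily_cases = [120, 150, 140, 210, 260, 400]
log_cases = [math.log(c) for c in daily_cases]

r, slope, intercept = pearson_and_line(concentration, log_cases)
print(round(r, 3), round(slope, 4), round(intercept, 3))
```

A positive slope together with a positive R corresponds to the upward regression lines discussed in the text.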

Fig. 3 Correlation between pollutant concentrations and logarithm of non-cumulative confirmed incidences (cases). a regression line of PM2.5 vs confirmed incidences, b regression line of PM10 vs confirmed incidences, c regression line of SO2 vs confirmed incidences, d regression line of O3 vs confirmed incidences

Fig. 4 Correlation between pollutant concentrations and logarithm of non-cumulative confirmed deaths. a regression line of PM2.5 vs confirmed deaths, b regression line of PM10 vs confirmed deaths, c regression line of SO2 vs confirmed deaths, d regression line of O3 vs confirmed deaths

Initially, PM2.5 (Fig. 5a) showed a relative risk (RR) of 1.030 with 95% CI [1.028, 1.031], and the RR kept decreasing up to a lag of 21 days. This RR indicates that PM2.5 was associated with only a 2.8 to 3.1% increase in COVID-19 incidence for every 10 µg/m3 increase. Similarly, PM10 (Fig. 5b) showed an RR of 1.0116 with 95% CI [1.011, 1.020]. Like PM2.5, PM10 showed a decreasing trend of RR up to a lag of 21 days. PM10 exhibited a 1% increase in COVID-19 incidence for every 10 µg/m3 increase. Both PM2.5 and PM10 (Fig. 5a and b) exhibited a very low association with COVID-19 transmission. In contrast, SO2 and O3 (Fig. 5c and d) have an RR of 1.079 with 95% CI [1.076, 1.083] and an RR of 1.101 with 95% CI [1.100, 1.04], respectively, corresponding to positive associations of 7.9% and 10% per 10 µg/m3 increment of pollutant concentration. Every pollutant shows a decreasing trend except O3.

Fig. 5 Relative risk due to different pollutants with lag 0–21 days for COVID-19 incidences (cases). a relative risk of PM2.5, b relative risk of PM10, c relative risk of SO2, d relative risk of O3

For COVID-19 deaths, PM2.5 (Fig. 6a) initially showed an RR of 1.021 with 95% CI [1.019, 1.029], and the RR kept decreasing up to a lag of 21 days. This RR indicates that PM2.5 was associated with only a 1.9 to 2.9% increase in COVID-19 deaths for every 10 µg/m3 increase. Similarly, PM10 (Fig. 6b) showed an RR of 1.010 with 95% CI [1.008, 1.0105]. Unlike PM2.5, PM10 showed an increasing trend of RR up to a lag of 21 days. PM10 exhibited a 1% increase in COVID-19 deaths for every 10 µg/m3 increase. Both PM2.5 and PM10 (Fig. 6a and b) exhibited a very low association with COVID-19 deaths. In contrast, SO2 and O3 (Fig. 6c and d) have an RR of 1.045 with 95% CI [1.040, 1.050] and an RR of 1.072 with 95% CI [1.065, 1.075], respectively, corresponding to positive associations of 4.5% and 7.2% per 10 µg/m3 increment of pollutant concentration. PM2.5 and SO2 show a decreasing trend, whereas PM10 and O3 show an increasing trend.

Fig. 6 Relative risk due to different pollutants with the lag ranging 0–21 days for COVID-19 deaths. a relative risk of PM2.5, b relative risk of PM10, c relative risk of SO2, d relative risk of O3

To analyze the sensitivity of the proposed model, we excluded Delhi from the analysis. Delhi is the city with the most cumulative COVID-19 cases reported to date and is also the most polluted city. Even after excluding Delhi, the relative risk of both COVID-19 cases and deaths did not change much. The relative risk due to SO2 (Fig. 7a and b) in COVID-19 cases and deaths decreased slightly, whereas PM10 and O3 (Fig. 7a and b) showed an increase in RR. The relative risk due to PM2.5 neither increased nor decreased for either incidence or deaths.

Fig. 7 Percentage associated COVID-19 incidences (cases) for pollutants with the full model and without Delhi. a COVID-19 cases, b COVID-19 deaths

The pooled analysis gives an overall estimate of the effects of the pollutants on COVID-19. Still, it may not correctly represent each city and each region. For this purpose, we estimated the relative risk of each city over the same study period. The city-wise relative risk is given in Table 3, which shows that the relative risk varies significantly from city to city. So, to understand the relative risk patterns throughout India, the nearest neighbor algorithm was used to generate maps of the pollutant concentrations (Fig. 8a, b, c, and d) and of the RR for the whole of India. Figure 9 shows the relative risk of COVID-19 cases for each pollutant. The relative risk due to O3 (Fig. 9a) is highest in the north and central parts of India and extends somewhat to northwestern regions such as Punjab and to the upper west such as Gujarat; a slight pattern was also observed on the east coast. The northeast shows no elevated risk. Similarly, the relative risk due to PM2.5 (Fig. 9b) is dominant in the central part of India and showed a partial effect on southern India. The relative risk due to PM10 (Fig. 9c) is concentrated in the central part of India, with high risk in northwestern regions such as Punjab and in the lower south such as Kerala. Interestingly, the relative risk due to SO2 (Fig. 9d) is much higher in several regions, such as the northeast, the Indo-Gangetic Plain (IGP), the northwest, the west, and the south.
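The mapping step can be illustrated with a minimal nearest-neighbour gridding sketch: every grid cell takes the value of the closest station. This is not the authors' implementation. The function name, the station coordinates, and the RR values below are placeholders rather than entries from Table 3, and distance is measured in plain degrees of latitude/longitude, which is an assumption (a great-circle distance would be more accurate over a country the size of India).

```python
def nearest_neighbour_grid(points, lats, lons):
    """Assign to each grid node the value of the nearest (lat, lon, value) point.

    Distance is squared Euclidean distance in degrees (an assumption).
    """
    grid = []
    for lat in lats:
        row = []
        for lon in lons:
            nearest = min(points, key=lambda p: (p[0] - lat) ** 2 + (p[1] - lon) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid


# Placeholder stations: (latitude, longitude, relative risk). Not real results.
stations = [
    (28.0, 77.0, 1.08),
    (19.0, 73.0, 1.02),
    (13.0, 80.0, 1.05),
]

# Coarse 5-degree grid over a bounding box chosen only for illustration.
lats = [10.0, 15.0, 20.0, 25.0, 30.0]
lons = [70.0, 75.0, 80.0, 85.0]

for row in nearest_neighbour_grid(stations, lats, lons):
    print(row)
```

With only a few stations the resulting map consists of large constant patches, so any subsequent smoothing or resampling changes the visual impression considerably and should be read with that in mind.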

Table 3 Relative risk of COVID-19 cases/mortality of individual city

Fig. 8 Spatial map of pollutant concentration. a mean of daily max of O3, b mean of daily max of PM2.5, c mean of daily max of PM10, d mean of daily max of SO2

Fig. 9 Spatial map of relative risk due to pollutants for COVID-19 cases. a relative risk of O3, b relative risk of PM2.5, c relative risk of PM10, d relative risk of SO2

For COVID-19 deaths, the relative risk due to O3 (Fig. 10a) is highest in the north and central parts of India and extends to northwestern regions such as Punjab, the upper west such as Gujarat, and the east coast. Similarly, the relative risk due to PM2.5 (Fig. 10b) is dominant in the central part of India and showed a similar effect on northwestern India. PM10 (Fig. 10c) has the highest effect on the northwestern part of India, along with a partial effect on the IGP. The relative risk due to SO2 (Fig. 10d) is much higher in several regions, such as the northeast, the Indo-Gangetic Plain (IGP), the northwest, the west, and the south, with a partial effect on Delhi.

Fig. 10 Spatial map of relative risk due to pollutants for COVID-19 deaths. a relative risk of O3, b relative risk of PM2.5, c relative risk of PM10, d relative risk of SO2

4 Discussion

The main aim of our study is to quantify the effects of pollution on COVID-19. For this purpose, we first performed a correlation analysis of non-cumulative cases and deaths against the major air pollutants. A past study correlated COVID-19 incidence/mortality with mean pollutant concentrations of previous years and found, for 71 provinces of Italy with 2017–2019 data, correlations of 0.340, 0.267, 0.247, and 0.264 for PM2.5, PM10, NO2, and O3, respectively [5]; we observed somewhat weaker correlations because our data sets span a long time period. Limited-period studies such as [33] found that PM2.5 and PM10 are positively correlated with COVID-19 incidence. Later, a long-period study in Chile [34], from January 2020 to June 2021, found excess mortality compared with 2016–2019 using a generalized additive model. A recent study of multiple cities [29] (Tehran, Mashhad, and Tabriz) from February 20th, 2020 to January 4th, 2021 concluded that a significant association with pollutants is possible for short-term exposure in longer-period studies. However, there is no recommended study period, and Table 3 shows that the relative risk varies significantly between cities. So, we chose the study period on the basis of data availability. Recently, a nationwide study in the US [28] analyzed almost 800 counties over a 6-month period. Its results indicated a positive correlation of PM2.5 and O3 with both COVID-19 cases and deaths. Another pooled study, in Korea [35], observed a more exacerbated risk. Thus, following the high spatio-temporal coverage of the studies above, we pooled data from 20 major polluted cities in India and analyzed them over a time span of 553 days.

The percentage associations of particulate matter and other pollutants are within the range reported by some earlier studies. Evidence from 126 cities in China [36] showed increases of 2.24%, 1.76%, 6.94%, 4.76%, and 7.79% in COVID-19 cases for PM2.5, PM10, NO2, O3, and SO2, respectively. Similarly, we observed increases of 3%, 1%, 7.7%, and 10% in COVID-19 cases attributed to PM2.5, PM10, SO2, and O3, respectively. Only O3 differs significantly from the past study, with approximately 5% more contribution.

Studies of this kind are rare in India compared with other developing countries and with highly developed countries. Recently, however, a study on India [37] showed a significant impact of short-term exposure to major pollutants: PM2.5, PM10, and NO2 were associated with 2.21%, 2.67%, and 4.56% increases in daily counts of COVID-19-infected cases, respectively. A similar long-period study [38] (from 1 April 2020 to 31 December 2020 in the National Capital Territory (NCT) of Delhi) suggests that, at a 14-day moving average, PM2.5, PM10, and NO2 were significantly associated with increased risk of daily new COVID-19 incidence, and PM2.5, PM10, SO2, NO2, O3, and CO were significantly associated with daily new COVID-19 deaths.

The present results are in accordance with several of the previous studies cited above. However, these statistical analyses alone may not be sufficient to establish a strict relation showing that COVID-19 is escalated by air pollution. Some previous studies [39,40,41] have established a possible mechanism linking air pollution and COVID-19, and evidence from thirty-five observational studies [30] showed a significant positive association for all major pollutants. The exact quantification of the increase in COVID-19 incidence/mortality due to each individual pollutant is questionable, so spatial mapping may be a more appropriate representation of the present study. In our study the spatial variability of relative risk is very high, probably because of the number of cities. However, the spatial variability of pollutant concentration is also high. Cross-comparing the pollutant concentration of a region (Fig. 8) with the relative risk in that region (Figs. 9 and 10) shows that the relative risk is not necessarily high exactly where the pollutant concentration is high. Thus, a high mean concentration alone is not sufficient to conclude that the relative risk will be high in the same place, and temporal dynamics play a vital role along with human physiology and viral dynamics. In addition, the nearest neighbour algorithm together with resampling smooths the plots generated from point sources (in our case, the relative risk of a city), so these maps show a probable and causal relationship between pollutants and COVID-19. Studies of pollution effects specifically on COVID-19 therefore remain limited.

However, the effects of air pollution on lung diseases are well studied. PM2.5 is the major cause of lower respiratory disease [42, 43], and a systematic review [44] showed alterations in miRNA expression on exposure to PM of different sizes, along with their potential functions in cardiorespiratory toxicity. Similarly, SO2 and O3 cause severe lung diseases. Chronic inhalation of SO2 results in difficulty in breathing and asthma. The negative impacts of SO2 gas on humans include irritation of the skin, tissues, and mucous membranes of the eyes, nose, and throat. According to the WHO, high O3 levels in the atmosphere cause breathing problems, asthma, and reduced lung function. Based on this evidence, it can be hypothesized that pollutants escalate COVID-19. This study nevertheless has several limitations, such as the availability and consistency of data. More spatio-temporal data are required to establish a firm relation, and any ecological relation is sensitive to age, sex, and past medical history. The incidence of or death from any disease depends on several other factors, and individual exposure to pollution varies significantly. Indoor pollution also plays a major role in short-term exposure. Beyond short-term effects, there is also a great need to consider the long-term effects of pollution. This consideration is especially required in countries like India, where urban population density is very high. Most cities in India fall under non-attainment zones, where chronic inhalation of toxic pollution is common and should be addressed.

5 Conclusions

Indian cities consistently fall under non-attainment zones because of poor air quality. Moreover, the lifting of the lockdown caused an increase in activities such as traffic and tourism, and these resumed daily activities caused high concentrations of pollution. All pollutants showed an increase in the post-lockdown scenario. The open question is whether this surge in pollution really worsens the COVID-19 threat. Our results therefore offer policymakers a preliminary assessment of how significantly the rise in pollution is associated with an increase in relative risk. The pooled results showed that the highest attribution went to O3, followed by SO2, PM2.5, and PM10. Spatial analysis showed that COVID-19 escalation depends strongly on demographic features and population densities. The spatial maps revealed that north India, the IGP, and to some extent western India are the zones most vulnerable to the different pollutants. So, proper mitigation strategies should be implemented based on the type of pollutant observed in the region. The type of pollutant is region-specific because aspects such as population density, tourism levels, traffic, and industries change from region to region. Properly implemented region-specific mitigation strategies would help cities function smoothly and support better economic growth during a pandemic. This research also offers a perspective for tackling pollution-escalated infection in future pandemics.