Introduction

Coronavirus disease 2019 (COVID-19), now recognized as a global pandemic, is spreading rapidly across the world and significantly affecting many countries (Singhal 2020; Asyary and Veruswati 2020). The outbreak of this disease, caused by a novel coronavirus (SARS-CoV-2), began in December 2019 in Wuhan, Hubei Province, China (Gorbalenya 2020; Ma et al. 2020; Wu et al. 2020). By March 25, 2020, the disease had spread from Wuhan to 196 countries in different parts of the world (Chen et al. 2020; Xu et al. 2020). As of April 28, 2020, a total of 3.12 million confirmed cases had been reported worldwide. This contact-transmissible disease has an average incubation period of 6 to 14 days (Tosepu et al. 2020). Fever, respiratory disorder, coughing and shortness of breath are some of the early symptoms, while in the acute stage the disease can lead to death (Holshue et al. 2020; Perlman 2020; Tosepu et al. 2020).

According to WHO, the first infected case in India was reported on January 30, 2020. From around March 4 onwards, it turned into a major outbreak. As of April 27, Maharashtra was the leading state with a total of 8590 cases, while the whole country had recorded a total of 29,458 cases. In the absence of a vaccine, social distancing is the only measure that has been adopted. SARS-CoV-2 can be transmitted through various bio-aerosols, large droplets or direct contact with secretions, similar to the influenza virus (Li et al. 2005; Qi et al. 2020). Virus transmission can be influenced by several geographical factors such as climatic conditions (temperature and humidity) and population density (PD) (Dalziel et al. 2018; Casanova et al. 2010). The outbreak has been observed to be more severe in mid-latitude countries, where the temperature is considerably lower than in tropical countries. Many researchers from different parts of the world have tried to establish a relationship between COVID-19 transmission and various meteorological factors (Bashir et al. 2020; Prata et al. 2020; Shi et al. 2020). A study conducted in New York, USA, using Kendall and Spearman rank correlation tests found that mean temperature, minimum temperature and air quality had a significant association with the COVID-19 pandemic (Bashir et al. 2020). Shi et al. (2020) reported a significant correlation between daily temperature and the daily count of COVID-19 cases in China and suggested that temperatures above 8–10 °C would lead to a decline in infected cases. Prata et al. (2020) concluded that a 1 °C rise in temperature would result in a decrease in the number of daily confirmed COVID-19 cases in Brazil.

In India, no comprehensive study of the climatic influences on COVID-19 has been reported so far. Therefore, in this study, we investigated the correlations of climatic and topographic factors with the state-wise total number of infected cases. The main goal is to examine the scientific evidence on the spread of COVID-19 cases in India based on regional factors, including PD, climatic conditions and topography.

Data and methodology

Data collection

In this study, we attempted to correlate different climatic and topographic variables with the number of COVID-19 infections in different states of India. We retrieved data on the number of COVID-19 cases in all the states of India as of April 27, 2020 from https://www.covid19india.org/. PD data were acquired from the Census India website (https://www.census2011.co.in). Because daily ground-monitored weather data are limited in India, we obtained long-term annual climatic data [viz. temperature, rainfall, actual evapotranspiration (AET), wind speed (WS), solar radiation (SR), and specific humidity (SH)] from the TerraClimate and WorldClim websites (http://www.climatologylab.org/terraclimate.html). A Shuttle Radar Topography Mission (SRTM) digital elevation model of 90-m spatial resolution was obtained from the CGIAR website (http://srtm.csi.cgiar.org/).

Determination of climatic zones

The first part of our research was intended to understand the relative climatic conditions of different states. Hence, we implemented the De Martonne aridity–humidity index (De Martonne 1925). Although this methodology is more appropriate for smaller areas (Baltas 2007), it was applied here for regional classification because of its simple calculation and fair generalization (Ahmadi et al. 2020). Moreover, because temperature and rainfall data are easily available, this method is widely used (Zareiee 2014). The aridity index was computed using the following equation:

$$I_{\text{DM}} = \frac{P}{T + 10},$$
(1)

where $I_{\text{DM}}$ denotes the aridity index, $P$ is the annual mean precipitation in mm, and $T$ is the annual mean air temperature in °C.
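As a minimal illustration of Eq. (1), the index can be computed as sketched below. The function name `de_martonne_index` is hypothetical, and the example uses the annual means quoted for Ladakh in the descriptive analysis (164 mm and −5 °C). The class thresholds of Table 2 are not reproduced here.

```python
def de_martonne_index(precip_mm: float, temp_c: float) -> float:
    """Return the De Martonne aridity index I_DM = P / (T + 10)."""
    denominator = temp_c + 10.0
    if denominator <= 0:
        # Eq. (1) is undefined at T = -10 °C and negative below it.
        raise ValueError("annual mean temperature must exceed -10 °C")
    return precip_mm / denominator


# Ladakh: P = 164 mm, T = -5 °C (values quoted in the text).
ladakh_index = de_martonne_index(164.0, -5.0)
print(round(ladakh_index, 1))  # 32.8
```

The guard on the denominator is an added safeguard for illustration and is not part of the original index definition.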

Correlation and bivariate linear regression

Initially, the Pearson product-moment correlation was computed between the number of infected cases and all input variables to determine their inter-correlations. Bivariate linear regression was then performed to test whether the topo-climatic factors have a significant relationship with COVID-19 transmission.
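The two steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `pearson_r` and `linear_fit` are hypothetical, and the numbers are invented placeholders, not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only, NOT the study data.
solar_radiation = [15.5, 16.8, 17.9, 18.6, 19.4, 20.1]
infections = [120, 340, 310, 900, 1500, 2100]

r = pearson_r(solar_radiation, infections)
slope, intercept = linear_fit(solar_radiation, infections)
print(f"r = {r:.3f}, slope = {slope:.1f}, intercept = {intercept:.1f}")
```

Significance testing of the coefficient (for example a t-test on r with n − 2 degrees of freedom) is omitted from this sketch.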

Variable importance of projection (VIP)

Partial least squares (PLS) regression is a common method that reduces the predictor variables to a smaller set of uncorrelated components. Instead of the original data, it runs least-squares regression on this reduced number of components. In general, PLS is very useful when the predictor variables are collinear. In addition, PLS provides a measure called VIP that determines the relative importance of each factor (Akarachantachote et al. 2014). We therefore applied PLS to our topo-climatic data to construct a model and determine the relative importance of the variables. The VIP score of variable j can be calculated using the following equation:

$${\text{VIP}}_{j} = \sqrt {\frac{{\mathop \sum \nolimits_{a = 1}^{h} R^{2} \left( {y,t_{a} } \right)\left( {\frac{{W_{aj} }}{{W_{a} }}} \right)^{2} }}{{\left( {\frac{1}{p}} \right)\mathop \sum \nolimits_{a = 1}^{h} R^{2} \left( {y,t_{a} } \right)}}} ,$$
(2)

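where $W_{aj}$ denotes the weight of the jth factor in component a, $W_{a}$ is the norm of the weight vector of component a, p is the number of predictor variables, h is the number of components, and $R^{2}(y,t_{a})$ indicates the fraction of variance in y explained by component a.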

Detailed methodology of PLS and VIP can be found in the study of Wold et al. (1993) and Akarachantachote et al. (2014).
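A minimal sketch of Eq. (2) is given below, assuming that the PLS weights and the per-component R2 values are already available from a fitted model. The function name `vip_scores` and the numerical inputs are hypothetical, and reading W_a as the Euclidean norm of the weight vector of component a is an assumption that follows the usual VIP definition.

```python
import math


def vip_scores(weights, r2):
    """VIP score of each predictor.

    weights: list of h weight vectors (one per component), each of length p.
    r2:      list of h values, R^2(y, t_a) for each component.
    """
    p = len(weights[0])
    total_r2 = sum(r2)
    scores = []
    for j in range(p):
        weighted = 0.0
        for w_a, r2_a in zip(weights, r2):
            norm_sq = sum(w * w for w in w_a)  # squared norm of w_a (assumed meaning of W_a)
            weighted += r2_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(p * weighted / total_r2))
    return scores


# Hypothetical two-component model with three predictors (illustration only).
weights = [[0.8, 0.5, 0.1], [0.2, 0.3, 0.9]]
r2 = [0.6, 0.2]
vip = vip_scores(weights, r2)
print([round(v, 3) for v in vip])
```

With this definition the squared VIP scores sum to p, which is why a value above 1 is commonly read as above-average importance.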

Generalized additive model (GAM)

Recently, GAMs have been used extensively in numerous studies and found useful for relating COVID-19 cases to various local meteorological parameters (Ma et al. 2020; Qi et al. 2020; Prata et al. 2020; Wu et al. 2020). In the present study, a log-linear GAM was applied to analyze the state-specific associations between infected counts and regional climatic factors, topography and PD. First, the basic model was built with the total number of infected cases as the outcome of all other input parameters. Then, the parameters were log-transformed and a smooth spline function was applied specifically to PD and elevation (E), because only for these two variables did the standard deviation exceed the mean, owing to extremely high heterogeneity at the regional level. Thus, the equation can be expressed as follows:

$$\begin{aligned} \ln ({\text{NI}})& = \ln (T) + \ln (R) + \ln ({\text{SH}}) + \ln ({\text{WS}}) + \ln ({\text{SR}}) \\&\quad+ \ln ({\text{AET}}) + s(\ln ({\text{PD}})) + s(\ln (E)). \end{aligned}$$
(3)

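Here, NI is the number of infections, T the temperature, R the rainfall, E the elevation, and s(·) a smooth spline function. This approach also helped to explore the linear and nonlinear effects of the parameters on health outcomes in terms of COVID-19 infections.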
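The pre-processing behind Eq. (3) can be sketched as follows: variables whose standard deviation exceeds their mean are flagged for a smooth term, and all variables are log-transformed. The helper names and the numbers are hypothetical, the use of the sample standard deviation is an assumption, and the GAM fit itself is not shown.

```python
import math
import statistics


def needs_smooth_term(values):
    """True when the (sample) standard deviation exceeds the mean."""
    return statistics.stdev(values) > statistics.mean(values)


def log_transform(values):
    """Natural logarithm of each value; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical state-level values for illustration only, NOT the study data.
predictors = {
    "PD": [17, 120, 380, 1100, 11320],
    "E": [15, 90, 300, 1200, 4661],
    "SR": [15236, 16900, 18100, 19400, 20301],
}

smooth_vars = [name for name, vals in predictors.items() if needs_smooth_term(vals)]
logged = {name: log_transform(vals) for name, vals in predictors.items()}
print(smooth_vars)  # ['PD', 'E']
```

The guard in `log_transform` reflects the fact that a logarithm is undefined for non-positive inputs; how such values were treated before the transform is not stated in the text.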

Results

Descriptive analysis

A total of 29,487 confirmed cases of infection were reported across India up to April 27, 2020. Maharashtra registered the highest number of confirmed cases (8590), while only 9 of the 36 provinces (comprising 28 states and 8 union territories of India) individually registered more than 1000 such cases. PD in India varies from 17 to 11,320 across all the states and union territories (Table 1). Owing to the broad latitudinal differences among states, the climatic variables show high variability (Fig. 1). The annual mean temperature varies from −5 °C (Ladakh) to 28 °C (Puducherry) (Table 1), while the highest annual mean rainfall is observed in Meghalaya (3914 mm) and the lowest in Ladakh (164 mm). SH ranges from 0.002 to 0.015 kg kg−1. The range of AET among the states is very wide (10.75–100.99 mm). Monthly mean WS at 10 m above the surface varies from 0.99 to 2.76 m s−1. SR varies between 15,236 and 20,301 kJ m−2 day−1. Average elevation varies from 15 to 4661 m above mean sea level.

Table 1 The descriptive statistics of state-wise COVID-19 infections and variation in climate in India
Fig. 1 The spatial distribution of different climate, topography, and social factors in India. a Number of infections; b population density; c mean temperature; d rainfall; e specific humidity; f actual evapotranspiration; g wind speed; h solar radiation; and i elevation

Climatic regions and COVID-19 cases

Based on the De Martonne classification (Table 2), we found six different climatic zones across India (semi-arid, moderate, semi-wet, wet, very wet and extremely wet) (Fig. 2). According to this classification, five provinces fall under the semi-arid category, two under moderate, three under semi-wet, five under wet, seven under very wet and thirteen under extremely wet. The spatial distribution of COVID-19 cases in India indicates that the maximum transmission occurred in the states that fall under the semi-arid and wet categories. However, provinces under the very wet (7) and extremely wet (13) categories appear to be less affected by such transmission (Fig. 2).

Table 2 De Martonne classification table of aridity index
Fig. 2 De Martonne climatic classification of India. The inset bar graph indicates the total number of infections in each climatic zone

Bivariate correlation among variables

To understand the influence of different climatic and topographic factors, we performed bivariate correlation using the long-term climatic data and topographic elevation. Table 3 shows the Pearson correlation coefficients between each pair of variables. We selected the number of infections as the dependent variable and treated all the geographical parameters as independent variables. We observed significant positive correlations of temperature and rainfall with SH and AET. Temperature was also strongly correlated with SR (+) and elevation (−) (Table 3). Moreover, we found a significant positive relationship between the number of infections and SR (Fig. 3). Although no significant correlation was found between the number of infections and the other variables, a notable positive relationship with temperature and a negative relationship with rainfall were observed. Similarly, SH, AET, and elevation have a negative relationship with the number of infections, while WS shows a positive one (Fig. 3). Surprisingly, we found no significant correlation between PD and the number of infections. Notably, the correlations improved for most of the variables when they were log-transformed.

Table 3 Correlation among different variables
Fig. 3 Scatter plots and VIP values of individual variables with respect to the number of infections. a Population density; b temperature; c rainfall; d specific humidity; e actual evapotranspiration; f wind speed; g solar radiation; h elevation and i VIP values

Variable importance of projection (VIP)

Figure 3i illustrates the VIP of each variable. A large VIP value (> 1) was recorded for SR, rainfall, temperature and AET. Elevation, wind speed, PD, and SH had VIP values lower than 1.

Implementing GAM

Using the GAM, we attempted to relate the number of infected cases to all the geographical variables considered in this study. Initially, we found no significant relation (R2 = 0.219) using a simple linear GAM. However, log-transforming all variables significantly improved the performance of the model (R2 = 0.782). When the flexible spline smoothing function was applied to the log-transformed values of PD (Fig. 4a) and E (Fig. 4b), the R2 value rose to 0.895 (Fig. 4c). The parametric coefficients and approximate significance of the smooth terms are listed in Table S4 (see supplementary files), which shows that all input parameters fitted the model such that all coefficients were statistically significant at the 0.05 significance level (p values < 0.05).
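For reference, the goodness of fit between actual and predicted values can be sketched with the usual coefficient of determination, 1 − SS_res/SS_tot. This is an assumption about the definition: the text does not state whether the reported R2 values are plain or adjusted. The function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-scale values for illustration only, NOT the study data.
actual = [4.8, 5.9, 6.7, 7.4, 9.1]
predicted = [5.0, 5.7, 6.9, 7.2, 8.8]
fit_quality = r_squared(actual, predicted)
print(round(fit_quality, 3))
```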

Fig. 4 Plots of GAM analysis. a Performance of GAM to related geographic factors; b residuals of smooth terms; c relationship between the actual and predicted model

Log-transformed data with a smoothing function applied to E and PD improved the model’s prediction accuracy considerably. The model thus captured the complex non-linearity in the relation of COVID-19 infections with geographical distribution. Although simple bivariate correlation did not reveal significant relationships for most variables, the strong GAM result suggests that multiple interacting parameters should be taken into account in further investigations in any spatial context. The GAM shows that the number of infections (NI) is negatively associated with SH and rainfall (R), and positively associated with SR and temperature (T).

Discussion

The recent COVID-19 outbreak has caused a significant health burden in many places around the world (Ma et al. 2020). In this paper, we investigated the spatial relationship of long-term climate, topography and social factors with the counts of confirmed COVID-19 cases in India. A substantial number of studies around the world have already examined whether there is any correlation between the COVID-19 outbreak and the existing weather or climatic conditions (Bashir et al. 2020; Sajadi et al. 2020). The prevailing meteorology (temperature, humidity, WS, etc.) significantly alters environmental stability and might therefore affect the survival of viruses and the transmission process (Tosepu et al. 2020). According to Chen et al. (2020), COVID-19 transmission is significantly affected by the surrounding air temperature and humidity, a finding supported by Shi et al. (2020) for the major outbreak in mainland China.

In this study, we found a positive correlation of the number of infections with long-term climatic records of temperature, WS and SR (significant), and with PD. In China, Shi et al. (2020) reported a negative correlation between temperature and COVID-19 transmission on the basis of daily weather reports. However, Ma et al. (2020) reported a positive association between mortality rate and daily temperature in Wuhan, China. In the global context, transmission is found to be higher in particular regions of subtropical countries where the surrounding air temperature is significantly low (Poole 2020).

The significant correlation between SR and COVID-19 infection in India indicates that high insolation during the daytime does not prevent COVID-19 transmission. However, sunlight can boost the immune system and slow down the growth of infections in the human body (Cannell et al. 2006; Miller 2018; Asyary and Veruswati 2020). Asyary and Veruswati (2020) investigated the role of sunlight in the COVID-19 outbreak and recovery. These authors did not observe any noticeable trend between sunlight exposure and the transmission rate, but reported a significant recovery rate under sunlight exposure.

Our study indicates a negative association of the number of infections with rainfall, SH, AET and elevation. A time-series study from China indicated a negative correlation between daily relative humidity and COVID-19 transmission (Qi et al. 2020). Moreover, a large number of previous epidemiological investigations reported a negative association between humidity and coronavirus-like diseases (Zhang Qiang et al. 2004; Gardner et al. 2019). Thus, the findings of the present research in the Indian context agree with these reports.

We did not find any literature correlating regional elevation with COVID-19 transmission. Hence, we included the average elevation of each province, since it significantly controls the climatic conditions. Our study indicates that low-lying regions in India are more likely to experience higher COVID-19 transmission.

The infection counts in the various climatic regions suggest that the transmission rate is likely lower in the provinces under the very wet and extremely wet climatic categories, which points to a lower rate of transmission in wet conditions. Moreover, accounting for 29.2% of the total cases in India, Maharashtra has already emerged as the prime hotspot. A further 24.72% of the total cases were found in neighboring states (i.e., Gujarat, Madhya Pradesh, Goa, Chhattisgarh and Telangana). This might be a result of rapid migration before the lockdown (March 25, 2020).

In the present study, the GAM, which accounts for several geographical parameters together, predicted the number of infected cases well. From the GAM, we infer that hot and dry areas are more likely to be affected by COVID-19 transmission. Higher WS may improve ventilation at the micro-scale, but our study suggests that it does not have a comparable effect at the regional scale. Residual plots of the smooth terms (i.e., PD and E) indicate that population statistics or regional topography may not be influential on their own; however, they are important in combination with meteorology.

Like any scientific investigation, our study has several limitations: (1) we have used only long-term climatic records to indicate the association between COVID-19 cases and prevailing conditions. An investigation using real-time daily weather data in different states is still required. (2) As the disease is caused by a virus, many other factors might need to be considered, such as population migration, immunity, age groups, hygiene systems, etc. Despite these limitations, this study is significant as it is the first report investigating the association of climate and COVID-19 transmission in the Indian context. This is a basic analysis, and a larger amount of data (district-wise) might be incorporated for a stronger conclusion.

Conclusion

The present study aimed to understand the geographical influence on the spatial distribution of COVID-19 transmission at the regional level in India. Several statistical analyses indicate that climatic factors have a non-negligible influence on this viral disease in India. The heterogeneity in the spatial occurrence of infections might be attributed to local meteorology together with geographical location and population. However, no single attribute can by itself explain the nature of transmission. The positive association with SR and temperature and the negative association with humidity and rainfall suggest that hot and arid areas in low-altitude regions need to strictly follow preventive measures on an emergency basis.