1 Introduction

Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus SARS-CoV-2 (formerly called 2019-nCoV), has become a major public health problem all over the world [1]. In light of the rising danger, the World Health Organization (WHO) declared COVID-19 an international public health emergency [2]. Although it is still unknown exactly where the outbreak first started, many early cases of COVID-19 have been attributed to people who had visited the Huanan Seafood Wholesale Market in Wuhan, Hubei, China [3]. Globally, as of March 21, 2021, there have been 123.55 million confirmed cases of COVID-19, including 2.72 million deaths and 99.53 million recoveries [4].

Bangladesh is a well-known climate-vulnerable country due to its high population density and complex meteorological settings [5]. In Bangladesh, the first coronavirus cases were confirmed on March 08, 2020 by the country’s epidemiology institute, the Institute of Epidemiology Disease Control and Research (IEDCR). It has been reported that temperature, humidity, wind, and precipitation may favour either the spread or the inhibition of epidemic episodes, and that the transmission of viruses is influenced by weather conditions and the density of people [6, 7]. Although Bangladesh is a densely populated country (about 160 million people), COVID-19 in Bangladesh seems less acute. As of March 21, 2021, there have been 570,878 confirmed cases, including 8690 deaths and 522,105 recoveries [4]. One possible reason for this moderate transmission of COVID-19 is the influence of tropical weather (high temperature and often excessive humidity).

Meteorological parameters are important factors influencing infectious diseases such as severe acute respiratory syndrome (SARS) and influenza [8]. High temperature and high humidity are thought to have a combined effect on the inactivation of coronaviruses. In contrast, the opposite weather conditions can prolong the survival time of the virus on surfaces and facilitate the transmission of, and susceptibility to, the viral agent [9]. There is also some evidence that COVID-19 cases have clustered particularly around cooler, drier regions [10, 11]. Many articles have examined the effects of temperature and humidity on the spread of COVID-19, and a systematic review has been published in [12]. Most studies find a significant effect of temperature and humidity on the spread of COVID-19. However, the evidence is still inconclusive because some studies found no association between COVID-19 transmission and temperature (see, for example, [13, 14]).

In addition, viruses mutate continuously, and SARS-CoV-2 is no exception. Callaway [15] states that SARS-CoV-2 has been mutating at a rate of about 1–2 mutations per month. Mutations can have a negative or positive impact on the virus’s capability to survive and replicate, depending on where in the SARS-CoV-2 genome they occur. The researcher cautioned that these mutant lineages of SARS-CoV-2 could sustain uncontrolled transmission of the virus in many parts of the world. Viral mutations and variants in the United States are regularly monitored through sequence-based surveillance, laboratory studies, and epidemiological investigations [16].

Recently, a novel SARS-CoV-2 variant (known as lineage B.1.1.7) emerged in the United Kingdom (UK) in November 2020 and spread quickly to other countries [17]. A total of 17 mutations have been recorded in the new strain found in the UK. Virologists in Bangladesh have announced that a new SARS-CoV-2 strain found there is somewhat similar to the one recently discovered in the United Kingdom [18]. The effects of temperature and humidity on the transmission of the mutated SARS-CoV-2 strain are not yet known. Hence, it is crucial to understand the behaviour of SARS-CoV-2 transmission using current data.

Therefore, the main objective of this research is to investigate the effects of temperature and humidity on the transmission of SARS-CoV-2 by using flexible regression models. We try to understand the seasonal behaviour of the transmission of SARS-CoV-2 and the spread of COVID-19. The materials and methods, including the data source and the statistical models, are explained in Sect. 2. Section 3 describes the data analysis and results. Finally, the discussion and conclusions are presented in Sect. 4.

2 Material and Methods

2.1 Data Source

Data on COVID-19 cases were collected from the daily reports of the Institute of Epidemiology Disease Control and Research (IEDCR), Dhaka, Bangladesh, for the period from March 08, 2020 to January 31, 2021. The data are available at https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Bangladesh. The daily temperature (measured in \(^{\circ }\hbox{C}\)) and humidity (%) for Bangladesh were collected from https://www.timeanddate.com/weather/bangladesh/dhaka.

2.2 Generalised Additive Models for Location Scale and Shape

Generalized Linear Models (GLM) and Generalized Additive Models (GAM), introduced by [19] and [20] respectively, are very popular in statistical data analysis. Rigby and Stasinopoulos [21] proposed the generalized additive model for location, scale and shape (GAMLSS) as a way of overcoming some of the limitations associated with GLM and GAM for regression analysis. It is a general framework of (semi)parametric regression models in which the distribution of the response variable does not necessarily belong to the exponential family and may be a highly skewed or kurtotic continuous or discrete distribution. We consider the logarithmic transformation of the number of SARS-CoV-2 infected new cases and of the number of deaths due to COVID-19 as response variables of the GAMLSS model. In the sequel we denote, for notational convenience, “number of SARS-CoV-2 infected new cases” as “number of new cases”. To avoid taking the logarithm of zero and to reduce the computational complexity under the GAMLSS modelling framework, we add 1.1 to each response variable before the logarithmic transformation. For each response variable, we fit the GAMLSS model separately. The probability distribution of each response variable (Y) under the GAMLSS modelling framework is chosen based on the minimum Bayesian information criterion (BIC) and Akaike information criterion (AIC) values. The Normal Exponential-t distribution is selected for \(Y=\hbox{log}(\hbox{number of new cases})\), and the Gumbel distribution is selected for \(Y=\hbox{log}(\hbox{number of death})\). The selection procedure is described in detail in Sect. 3.2.
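As a minimal sketch of the transformation described above, the following Python snippet applies the shifted logarithm to a list of daily counts. The function name `log_response` and the sample counts are illustrative assumptions and are not taken from the original analysis.

```python
import math

OFFSET = 1.1  # shift stated in the text; keeps log() defined on zero-count days


def log_response(counts, offset=OFFSET):
    """Return log(count + offset) for each daily count."""
    return [math.log(c + offset) for c in counts]


# Hypothetical daily counts, including a day with no reported events.
example_counts = [0, 3, 42]
transformed = log_response(example_counts)
```

With this offset a zero count maps to log(1.1), which is small but strictly positive, so the transformed response stays above zero for every day in the series.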

2.2.1 Normal Exponential-t Distribution

The Normal Exponential-t (NET) distribution was first introduced by [22] as a robust method of fitting the mean and scale parameters of a symmetric distribution as functions of explanatory variables (X). The probability density function (pdf) of the NET distribution, denoted by NET(\(\mu ,\sigma ,\nu ,\tau\)), is given by [22] as

$$\begin{aligned} f_Y(y|\mu ,\sigma ,\nu ,\tau )=\frac{c}{\sigma }\left\{ \begin{array}{lll} \exp \left\{ -\frac{(y-\mu )^2}{2\sigma ^2}\right\} , &{} \qquad \text{when} &{} \, |\frac{y-\mu }{\sigma }|\le \nu \\ \exp \left\{ -\nu |\frac{y-\mu }{\sigma }|+\frac{\nu ^2}{2}\right\} , &{} \qquad \text{when} &{}\, \nu <|\frac{y-\mu }{\sigma }|\le \tau \\ \exp \left\{ -\nu \tau \log \left( \frac{|y-\mu |}{\tau \sigma }\right) -\nu \tau +\frac{\nu ^2}{2}\right\} , &{} \qquad \text{when} &{}\, |\frac{y-\mu }{\sigma }|>\tau \end{array} \right. \end{aligned}$$

for \(-\infty< y <\infty\), where \(-\infty<\mu <\infty\), \(\sigma >0\), \(\nu >1\), \(\tau >\nu\), and \(c=(c_1+c_2+c_3)^{-1}\), where \(c_1=\sqrt{2\pi }[1-2\varPhi (-\nu )]\), \(c_2=\frac{2}{\nu }\exp \left\{ -\frac{\nu ^2}{2}\right\}\) and \(c_3=\frac{2}{(\nu \tau -1)\nu }\exp \left\{ -\nu \tau +\frac{\nu ^2}{2}\right\}\), where \(\varPhi (\cdot )\) is the cumulative distribution function of the standard normal variate. Note that the location parameter \(\mu\) is the mean of Y; further details of the density can be found in [23]. We are interested in estimating the mean function in the regression setting.
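The piecewise density above can be checked numerically. The sketch below is a plain-Python transcription of the formula, using only the standard library; the function names (`net_pdf`, `integrate_midpoint`) are illustrative assumptions, and the default values \(\nu =1.5\), \(\tau =2\) are the fixed values reported later in Sect. 3.2.1. It is meant as a sanity check of the normalising constant, not as the fitting routine used in the paper.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def net_pdf(y, mu=0.0, sigma=1.0, nu=1.5, tau=2.0):
    """Density of NET(mu, sigma, nu, tau) as defined in the text."""
    if not (sigma > 0 and nu > 1 and tau > nu):
        raise ValueError("need sigma > 0, nu > 1 and tau > nu")
    # Normalising constant c = 1 / (c1 + c2 + c3).
    c1 = math.sqrt(2.0 * math.pi) * (1.0 - 2.0 * std_normal_cdf(-nu))
    c2 = (2.0 / nu) * math.exp(-nu ** 2 / 2.0)
    c3 = 2.0 / ((nu * tau - 1.0) * nu) * math.exp(-nu * tau + nu ** 2 / 2.0)
    c = 1.0 / (c1 + c2 + c3)
    z = abs((y - mu) / sigma)
    if z <= nu:          # normal core
        log_kernel = -z ** 2 / 2.0
    elif z <= tau:       # exponential shoulders
        log_kernel = -nu * z + nu ** 2 / 2.0
    else:                # t-type (power-law) tails
        log_kernel = -nu * tau * math.log(z / tau) - nu * tau + nu ** 2 / 2.0
    return (c / sigma) * math.exp(log_kernel)


def integrate_midpoint(f, lo, hi, n):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Mass on a wide but finite range; the power-law tails hold the remainder.
total_mass = integrate_midpoint(net_pdf, -50.0, 50.0, 20000)
```

Because the tails decay only polynomially, the integral over a finite range is slightly below one; the three pieces join continuously at \(|z|=\nu\) and \(|z|=\tau\).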

2.2.2 Gumbel Distribution

The pdf of the Gumbel distribution (also called an extreme value or Gompertz distribution), denoted by GU(\(\mu ,\sigma\)), is defined by:

$$\begin{aligned} f_Y(y|\mu ,\sigma )=\frac{1}{\sigma }\exp \left[ \left( \frac{y-\mu }{\sigma }\right) -\exp \left( \frac{y-\mu }{\sigma }\right) \right] \end{aligned}$$

for \(-\infty<y<\infty\), where \(-\infty<\mu <\infty\) and \(\sigma >0\), with \(E(Y)=\mu -\gamma \sigma\), where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant, and \(\text{Var}(Y)=\pi ^2\sigma ^2/6\); further details of the density can be found in [23].
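The moment formulas for this parameterisation can be verified numerically. The following sketch integrates the density above with a midpoint rule and compares the result with \(\mu -\gamma \sigma\) and \(\pi ^2\sigma ^2/6\); the function names and the test values \(\mu =2\), \(\sigma =0.5\) are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_pdf(y, mu=0.0, sigma=1.0):
    """Density of GU(mu, sigma): (1/sigma) * exp(z - exp(z)), z = (y - mu)/sigma."""
    z = (y - mu) / sigma
    return math.exp(z - math.exp(z)) / sigma


def gumbel_mean(mu, sigma):
    """Closed-form mean, mu - gamma * sigma."""
    return mu - EULER_GAMMA * sigma


def gumbel_variance(sigma):
    """Closed-form variance, pi^2 * sigma^2 / 6."""
    return math.pi ** 2 * sigma ** 2 / 6.0


def numeric_moments(mu, sigma, z_lo=-40.0, z_hi=10.0, n=50000):
    """Midpoint-rule mean and variance, integrating over the standardised z."""
    h = (z_hi - z_lo) / n
    m1 = 0.0
    m2 = 0.0
    for i in range(n):
        z = z_lo + (i + 0.5) * h
        y = mu + sigma * z
        mass = gumbel_pdf(y, mu, sigma) * sigma * h  # probability of the slice
        m1 += y * mass
        m2 += y * y * mass
    return m1, m2 - m1 ** 2


mean_num, var_num = numeric_moments(2.0, 0.5)
```

The mean lies below \(\mu\) because this form of the Gumbel density is skewed to the left.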

The covariates considered for both response variables in this article are time (in days), temperature, and humidity. An attractive feature of the GAMLSS model is that its systematic part can be extended to model not only the location (usually the mean) but also other parameters of the distribution, such as scale and shape. These parameters can be linear parametric and/or additive non-parametric functions of covariates and/or random effects. In this research, we choose flexible predictor models based on fractional polynomial and B-spline functions to obtain the smoothing function of the predictor time. To estimate the conditional mean of the response variable Y given the covariates \(X=({\text{time}}, {\text{temperature}}, {\text{humidity}})\), we have to estimate the parameters (as functions of X) of the conditional distribution of Y given X. Therefore, the flexible regression models for the location function \(\mu (X)\) and the scale function \(\sigma (X)\) under the flexible GAMLSS modeling framework can be written as

$$\begin{aligned} \mu (X,\varvec{\beta })=\beta _0+f({\text{time}};\varvec{\beta }_1)+\beta _2\times {\text{temperature}}+\beta _3\times {\text{humidity}}, \end{aligned}$$
(1)

and

$$\begin{aligned} \log \left( \sigma (X;\,\varvec{\gamma })\right) =\gamma _0+f({\text{time}};\varvec{\gamma }_1)+\gamma _2\times {\text{temperature}}+\gamma _3\times {\text{humidity}}. \end{aligned}$$
(2)

(Penalized) maximum likelihood estimation is used to estimate the parameters of models (1) and (2).

2.2.3 Flexible Regression with Fractional Polynomial Function

The fractional polynomial in flexible predictor models is a generalization of the polynomial function. The general form of a fractional polynomial in x of degree m can be written as

$$\begin{aligned} f_p(x;\,\varvec{\theta }, p_1,p_2,\ldots ,p_m)=\sum _{l=0}^m \theta _l H_l(x), \end{aligned}$$
(3)

where m is an integer and

$$\begin{aligned} H_l(x)=\left\{ \begin{array}{lll} x^{p_l}&{}\quad \text{if}\quad p_l\ne p_{l-1}\\ H_{l-1}(x)\times \log (x) &{} \quad \text{if}\quad p_l= p_{l-1}, \end{array}\right. \end{aligned}$$

with \(p_0=0\) and \(H_0(x)=1\), for a sequence of powers \(p_1\le p_2\le \cdots \le p_m\) from the grid

$$\begin{aligned} \{-2, -1, -0.5, 0, 0.5, 1, 2, \max (3, m)\}. \end{aligned}$$

The optimal combination of powers will be selected by using the smallest value of BIC.

We select \(p_1=0\), \(p_2=0\), \(p_3=0.5\) and \(m=3\) for the response variable of log(number of new cases) and hence the model (3) can be written as

$$\begin{aligned} f_p(\text{time};\varvec{\theta }_{1}; 0, 0, 0.5)=\theta _{10}+\theta _{11}\log (\text{time})+\theta _{12}[\log (\text{time})]^2+\theta _{13} (\text{time})^{0.5}. \end{aligned}$$
(4)

For the response variable of log(number of death), we select \(p_1=1\), \(p_2=2\) and \(p_3=2\), and the fractional polynomial in time (in days) variable of degree \(m=3\) for the model (3) can be written as

$$\begin{aligned} f_p(\text{time};\,\varvec{\theta }_1;\, 1,2,2)=\theta _{10}+\theta _{11}\times \text{time}+\theta _{12}\times (\text{time})^2+\theta _{13}\times (\text{time})^2\times \log (\text{time}). \end{aligned}$$
(5)
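The recursive definition of \(H_l(x)\) can be sketched in a few lines of Python. The helper names `fp_basis` and `fp_value` are illustrative assumptions; the two power sequences are the ones selected in (4) and (5), and the evaluation point \(x=4\) is arbitrary.

```python
import math


def fp_basis(x, powers):
    """Return [H_0(x), ..., H_m(x)] for x > 0 and a nondecreasing power sequence."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    basis = [1.0]      # H_0(x) = 1
    prev_power = 0.0   # p_0 = 0
    for p in powers:
        if p != prev_power:
            basis.append(x ** p)                    # new power: x^{p_l}
        else:
            basis.append(basis[-1] * math.log(x))   # repeated power: H_{l-1}(x) * log(x)
        prev_power = p
    return basis


def fp_value(x, theta, powers):
    """Evaluate sum_l theta_l * H_l(x); theta must have len(powers) + 1 entries."""
    basis = fp_basis(x, powers)
    if len(theta) != len(basis):
        raise ValueError("theta needs one coefficient per basis term")
    return sum(t * h for t, h in zip(theta, basis))


# Basis terms of Eq. (4) and Eq. (5) at an arbitrary point x = 4.
basis_cases = fp_basis(4.0, (0, 0, 0.5))   # 1, log x, (log x)^2, x^0.5
basis_deaths = fp_basis(4.0, (1, 2, 2))    # 1, x, x^2, x^2 * log x
```

Because \(p_0=0\), a leading power of 0 is treated as a repeated power and yields \(\log (x)\), which is how (4) obtains its \(\log\) and \(\log ^2\) terms.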

2.2.4 Flexible Smoothing Regression with B-Splines model

Flexible smoothing function with basis spline (B-spline) models were also fitted in order to get a more flexible approximation to the data. A general form of B-spline predictor model of x for the degree D can be written as

$$\begin{aligned} f_b(x;\,\varvec{\theta }_0, D, K)=\sum _{j=0}^D\theta _{0j}x^j+\sum _{k=D+1}^{D+K}\theta _{0k} (x-b_k)^D H(x>b_k), \end{aligned}$$
(6)

where K is the number of knots, \(b_k\) is the knot value associated with the kth term and \(H(x>b_k)\) is the Heaviside function taking the value 1 if \(x>b_k\) and 0 otherwise. The combination of D, K, and the knot values is chosen based on the lowest value of BIC.
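The predictor in (6) is written in truncated power form, which can be sketched as follows. The function names are illustrative assumptions, and the example knots 65.6, 130.2, 194.8 and 259.4 are the values that appear in the fitted models of Sect. 3.2.2; the evaluation point is arbitrary. Software implementations typically use an equivalent but numerically better-conditioned B-spline basis, so this is only a sketch of the algebraic form.

```python
def truncated_power_basis(x, degree, knots):
    """Basis of Eq. (6): 1, x, ..., x^D, then (x - b_k)^D * H(x > b_k) per knot."""
    poly = [float(x) ** j for j in range(degree + 1)]
    trunc = [(x - b) ** degree if x > b else 0.0 for b in knots]
    return poly + trunc


def spline_value(x, theta, degree, knots):
    """Evaluate Eq. (6) for coefficients theta of length D + 1 + K."""
    basis = truncated_power_basis(x, degree, knots)
    if len(theta) != len(basis):
        raise ValueError("need D + 1 + K coefficients")
    return sum(t * b for t, b in zip(theta, basis))


# Cubic basis (D = 3) with K = 4 knots, evaluated between the first two knots.
example_knots = [65.6, 130.2, 194.8, 259.4]
example_basis = truncated_power_basis(100.0, 3, example_knots)
```

At \(x=100\) only the first knot term is active, so the last three basis entries are zero.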

3 Data Analysis and Results

3.1 Exploratory Data Analysis

To explore the raw data and find an indication for selecting the more sophisticated statistical model, we provide descriptive statistics and some graphical presentation of the variables in this section. Table 1 summarizes the descriptive statistics of the daily number of death due to COVID-19, SARS-CoV-2 infected new cases, and meteorological variables such as temperature and humidity for \(n=324\) days.

Table 1 Descriptive statistics of daily number of death due to COVID-19, number of SARS-CoV-2 infected new cases, temperature and humidity for March 08, 2020–January 31, 2021

This study included a total of 8033 deaths and 535,139 confirmed cases during that period. The averages of the daily number of deaths due to COVID-19 and of the number of SARS-CoV-2 infected new cases are 24.79 and 1626.2, respectively. The daily temperature ranged from \(20\,^{\circ }\hbox{C}\) to \(37\,^{\circ }\hbox{C}\), and the humidity ranged from 21% to 100%.

The histograms with kernel density plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 infected new cases are presented in Fig. 1. Figure 1(a) shows that the distribution of the number of deaths due to COVID-19 appears symmetric, indicating that a bell-shaped distribution would be a suitable probability model for this variable. In contrast, Fig. 1(b) reveals that the distribution of the number of SARS-CoV-2 infected new cases appears skewed, indicating that a skewed distribution would be more suitable for this variable.

Fig. 1 Histogram of (a) the number of deaths due to COVID-19; (b) the number of SARS-CoV-2 infected new cases

The scatter plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 infected new cases against the time index for the period from August 03, 2020 to January 31, 2021 are drawn in Fig. 2. We clearly see a nonlinear relationship between the response variables and the time index. The scatter plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 infected new cases against humidity are depicted in Fig. 3, and those against temperature are shown in Fig. 4. These figures suggest a relationship between both response variables and the temperature and humidity covariates.

Fig. 2 Scatter diagram of (a) the number of deaths due to COVID-19; (b) the number of SARS-CoV-2 infected new cases against the time index during the period August 03, 2020–January 31, 2021

Fig. 3 Scatter diagram of (a) the number of deaths due to COVID-19 versus humidity; (b) the number of SARS-CoV-2 infected new cases versus humidity during the period August 03, 2020–January 31, 2021

Fig. 4 Scatter diagram of (a) the number of deaths due to COVID-19 versus daily temperature; (b) the number of SARS-CoV-2 infected new cases versus daily temperature during the period August 03, 2020–January 31, 2021

Without adjusting time effect in the model, we consider the following regression model to explore only the conditional relationship between two response variables \(Y= \hbox{log}(\hbox{number of new cases})\) and \(Y=\hbox{log}(\hbox{number of death})\) and two covariates named temperature and humidity. The mean regression model is, for \(i=1,2,\ldots ,n\)

$$\begin{aligned} y_i=\beta _0+\beta _1\times \text{temperature}_i+\beta _2\times \text{humidity}_i +\epsilon _i, \end{aligned}$$
(7)

where \(y_i=\) log(number of new cases) (or log(number of death)) and \(\epsilon _i\) is the disturbance term for the ith day. Under the classical regression model assumptions (see, for example, [24]), the summary statistics of the model (7) are tabulated in Table 2. The exploratory results show that temperature has no significant effect on either the number of new cases or the number of deaths. In contrast, humidity has a highly significant effect on both response variables.

Table 2 Summary statistics of the estimated model given in (7)

Next, we consider the time (in days) variable as a covariate in the model. Since the exploratory data analysis shows a nonlinear relationship between time and response variables, we need advanced computer-intensive statistical models for further research.

3.2 Generalised Additive Models for Location Scale and Shape (GAMLSS) family

To select the best probability model for the response variable \(Y= \hbox{log}(\hbox{number of new cases})\), the AIC and BIC values, with their degrees of freedom, of all candidate distributions from the GAMLSS family are provided in Table 7 in “Appendix”. Among all of these distributions, we selected five candidate distributions based on the minimum BIC, which are provided in Table 9 in “Appendix”. The smallest BIC and AIC are observed for the NET model. In contrast, the highest value of BIC (and also AIC) is observed for the Skew-t type-3 model. Based on the minimum BIC, we select the NET model to explain the transmission of SARS-CoV-2 for further investigation.

Similarly, for the response variable \(Y=\hbox{log}(\hbox{number of death})\), the AIC and BIC values, with their degrees of freedom, of all candidate distributions from the GAMLSS family are provided in Table 8 in “Appendix”. Among all of these distributions, we selected five candidate distributions based on the minimum BIC, which are presented in Table 10 in “Appendix”. The smallest values of BIC and AIC are observed for the Gumbel model. In contrast, the highest value of BIC (and also AIC) is observed for the Skew-t type-4 model. Therefore, the Gumbel model is chosen as the best model to describe the number of deaths due to COVID-19 for further analysis.

3.2.1 Flexible Regression with Fractional Polynomial Function

Flexible fractional polynomial models for log(number of new cases) given in (4) and for \(Y=\) log(number of death) given in (5) are estimated within the GAMLSS modelling framework using the best chosen probability distribution of each response variable. The fitted flexible predictor model for \(\mu (X)\) for log(number of new cases) is

$$\begin{aligned} {\widehat{\mu }}(X_i;\,\widehat{\varvec{\beta }})=10.524+f_{p}(\text{time}_i,\widehat{\varvec{\beta }}_1)-0.089\times \text{temperature}_i-0.015\times \text{humidity}_i, \end{aligned}$$
(8)

and the estimated flexible predictor model (2) is \({\widehat{\sigma }}(X;\,\widehat{\varvec{\gamma }})=\exp ( -1.322)=0.267\). Here we leave out the insignificant effects of the estimated model. The corresponding estimated fractional polynomial model for \(\mu (X)\) in time (in days) of degree 3 is

$$\begin{aligned} f_{p}(\text{time}_i,\widehat{\varvec{\beta }}_1)&= 35.067+18.203\times \log (\text{time}_i)\\&\quad+2.204\times [\log (\text{time}_i)]^2-33.766\times (\text{time}_i)^{0.5}. \end{aligned}$$

The summary statistics of the estimated flexible predictor model (8) are tabulated in Table 3. Hence, the estimated flexible regression model for the mean function \(E(Y|X)=\mu (X)\) of the conditional NET distribution under the GAMLSS modeling framework is

$$\begin{aligned} {\widehat{\mu }}(X_i;\widehat{\varvec{\beta }})&=45.591+18.203\times \log (\text{time}_i)+2.204\times [\log (\text{time}_i)]^2\nonumber \\&\quad -33.766\times (\text{time}_i)^{0.5}-0.089\times \text{temperature}_i-0.015\times \text{humidity}_i. \end{aligned}$$
(9)

Note that the two parameters \(\nu\) and \(\tau\) are held fixed at 1.5 and 2, respectively, in the GAMLSS modelling framework. For the final fitted model, the global deviance is 288.373, AIC is 308.373, and the Schwarz Bayesian criterion (SBC, i.e. BIC) is 346.181. Table 3 shows that temperature and humidity both have highly significant effects on the number of SARS-CoV-2 infected new cases. In addition, the regression coefficients of both temperature and humidity are negative, which indicates a negative relationship between these variables and the number of SARS-CoV-2 infected new cases.
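The reported criteria are related through the generalised Akaike information criterion, global deviance plus a penalty times the effective degrees of freedom, with penalty 2 for AIC and \(\log (n)\) for SBC. The sketch below checks the reported figures under two assumptions that are not stated explicitly in the text: that the effective degrees of freedom can be recovered as (AIC − global deviance)/2, and that \(n=324\) is the sample size used in the fit. The function names are illustrative.

```python
import math


def gaic(global_deviance, df, penalty):
    """Generalised AIC: global deviance + penalty * effective degrees of freedom."""
    return global_deviance + penalty * df


def aic(global_deviance, df):
    return gaic(global_deviance, df, 2.0)


def sbc(global_deviance, df, n):
    return gaic(global_deviance, df, math.log(n))


def implied_df(global_deviance, aic_value):
    """Degrees of freedom implied by a reported AIC (an inference, not a reported value)."""
    return (aic_value - global_deviance) / 2.0


# Reported values for the NET fractional polynomial model.
df_net = implied_df(288.373, 308.373)
sbc_net = sbc(288.373, df_net, 324)
```

Under these assumptions the implied degrees of freedom are 10 and the recomputed SBC agrees with the reported 346.181 up to rounding.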

Table 3 Summary statistics of the estimated flexible predictor model given in (9)

Similarly, for the response variable of log(number of death), the estimated flexible regression model under the GAMLSS modelling framework of the location function \(\mu (X)\) of Gumbel distribution is:

$$\begin{aligned} {\widehat{\mu }}(X_i;\widehat{\varvec{\beta }})=5.495+f_{p}(\text{time}_i,\widehat{\varvec{\beta }}_1)-0.062\times \text{temperature}_i-0.008\times \text{humidity}_i, \end{aligned}$$
(10)

and the estimated flexible regression model of (2) is \({\widehat{\sigma }}(X;\,\widehat{\varvec{\gamma }})=\exp (-1.264)=0.283\). The estimated fractional polynomial model for the \(\mu (X)\) in time (in days) of degree 3 is: for \(i=1,2,\ldots ,n\)

$$\begin{aligned} f_{p}({\text{time}}_i,\widehat{\varvec{\beta }}_1)&= -3.584+9.832\times {\text{time}}_i-5.549\times {\text{time}}_i^2+2.398\times {\text{time}}_i^2\times \log ({\text{time}}_i). \end{aligned}$$

The summary statistics of the estimated model are provided in Table 4. Hence, the estimated flexible regression model of (10) can be written as

$$\begin{aligned} {\widehat{\mu }}(X_i;\,\widehat{\varvec{\beta }})&=1.911+9.832\times {\text{time}}_i-5.549\times {\text{time}}_i^2 +2.398\times {\text{time}}_i^2\times \log ({\text{time}}_i)\nonumber \\&\quad -0.062\times {\text{temperature}}_i-0.008\times {\text{humidity}}_i. \end{aligned}$$
(11)

Finally, the estimated flexible regression model for the mean function \(E(Y|X)=\mu (X)-\gamma \sigma (X)\) of the conditional Gumbel distribution under the GAMLSS modeling framework is

$$\begin{aligned} \widehat{ E(Y_i|X_i)}&={\widehat{\mu }}(X_i;\,\widehat{\varvec{\beta }})-\gamma {\widehat{\sigma }}(X_i;\,\widehat{\varvec{\gamma }}) \\&=1.748+9.832\times (\text{time}_i)-5.549\times (\text{time}_i)^2 +2.398\times (\text{time}_i)^2\times \log (\text{time}_i)\nonumber \\&\quad -0.062\times \text{temperature}_i-0.008\times \text{humidity}_i. \end{aligned}$$

For this model, the global deviance is 199.555, AIC is 219.555 and SBC is 257.363. Table 4 shows that temperature and humidity both have highly significant effects on the number of deaths due to COVID-19.
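The intercept of the mean function follows from the intercept of (11) and the constant scale estimate through \(E(Y|X)=\mu (X)-\gamma \sigma (X)\). A short check of this arithmetic, using only the values reported above (variable names are illustrative):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

mu_intercept = 1.911   # intercept of the location model, Eq. (11)
log_sigma = -1.264     # estimated log-scale (constant in this fit)

sigma_hat = math.exp(log_sigma)
mean_intercept = mu_intercept - EULER_GAMMA * sigma_hat
```

Since the fitted scale does not depend on the covariates here, subtracting \(\gamma {\widehat{\sigma }}\) only shifts the intercept and leaves the slope coefficients unchanged.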

Table 4 Summary statistics of the estimated flexible regression model via fractional polynomial function

3.2.2 Flexible Smoothing Regression with B-Splines Function

For the response variable log(number of new cases), we also use the B-spline function given in (6) to estimate \(\mu (X;\varvec{\beta })\) and \(\sigma (X;\varvec{\gamma })\) of the NET distribution. With \(D=3\) and \(K=4\) in the model (6), the estimated B-spline smoothing function \(f_b(\text{time}_i; \varvec{\beta }_0,3,4)\) for \(\mu (X;\varvec{\beta })\), for \(i=1,2,\ldots ,n\), is

$$\begin{aligned} f_b(\text{ time}_i; \widehat{\varvec{\beta }}_0,3,4)&= -0.635 + 5.703 \times \text{ time}_i- 9.591\times \text{time}_i^2 +9.919 \times \text{time}_i^3\nonumber \\&\quad + H(\text{time}_i > b_4) [8.534\times (\text{time}_i-65.6)^3 + 9.285\times (\text{time}_i-130.2)^3 \nonumber \\&\quad + 8.219\times (\text{time}_i-194.8)^3 + 7.295\times (\text{time}_i-259.4)^3]. \end{aligned}$$
(12)

With \(D=3\) and \(K=1\) in the model (6), the estimated function of \(f_b(\text{time}_i;\varvec{\gamma }_0,3,1)\) for \(i=1,2,\ldots ,n\), is

$$\begin{aligned} f_b(\text{time}_i;\,\widehat{\varvec{\gamma }}_0,3,1)&= 1.529- 3.627\times \text{time}_i - 2.365\times \text{time}_i^2-2.726\times \text{time}_i^3\nonumber \\&\quad - 2.992 (\text{time}_i-162.5)^3 H(\text{time}_i > b_1). \end{aligned}$$
(13)

Using the estimated B-spline function for estimating \(\mu (X;\,\varvec{\beta })\) given in (12), we find the estimated flexible regression function of \(E(Y|X)=\mu (X;\varvec{\beta })\) which is

$$\begin{aligned} \mu (X_i;\, \widehat{\varvec{\beta }})&= -0.635 + 5.703 \times \text{ time}_i- 9.591\times \text{time}_i^2 +9.919 \times \text{time}_i^3\\&\quad + H(\text{time}_i > b_4) [8.534\times (\text{time}_i-65.6)^3 + 9.285\times (\text{time}_i-130.2)^3 \\&\quad + 8.219\times (\text{time}_i-194.8)^3 + 7.295\times (\text{time}_i-259.4)^3]\\&\quad -0.022\times \text{temperature}_i-0.003\times \text{humidity}_i. \end{aligned}$$

The summary statistics of the estimated functions \({\widehat{\mu }}(X,\widehat{\varvec{\beta }})\) and \(\log \left( {\widehat{\sigma }}(X,\widehat{\varvec{\gamma }})\right)\) are presented in Table 5. For this estimated model, the global deviance, AIC and SBC are \(-46.572\), \(-12.572\) and 51.700, respectively. In the estimated mean function \(\widehat{E(Y|X)}={\widehat{\mu }}(X_i;\, \widehat{\varvec{\beta }})\), the slope coefficients of temperature (\(\beta _2\)) and humidity (\(\beta _3\)) are negative, which indicates a negative relationship between these variables and the number of SARS-CoV-2 infected new cases. In addition, both coefficients are highly significant. Similarly, for the estimated \(\log \left( {\widehat{\sigma }}(X,\widehat{\varvec{\gamma }})\right)\), the slope coefficients of temperature (\(\gamma _2\)) and humidity (\(\gamma _3\)) are also negative, but neither is significant at the 5% level of significance.

Table 5 The summary statistics of flexible regression models of \(\mu (X;\,\varvec{\beta })\) and \(\log (\sigma (X;\,\varvec{\gamma }))\) via B-spline smoothing function for the response variable log(number of new cases)

For the response variable log(number of death), we use the B-spline function of time predictor to estimate \(\mu (X)\) and \(\sigma (X)\) of the Gumbel distribution. For estimating \(\mu (X;\,\varvec{\beta })\), we select \(D=3\) and \(K=4\) in the (6) and the estimated flexible function of \(f_b(\text{time}_i; \varvec{\beta }_0,3,4)\) for \(i=1,2,\ldots ,n\), is

$$\begin{aligned} f_b(\text{time}_i;\, \widehat{\varvec{\beta }}_0,3,4)&=0.978+ 1.409\times \text{time}_i+ 3.809\times \text{time}_i^2+ 4.361\times \text{time}_i^3 \nonumber \\&\quad +H(\text{time}_i > b_4) [3.733 \times (\text{time}_i- 65.6 )^3 +3.289\times (\text{time}_i-130.2)^3\nonumber \\&\quad +3.667\times (\text{time}_i-194.8)^3 +2.600\times (\text{time}_i-259.4)^3]. \end{aligned}$$
(14)

To estimate \(\sigma (X;\varvec{\gamma })\) of the Gumbel distribution, we select \(D=3\) and \(K=0\) in the model given in (6). The estimated flexible function of \(f_b(\text{time}_i;\, \varvec{\gamma }_0,3,0)\) for ith individual is

$$\begin{aligned} f_b(\text{time}_i; \widehat{\varvec{\gamma }}_0,3,0)&=0.086- 1.894\times \text{time}_i - 0.929\times \text{time}_i^2-1.118\times \text{time}_i^3. \end{aligned}$$
(15)

Using the estimated B-spline function given in (14), the estimated function of \(\mu (X;\,\varvec{\beta })\) can be written as

$$\begin{aligned} {\widehat{\mu }}(X_i;\, \widehat{\varvec{\beta }})&=0.978+ 1.409\times \text{time}_i+ 3.809\times \text{time}_i^2+ 4.361\times \text{time}_i^3 \\&\quad +H(\text{time}_i > b_4) [3.733 \times (\text{time}_i- 65.6 )^3 +3.289\times (\text{time}_i-130.2)^3\\&\quad +3.667\times (\text{time}_i-194.8)^3 +2.600\times (\text{time}_i-259.4)^3]\\&\quad -0.032\times \text{temperature}_i-0.003\times \text{humidity}_i. \end{aligned}$$

By using the estimated B-spline model given in (15), the estimated scale function \({\widehat{\sigma }}(X_i;\, \widehat{\varvec{\gamma }})\) for \(i=1,2,\ldots ,n\) is

$$\begin{aligned} {\widehat{\sigma }}(X_i;\, \widehat{\varvec{\gamma }})&= \exp ( 0.086- 1.894\times \text{time}_i - 0.929\times \text{time}_i^2-1.118\times \text{time}_i^3\\&\quad -0.008\times \text{temperature}_i-0.004\times \text{humidity}_i). \end{aligned}$$

The summary statistics of the estimated models are tabulated in Table 6. In the estimated location function \({\widehat{\mu }}(X_i;\,\widehat{\varvec{\beta }})\), the slope coefficients of temperature (\(\beta _2\)) and humidity (\(\beta _3\)) are negative, which indicates a negative relationship between these variables and the number of deaths due to COVID-19. Table 6 shows that the effect of temperature is highly significant, whereas the effect of humidity on the number of deaths due to COVID-19 is not significant at the 5% level of significance. Similarly, for the estimated \(\log \left( {\widehat{\sigma }}(X,\widehat{\varvec{\gamma }})\right)\), the slope coefficients of temperature (\(\gamma _2\)) and humidity (\(\gamma _3\)) are negative, and neither is significant at the 5% level of significance. Based on these results, we obtain the estimated flexible regression model via the B-spline smoothing function for \(E(Y|X)=\mu (X)-\gamma \sigma (X)\) of the conditional Gumbel distribution, where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant. Hence, the estimated mean function for the ith day (\(\forall i=1,2,\ldots ,n\)) can be written as

$$\begin{aligned} \widehat{ E(Y_i|X_i)}&={\widehat{\mu }}(X_i;\,\widehat{\varvec{\beta }})-\gamma {\widehat{\sigma }}(X_i;\,\widehat{\varvec{\gamma }}) \\&=0.978+ 1.409\times \text{time}_i+ 3.809\times \text{time}_i^2+ 4.361\times \text{time}_i^3 \\&\quad +H(\text{time}_i > b_4) [3.733 \times (\text{time}_i- 65.6 )^3 +3.289\times (\text{time}_i-130.2)^3\\&\quad +3.667\times (\text{time}_i-194.8)^3 +2.600\times (\text{time}_i-259.4)^3]\\&\quad -0.032\times \text{temperature}_i-0.003\times \text{humidity}_i\\&\quad -0.629\exp (- 1.894\times \text{time}_i- 0.929\times \text{time}_i^2-1.118\times \text{time}_i^3\\&\quad -0.008\times \text{temperature}_i-0.004\times \text{humidity}_i). \end{aligned}$$

Table 6 The summary statistics of flexible regression models of \(\mu (X;\,\varvec{\beta })\) and \(\log (\sigma (X;\,\varvec{\gamma }))\) via B-spline smoothing function for the response variable log(number of death)

We also calculate the predicted values of the response variables using the fractional polynomial and B-spline models. The actual and predicted values are depicted in Fig. 5.

Fig. 5 Fractional polynomial curve versus basis spline curve for (a) the number of deaths due to COVID-19 versus days; (b) the number of SARS-CoV-2 infected new cases versus days during the period August 03, 2020–January 31, 2021

As expected, the curve estimated via the B-spline function is smooth, whereas the curve estimated via the fractional polynomial function is less smooth. However, the two estimated curves are very close to each other.

4 Discussion and Conclusions

This study examined whether temperature and humidity affect the transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that affects the human respiratory system and causes coronavirus disease 2019 (COVID-19). We relied on the daily count of confirmed SARS-CoV-2 infected new cases and the number of deaths due to COVID-19 per day reported by the Institute of Epidemiology Disease Control and Research (IEDCR), Dhaka, Bangladesh. A generalised additive model for location, scale and shape (GAMLSS) is used to examine the effect of temperature and humidity on the number of confirmed SARS-CoV-2 infected daily new cases and on the number of deaths due to COVID-19 separately. Without adjusting for the time effect in the exploratory data analysis, we did not find a significant impact of temperature on either response variable.

To investigate the effects of temperature and humidity after adjusting for time, we used the flexible GAMLSS model. The best response distribution is chosen based on the minimum BIC under the GAMLSS modeling framework. The Normal Exponential-t distribution is selected for log(number of new cases) and the Gumbel distribution for log(number of death). To estimate the systematic part of the GAMLSS model, we employed two flexible predictor models: (i) a fractional polynomial model and (ii) a B-spline smoothing model. Both models suggested that high temperature and high humidity significantly reduce the transmission of SARS-CoV-2. The fractional polynomial model indicates that high temperature and high humidity also significantly reduce the number of deaths due to COVID-19. Many studies support these results (see, for example, [12]), but they are the opposite of the findings of [25]. According to the fitted fractional polynomial model, for every 1\(^{\circ }\)C increase in temperature, daily new cases reduced by 8.9% (95% CI: 7.3%, 10.5%) and the number of deaths due to COVID-19 reduced by 6.2% (95% CI: 4.6%, 7.8%); for every 1% increase in humidity, daily new cases reduced by 1.5% (95% CI: 1.2%, 1.8%) and the number of deaths due to COVID-19 reduced by 0.8% (95% CI: 0.48%, 1.1%), holding all other factors constant.
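Because the response variables are modelled on the log scale, a coefficient \(\beta\) translates into a multiplicative change of \(\exp (\beta )\) in the back-transformed response per unit increase of the covariate; quoting \(100\times |\beta |\) as a percentage is the usual small-coefficient approximation. The sketch below shows the exact conversion for the coefficients in (9) and (11). The function name is an illustrative assumption, and the change applies to the shifted count (count + 1.1) rather than to the raw count.

```python
import math


def percent_change(beta, delta=1.0):
    """Exact percentage change in exp(Y) for a `delta` increase in a covariate."""
    return 100.0 * math.expm1(beta * delta)


# Coefficients from Eq. (9) (new cases) and Eq. (11) (deaths).
cases_per_degree = percent_change(-0.089)
cases_per_humidity_point = percent_change(-0.015)
deaths_per_degree = percent_change(-0.062)
deaths_per_humidity_point = percent_change(-0.008)
```

For coefficients of this size the exact and approximate percentages differ only in the first decimal place, and the difference shrinks as the coefficient approaches zero.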

On the other hand, the B-spline model suggested that high temperature and high humidity reduce the number of deaths due to COVID-19, with a significant effect of temperature. The effect of humidity on the number of deaths, however, is significant at the 10% level of significance but not at the 5% level. There are a number of possible reasons for the insignificant effect of humidity in the B-spline model. The sample size (\(n=324\)) may not be large enough to detect a significant humidity effect in the B-spline model. Moreover, temperature and humidity are correlated. As the response variable is already well explained by the temperature and the B-spline function of the time variable, a high p value for the regression coefficient of humidity is possible. According to the fitted B-spline model, for every \(1^{\circ }\hbox{C}\) increase in temperature, the daily number of deaths due to COVID-19 reduced by 0.8% (95% CI: 7.9%, 9.5%) and the daily new cases reduced by 2.2% (95% CI: 0.03%, 4.4%); for every 1% increase in humidity, the number of deaths due to COVID-19 reduced by 0.4% (95% CI: 1.2%, 1.9%) and daily new cases reduced by 0.3% (95% CI: 0.02%, 0.62%), holding all other factors constant.

Although our analysis shows that temperature and humidity affect the transmission of SARS-CoV-2, we note that temperature and humidity alone do not explain most of the variability in the transmission of SARS-CoV-2 infection. To capture the actual behaviour and variability of SARS-CoV-2 transmission, temperature and humidity have to be considered together with other confounding factors, such as population density, public health policies, public health interventions, social isolation campaigns, actual diagnoses, the transportation system, and people’s lifestyle, in a computer-intensive statistical model.