1 Introduction

Count data for a response variable (Y) consist of nonnegative integers and generally arise in two forms, namely simple counts and categorical data. Simple count data record how many times an event occurs within a specific period of time, such as the number of days on which rainfall occurs within a week or fortnight over several years. Categorical data, in contrast, are classified according to different criteria of each variable within a specific period of time. The distribution of count data usually belongs to the exponential family under the assumption that the events are independently and identically distributed [25, 26]. Nelder and Wedderburn [33], and McCullagh and Nelder [31, 32] developed generalized linear models (GLMs) for analyzing the effect of explanatory variables on count data, relating the parameter of the count data distribution to the covariates through a link function. Count data models therefore explain the rate of occurrence of an event per unit time or time interval under different covariates [11, 18, 20, 47].

To explain count data using GLMs, several authors have suggested the Poisson, hurdle Poisson, negative binomial (NB) and zero-inflated Poisson (ZIP) models as the most suitable for analyzing violations of safety regulations, turnover in child care arrangements, labor mobility, market entry, recruitment rates, fish catching behavior, factors influencing TV channel viewing, etc. [9, 15, 19, 20, 34, 46, 48, 49]. Vivekanandan [45] conducted a heavy rainfall study for Tohana, Maharashtra, India, and showed that the log Pearson type-3 distribution provides satisfactory results compared to all other distributions. Srikanthan and McMahon [42] described the Markov chain, hidden Markov chain, log-normal distribution, etc., for explaining annual, monthly and daily rainfall data. These studies did not consider any atmospheric variables related to rainfall occurrences when analyzing the patterns of rainfall data.

To explore the causes of extreme floods and climate change scenarios in western Ireland, Chandler and Wheater [14] obtained satisfactory results by applying GLMs to daily rainfall occurrences. Segond et al. [37] demonstrated Poisson cluster processes using GLMs for analyzing simulated hourly and daily rainfall data of the Thames region, UK. Buishand et al. [10] employed GLMs with the logistic transformation as the link function for analyzing daily and monthly rainfall occurrences in Bern, Switzerland. Considering a Fourier series function of time as the link function, Coe and Stern [16] developed GLMs for a Markov model to analyze the patterns of daily rainfall data of Kharja, Jordan; Lunuwila, Sri Lanka; etc. A similar analysis is described by Stern and Coe [41] for a non-stationary Markov model of rainfall data of Morogoro, Tanzania.

Sinha et al. [38,39,40] and Islam and Chowdhury [22] explained the patterns of daily rainfall occurrences in Bangladesh using logistic regression for a Markov chain model. Thyer and Kuczera [43, 44] formulated a hidden Markov chain model for identifying long-term persistence of annual rainfall and dry spells, and drought risk, for Sydney, Australia. These studies did not analyze count data of daily rainfall occurrences as the response variable. The influence of climatic variables on count data of daily rainfall occurrences is highly informative for identifying the patterns of rainfall occurrences. For agriculturalists and hydrologists, the resulting rainfall patterns are helpful for formulating year-wise or season-wise crop cultivation plans and activities; they are also important for formulating the implementation plans and activities of the country's development programs. In this context, this study develops generalized linear models (GLMs) for the Poisson distribution using count data of daily rainfall occurrences.

The study is organized into five sections. Following this section, Sect. 2 discusses the methodology of the study: the derivation of the Poisson distribution from the exponential family, the formulation of the generalized linear model (GLM) for the Poisson distribution, the interpretation and estimation of the GLM parameters, hypothesis tests for the significance of the parameters, the deviance of the models and the model selection criteria (AIC and BIC). Section 3 describes the data and their collection. The results are discussed in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Methodology

2.1 Poisson distribution

Count data of the response variable (Y) generally follow an exponential family distribution with parameter θ, which is written as [7, 18, 23, 32]

$$ f\left( {y;\theta , \phi } \right) = \exp \left\{ {\frac{{y_{i} \theta_{i} - b\left( {\theta_{i} } \right)}}{{\alpha_{i} \left( \phi \right)}} + c\left( {y_{i} ;\, \phi } \right)} \right\},\;\;y \in Z^{ + } $$
(1)

where \(\alpha \left( \cdot \right),b\left( \cdot \right)\;{\text{and}}\;c\left( \cdot \right) \) are specific functions and Z+ denotes the set of nonnegative integers. Here θi is the canonical parameter (which defines the canonical link function), \(b\left( {\theta_{i} } \right)\) is the cumulant function, \(\alpha \left( \phi \right)\) is the scale parameter and \(c\left( {y_{i } ;\phi } \right)\) is the normalization factor. The mean and variance are then given by \(E\left( y \right) = b^{\prime}\left( \theta \right)\) and \(v\left( y \right) = \alpha \left( \phi \right)b^{^{\prime\prime}} \left( \theta \right)\), where \(\alpha \left( \phi \right)\) is the dispersion parameter and \(b^{^{\prime\prime}} \left( \theta \right)\) is the variance function.

To formulate the count data distribution of daily rainfall occurrences using Eq. (1), yi denotes the count data, i.e., the number of days on which rainfall occurred within a week or fortnight. Further, the canonical parameter θi equals logλ for the distribution of count data, and the cumulant function b(θi) equals λ, which is the parameter of the distribution. For this distribution, the normalizing factor \(c\left( {y_{i} ,\phi } \right)\) is defined as \({-}\log y!\) because y always takes nonnegative integer values as count data. These expressions give the specific form of the distribution of the response variable (y) with \(\alpha \left( \phi \right) = 1\).

Therefore, for the count data of daily rainfall occurrences over a given time interval, Eq. (1) becomes

$$ f\left( {y;\lambda } \right) = \exp \left\{ {y\log \lambda - \lambda - \log y!} \right\},\;\;y \in Z^{ + } $$
(2)

Here \(E\left( y \right) = \lambda\) and \({\text{var}} \left( y \right) = \lambda\), i.e., the mean and variance of y are equal. Equation (2) can be written as

$$ f\left( {y;\lambda } \right) = \lambda^{y} e^{ - \lambda } \frac{1}{y!} = \frac{{\lambda^{y} e^{ - \lambda } }}{y!},\;\;y = 0,1,2, \ldots \;\;{\text{and}}\;\;\lambda \in R^{ + } $$
(3)

which is the Poisson distribution of the count data of rainfall occurrences with parameter λ, under the assumption that rainfall occurs randomly over time. If 0 < λ < 1, the probability function is strictly decreasing in y; if λ > 1, it first increases up to its mode near λ and decreases thereafter.
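As a quick numerical check of the step from Eq. (2) to Eq. (3), the following minimal sketch evaluates both forms and locates the mode of the probability function. The function names and the values of λ used in the usage note are illustrative assumptions, not quantities from the study.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Eq. (3): lambda^y * exp(-lambda) / y!"""
    return lam ** y * math.exp(-lam) / math.factorial(y)


def poisson_expfam(y: int, lam: float) -> float:
    """Eq. (2): exp{y*log(lambda) - lambda - log(y!)}, using lgamma(y + 1) = log(y!)."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def mode_of_pmf(lam: float, y_max: int = 50) -> int:
    """Smallest y in 0..y_max at which the probability function is largest."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    return probs.index(max(probs))
```

For example, `mode_of_pmf(0.5)` is 0 (the function decreases from y = 0), whereas `mode_of_pmf(4.3)` is 4, so for λ > 1 the function rises before it falls.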

2.2 GLM for Poisson distribution

The mean and variance of the response variable generally describe the behavior of the data, because the influence of covariates on the response variable strongly affects both the mean and the variance. The count data of daily rainfall occurrences for the specified time period are modeled by a Poisson distribution with parameter λ. This parameter represents the intensity of the data, i.e., the instantaneous rate of rainfall occurrences. For count data, let us consider a variable yi which consists of n independent observations, where the ith observation is \(\left( {y_{i} ,x_{i} } \right)\). Here xi is the vector of explanatory variables that drive the occurrence of rainfall within the specified period of time, i.e., \(x_{i}^{^{\prime}} = \left[ {x_{i1} , x_{i2} , \ldots ,x_{ik} } \right]\); \(j = 1,2, \ldots ,k\). Then the Poisson distribution of yi conditional on xi is defined as

$$ f\left( {y_{i} |x_{i} } \right) = \frac{{\lambda_{i}^{{y_{i} }} e^{{ - \lambda_{i} }} }}{{y_{i} !}},\;\;y_{i} = 0,1,2, \ldots $$
(4)

with \(E\left( {y_{i} |x_{i} } \right) = {\text{Var}}\left( {y_{i} |x_{i} } \right) = \lambda_{i}\), so the variance is not constant but changes with xi. The GLM for the Poisson distribution thus explains the influence of covariates on the count data of daily rainfall occurrences. The mean function of the distribution is expressed on the logarithmic scale through the explanatory variables, i.e., \(\log \left( {\lambda_{i} } \right) = x_{i}^{^{\prime}} \beta\), \(\Rightarrow\) \(\lambda_{i} = \exp \left( {x_{i}^{^{\prime}} \beta } \right)\), where the logarithm is the canonical link function of the GLM for the Poisson distribution [33] and β is the parameter vector of the explanatory variables. This transformation gives the canonical form for yi with canonical parameter log λi, and it implies a constant semielasticity of the mean function in the model [47].

2.2.1 Interpretation of parameters for GLM under Poisson distribution

To formulate the GLM for count data of rainfall occurrences, climatic variables such as temperature, humidity, etc., are employed as the covariates. In the GLM, the influence of covariates on yi enters through the link function, and its inverse, the exponential form, gives the mean function of the count data of rainfall, i.e., \(E\left( {y_{i} |x_{i} } \right) = \lambda_{i} = \exp \left( {x_{i}^{^{\prime}} \beta } \right)\). The influence of a covariate on the occurrence of rainfall is measured by its marginal effect, which states how much the expected number of rainfall occurrences changes for a one unit change in an individual climatic variable. More precisely, the marginal effect is the instantaneous rate of change of the expected count for a one unit change of the specified climatic variable, holding the other variables fixed. For a count data model, the marginal effects of the explanatory variables therefore act on the average of the predicted counts. The average marginal effects for binary explanatory variables indicate the partial effect on the count variable [21]. The marginal effect of the respective covariate on count data under the GLM is stated as

$$ \frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }} = \frac{{\partial \lambda_{i} }}{{\partial x_{ij} }} = \beta_{j} \exp \left( {x_{i}^{^{\prime}} \beta } \right) $$
(5)

Then the expected or average marginal effect is defined as

$$ E\left[ {\frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }}} \right] = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \beta_{j} \exp \left( {x_{i}^{^{\prime}} \beta } \right) $$
(6)

If the GLM includes intercept term, then the average marginal effect is expressed as

$$ E\left[ {\frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }}} \right] = \beta_{j} \overline{y} $$
(7)

because the first-order condition implies \(\mathop \sum \limits_{i = 1}^{n} \exp \left( {x_{i}^{^{\prime}} \beta } \right) = \mathop \sum \limits_{i = 1}^{n} y_{i}\).
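The step from Eq. (6) to Eq. (7) can be spelled out as follows. Assuming the first element of \(x_{i}\) is the constant 1 (the intercept), the component of the likelihood equations belonging to the intercept reads \(\sum_{i = 1}^{n} \left[ {y_{i} - \exp \left( {x_{i}^{\prime} \hat{\beta }} \right)} \right] = 0\), so that, evaluated at the estimate,

$$ \frac{1}{n}\sum\limits_{i = 1}^{n} \hat{\beta }_{j} \exp \left( {x_{i}^{\prime} \hat{\beta }} \right) = \hat{\beta }_{j} \cdot \frac{1}{n}\sum\limits_{i = 1}^{n} y_{i} = \hat{\beta }_{j} \overline{y} $$

Without an intercept this identity need not hold, and Eq. (6) has to be evaluated directly.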

The parameter βj indicates the relative change in \(E\left( {y_{i} |x_{i} } \right)\) associated with a one unit change in xij; it is expressed as

$$ \frac{{\partial E\left( {y_{i} |x_{i} } \right)/E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }} = \beta_{j} . $$
(8)

This parameter is also the semielasticity, i.e., multiplied by 100 it gives the approximate percentage change of rainfall occurrences in a week or fortnight for a one unit change of an individual climatic variable. If xij is expressed in logarithmic form, then βj is interpreted as the elasticity of the rainfall occurrence count with respect to the individual climatic variable. Further, to assess the effect of discrete covariates on count data, i.e., the influence of a one unit change in xij on the expected value of yi, the expected values of yi at xij and at xij + 1 are compared. For this purpose, let us consider \(\tilde{x}_{i} = \left( {x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{ij} + 1, \ldots ,x_{ik} } \right)^{^{\prime}}\); then the relative change is defined as [47]

$$ \frac{{E\left( {y_{i} |\tilde{x}_{i}^{^{\prime}} \beta } \right) - E\left( {y_{i} |x_{i}^{^{\prime}} \beta } \right)}}{{E\left( {y_{i} |x_{i}^{^{\prime}} \beta } \right)}} = \frac{{\exp \left( {x_{i}^{^{\prime}} \beta + \beta_{j} } \right) - \exp \left( {x_{i}^{^{\prime}} \beta } \right)}}{{\exp \left( {x_{i}^{^{\prime}} \beta } \right)}} = \exp \left( {\beta_{j} } \right) - 1 $$
(9)

This is the exact relative change in \(E\left( {y_{i} |x_{i} } \right)\) due to a one unit change in xij. Similarly, the relative influence of a dummy variable (taking the value 0 or 1) on the expected count is \(\exp \left( {\beta_{j} } \right) - 1\).
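The interpretations in Eqs. (5), (8) and (9) can be illustrated numerically. The sketch below is only an illustration: the coefficient vector and covariate values in the usage note are hypothetical and are not estimates from this study.

```python
import math


def mean_count(x, beta):
    """Mean function lambda = exp(x'beta) of the Poisson GLM."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def marginal_effect(x, beta, j):
    """Eq. (5): d E(y|x) / d x_j = beta_j * exp(x'beta)."""
    return beta[j] * mean_count(x, beta)


def relative_change_unit_step(beta_j):
    """Eq. (9): exact relative change in E(y|x) when x_j increases by one unit."""
    return math.exp(beta_j) - 1.0
```

With hypothetical values `beta = [0.10, -0.05]` and `x = [2.0, 3.0]`, the ratio `marginal_effect(x, beta, 0) / mean_count(x, beta)` returns 0.10, the semielasticity of Eq. (8), whereas `relative_change_unit_step(0.10)` is about 0.105; the two agree closely only when βj is small.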

2.2.2 Estimation of parameters for GLM

GLMs for the Poisson distribution provide an appropriate basis for analyzing the count data of rainfall occurrences because they are capable of identifying the instantaneous rate of rainfall occurrences. For this purpose, the influence of climatic variables on the occurrence of daily rainfall needs to be identified, which requires estimating the parameters of the GLMs. The study uses the maximum likelihood estimation (MLE) method [11, 12, 47]. Under the MLE principle, the parameters measuring the influence of climatic variables on rainfall occurrence are chosen to maximize the likelihood of the observed rainfall counts under the GLM for the Poisson distribution.

2.2.2.1 Maximum likelihood estimation for GLM

Let us consider n pairs of independent observations \(\left( {y_{i} ,x_{ij} } \right)\); here \(y_{i} = 0,1,2, \ldots\) are the count data of rainfall occurrences and \(x_{ij} ,i = 1,2, \ldots ,n\;\;{\text{and}}\;\;j = 1,2, \ldots ,k\) are the climatic variables corresponding to yi. Then the GLM for the Poisson distribution is defined as

$$ f\left( {y_{i} |x_{ij} ,\beta } \right) = \frac{{\left[ {\exp \left( {x_{ij}^{^{\prime}} \beta } \right)} \right]^{{y_{i} }} \exp \left[ { - \exp \left( {x_{ij}^{^{\prime}} \beta } \right)} \right]}}{{y_{i} !}} $$
(10)

For this model, the likelihood function of conditional distribution is stated as

$$ L\left[ {f\left( {y_{i} |x_{ij} ;\beta } \right)} \right] = \mathop \prod \limits_{i = 1}^{n} f\left( {y_{i} |x_{ij} ;\beta } \right) $$

Therefore, the log-likelihood function for the model is expressed as

$$ \begin{aligned} \Rightarrow \;\;l\left( {\beta ;y,x} \right) = & \mathop \sum \limits_{i = 1}^{n} \log f\left( {y_{i} |x_{ij} ;\beta } \right) \\ = & \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} x_{ij}^{^{\prime}} \beta - \exp \left( {x_{ij}^{^{\prime}} \beta } \right) - \log \left( {y_{i} !} \right)} \right] \\ \end{aligned} $$
(11)

Because the logarithm is a monotonic transformation, maximizing the likelihood is equivalent to maximizing the log-likelihood function, \(l\left( {\beta ;y, x} \right)\). To obtain the estimate \(\hat{\beta }\), the first derivative of \(l\left( {\beta ;y, x} \right)\) with respect to β is set equal to zero, which gives

$$ \frac{{\partial l\left( {\beta ;y, x} \right)}}{\partial \beta } = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} - \exp \left( {x_{ij}^{^{\prime}} \beta } \right)} \right]x_{ij} = 0 $$
(12)

Thus the Hessian matrix of the model is defined as

$$ H_{n} \left( {\beta ;y,x} \right) = \frac{{\partial^{2} l\left( {\beta ;y,x} \right)}}{{\partial \beta \partial \beta^{\prime}}} = \frac{{\partial \left[ {\mathop \sum \nolimits_{i = 1}^{n} y_{i} x_{ij} - \mathop \sum \nolimits_{i = 1}^{n} e^{{x_{ij}^{^{\prime}} \beta }} \cdot x_{ij} } \right]}}{{\partial \beta^{\prime}}} = - \mathop \sum \limits_{i = 1}^{n} \exp \left( {x_{ij}^{^{\prime}} \beta } \right) \cdot x_{ij} x_{ij}^{^{\prime}} $$
(13)

That is, Hn is negative definite (provided the covariate matrix has full column rank), so the log-likelihood function is globally concave and the Newton–Raphson iteration method can be used to obtain the unique solution for β.
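The Newton–Raphson iteration built from the score in Eq. (12) and the Hessian in Eq. (13) can be sketched as below. This is a minimal illustration under stated assumptions, not the software used in the study: the starting value β = 0, the tolerance, the helper `solve` and the toy data in the usage note are all hypothetical choices.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for col in range(c, n + 1):
                m[r][col] -= f * m[c][col]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][col] * z[col] for col in range(r + 1, n))) / m[r][r]
    return z


def fit_poisson(X, y, tol=1e-10, max_iter=100):
    """Maximize the Poisson log-likelihood of Eq. (11) by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k  # assumed starting value
    for _ in range(max_iter):
        lam = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
        # Score vector, Eq. (12): sum_i (y_i - lambda_i) x_i
        score = [sum((yi - li) * row[j] for yi, li, row in zip(y, lam, X))
                 for j in range(k)]
        # Negative Hessian, Eq. (13): sum_i lambda_i x_i x_i'
        neg_h = [[sum(li * row[p] * row[q] for li, row in zip(lam, X))
                  for q in range(k)] for p in range(k)]
        # Newton step: beta_new = beta - H^{-1} score = beta + (-H)^{-1} score
        step = solve(neg_h, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

For toy data such as `X = [[1, -1.0], [1, -0.5], [1, 0.0], [1, 0.5], [1, 1.0]]` and `y = [1, 2, 2, 3, 5]`, the fitted means sum to the observed total, which is the first-order condition used for Eq. (7) when an intercept column is present.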

2.2.3 Test of hypotheses

To test whether individual climatic variables have a significant effect on the count data of rainfall occurrences under the GLM for the Poisson distribution, the study uses the tz statistic given below:

$$ t_{z} = \frac{{\hat{\beta }_{j} }}{{\sqrt {{\text{var}} \left( {\hat{\beta }_{j} } \right)} }} $$
(14)

where \({\text{var}} \left( {\hat{\beta }_{j} } \right)\) is the jth diagonal element of the inverse of the information matrix \(I\left( {\hat{\beta }} \right)^{ - 1} \), \(I\left( {\hat{\beta }} \right) = - E\left[ {H_{n} \left( {\hat{\beta };y,x} \right)} \right] = - n^{ - 1} H_{n} \left( {\hat{\beta }} \right)\); that is, diagonal element of \(I\left( {\hat{\beta }} \right)^{ - 1} = {\text{var}} \left( {\hat{\beta }} \right) = - \left[ {H_{n} \left( {\hat{\beta }} \right)} \right]^{ - 1}\) [47]. Here \(H_{0} :\beta_{j} = 0\) is rejected against \(H_{a} :\beta_{j} \ne 0\) when \(\left| {t_{z} } \right| > z_{\alpha /2}\), where α is the level of significance. The statistic is called a t statistic by analogy with the linear model under the normality assumption. In general, tz does not follow the t distribution in small samples; for nonlinear count data models it is asymptotically standard normal, i.e., a z statistic [11].
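The computation of Eq. (14) can be sketched for the simplest case of two parameters, where the information matrix is 2 × 2 and can be inverted in closed form. The model with an intercept and a single covariate, the function name and the default α = 0.05 are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def wald_z_two_param(x, beta, alpha=0.05):
    """z statistics for the hypothetical model log(lambda_i) = beta[0] + beta[1] * x_i."""
    lam = [math.exp(beta[0] + beta[1] * xi) for xi in x]
    # Information matrix -H_n = sum_i lambda_i * x_i x_i', with x_i = (1, x_i)
    i00 = sum(lam)
    i01 = sum(li * xi for li, xi in zip(lam, x))
    i11 = sum(li * xi * xi for li, xi in zip(lam, x))
    det = i00 * i11 - i01 * i01
    # Standard errors: square roots of the diagonal of the inverse 2x2 matrix
    se = [math.sqrt(i11 / det), math.sqrt(i00 / det)]
    z = [b / s for b, s in zip(beta, se)]
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    reject = [abs(zj) > crit for zj in z]
    return se, z, reject
```

In practice `beta` would be the maximum likelihood estimate; passing any other value only evaluates the formula at that point.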

To test whether the effects of the climatic variables on the count data of rainfall occurrences are jointly insignificant, i.e., H0: β1= β2= … =βk=0, the study uses the likelihood ratio test (LRT) [27]. Based on Eq. (11), this test is defined as

$$ - 2\log D = 2\left( {l\left( {\lambda_{i} ;y|\beta \ne 0} \right) - l\left( {\lambda_{i} ;y|\beta = 0} \right)} \right). $$
(15)

Here \(- 2\log D\) is asymptotically distributed as χ2 with \(\vartheta\) degrees of freedom, i.e., \(\vartheta = [({\text{number}}\;{\text{of}}\;{\text{parameters}}\;{\text{of}}\;{\text{the}}\;{\text{model}}\;{\text{under}}\;{\text{alternative}}\;{\text{hypothesis}}) - ({\text{number}}\;{\text{of}}\;{\text{parameters}}\;{\text{of}}\;{\text{the}}\;{\text{model}}\;{\text{under}}\;{\text{null}}\;{\text{hypothesis}})] = k{-}0 = k\). For this study, the log-likelihood function under the null hypothesis (where λi = 1 for all i because the model has no intercept) is \(l\left( {\lambda_{i} ;y|\beta = 0} \right) = - n - \mathop \sum \limits_{i = 1}^{n} \log y_{i} !\) and the log-likelihood function under the alternative hypothesis is \(l\left( {\lambda_{i} ;y|\beta \ne 0} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} !\). Then H0 is rejected when \(\chi^{2} > \chi_{\alpha /2,\vartheta }^{2}\), where \(\vartheta \) is the degrees of freedom.
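The likelihood ratio statistic of Eq. (15) can be computed directly from the two log-likelihoods. The sketch below assumes, as in the text, that the null model has no intercept so that λi = 1; the function names are hypothetical, and the fitted means passed in would come from a separately estimated model.

```python
import math


def poisson_loglik(y, lam):
    """Poisson log-likelihood: sum_i [y_i log(lambda_i) - lambda_i - log(y_i!)]."""
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1)
               for yi, li in zip(y, lam))


def lr_statistic(y, lam_fitted):
    """Eq. (15): twice the gap between the fitted and the null log-likelihood."""
    lam_null = [1.0] * len(y)  # beta = 0 without intercept gives lambda_i = exp(0) = 1
    return 2.0 * (poisson_loglik(y, lam_fitted) - poisson_loglik(y, lam_null))
```

The resulting value is compared with a χ2 critical value with k degrees of freedom taken from standard tables.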

2.2.4 Deviance for the models

Under the normality assumption, the deviance reduces to the residual sum of squares of the linear regression model. Following this idea, the deviance is used in the GLM framework as a generalization of the residual sum of squares, because it generalizes the analysis of variance for comparing sequences of nested GLMs. The count data of rainfall occurrences yi are assumed independent with \(y_{i} \sim {\text{Poisson}} \left( {\lambda_{i} } \right)\); then the log-likelihood function of GLMs for the Poisson distribution is defined as

$$ l\left( {\beta ;y,x} \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} x_{ij}^{^{\prime}} \beta - \exp \left( {x_{ij}^{^{\prime}} \beta } \right) - \log \left( {y_{i} !} \right)} \right], $$

which is stated as

$$ l\left( {\lambda_{i} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$
(16)

For the saturated model, the estimates \(\hat{\lambda }_{i}\) are replaced by the corresponding observed values yi; then, the value of the log-likelihood function (16) is

$$ l\left( {\lambda_{i(\max )} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log y_{i} - \mathop \sum \limits_{i = 1}^{n} y_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$
(17)

Further, for maximum likelihood estimate \(\hat{\lambda }_{i}\) of the model, the value of log-likelihood function (16) is computed as

$$ l\left( {\hat{\lambda }_{i} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \hat{\lambda }_{i} - \mathop \sum \limits_{i = 1}^{n} \hat{\lambda }_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$
(18)

Therefore, the deviance is stated as [30]

$$ \begin{aligned} D = &\, 2\left[ {l\left( {\lambda_{i(\max )} ;y} \right) - l\left( {\hat{\lambda }_{i} ;y} \right)} \right] \\ = &\, 2\left[ {\mathop \sum \limits_{i = 1}^{n} y_{i} \log \left( {\frac{{y_{i} }}{{\hat{\lambda }_{i} }}} \right) - \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{\lambda }_{i} } \right)} \right] \\ \end{aligned} $$
(19)

which is twice the difference between the maximum log-likelihood achievable and the log-likelihood of the fitted model.
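Equation (19) can be evaluated as in the following minimal sketch. One detail the formula leaves implicit is the term for yi = 0; the code adopts the usual convention that y log(y/λ) is taken as 0 when y = 0. The function name and the example values are illustrative assumptions.

```python
import math


def poisson_deviance(y, lam_hat):
    """Eq. (19): 2 * sum_i [y_i log(y_i / lam_i) - (y_i - lam_i)]."""
    total = 0.0
    for yi, li in zip(y, lam_hat):
        # Convention: the logarithmic term is zero when y_i = 0
        log_term = yi * math.log(yi / li) if yi > 0 else 0.0
        total += log_term - (yi - li)
    return 2.0 * total
```

For instance, `poisson_deviance([0, 2], [1.0, 2.0])` equals 2.0: the second observation is fitted exactly and contributes nothing, while the zero count contributes 2(0 − (0 − 1)) = 2.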

2.2.5 Model selection criteria

To identify the most effective model for explaining the occurrences of daily rainfall, several authors including McCullagh and Nelder [31] and Agresti [1] suggested various model selection procedures and also identified some of their limitations and drawbacks. These selection procedures sometimes give almost equal support to several possible models, and often they do not clearly single out the best model among the candidates when the alternative hypothesis is true. To overcome these problems, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) are employed in this study.

2.2.5.1 Akaike’s information criterion (AIC)

Akaike [5] developed the AIC as an extension of the maximum likelihood principle. The likelihood approach was developed by Bartlett [8] under the assumption that the frequency of counts is asymptotically normally distributed; it also assumes that the frequency counts depend on a sequence of k explanatory variables whose parameters are estimated by the MLE method. Further, to develop the autoregressive model, Akaike [6] employed the final prediction error (FPE), based on the mean square prediction error of the predictors. Extending the FPE procedure, Akaike [3,4,5] formulated the following form of the AIC

$$ {\text{AIC}} = - 2\left( {{\text{maximum}}\;{\text{log}}\;{\text{likelihood}}} \right) + 2\left( {{\text{number}}\;{\text{of}}\;{\text{estimable}}\;{\text{parameters}}\;{\text{in}}\;{\text{the}}\;{\text{model}}} \right) $$
(20)

Following the Kullback–Leibler information [28], this measure indicates the deviation of the fitted model from the true structure. The minimum AIC estimate (MAICE) identifies the most effective model among the fitted models [35]. That is,

$$ {\text{AIC}}\left( i \right) < {\text{AIC}}\left( {i + 1} \right) < \cdots < {\text{AIC}}\left( {i + s} \right),\;\;i = 1,2, \ldots ,\infty \;\;{\text{and}}\;\;s = 0,1,2, \ldots ,\infty $$
(21)

where (i + s) indicates the number of models and AIC(i) identifies the best model compared to the others.

2.2.5.2 Bayesian information criterion (BIC)

Schwarz [36] observed that the AIC procedure is neither optimal nor consistent, and argued that the Bayesian procedure (i.e., the Bayesian information criterion, BIC) provides a consistent estimator for determining the best model among several models under the assumption that the observations come from an exponential family distribution. He also showed that the BIC is not only consistent but also asymptotically optimal (i.e., it minimizes the expected loss) for the specified form of the prior distribution and type of loss function. Further, the Bayes estimator is asymptotically equivalent to the MLE [27] for large n. Schwarz [36] therefore defined the following form of the BIC:

$$ {\text{BIC}} = - 2\left( {{\text{maximum}}\;{\text{log}}\;{\text{likelihood}}} \right) + \left( {{\text{number}}\;{\text{of}}\;{\text{estimable}}\;{\text{parameters}}\;{\text{in}}\;{\text{the}}\;{\text{model}}} \right) \times \log n $$
(22)

Here n indicates the sample size. For selecting the best model, Sakamoto [35] suggested the minimum BIC estimate (MBICE) as the most effective model among the fitted models. That is,

$$ {\text{BIC}}\left( i \right) < {\text{BIC}}\left( {i + 1} \right) < \cdots < {\text{BIC}}\left( {i + s} \right),\;\;i = 1,2, \ldots ,\infty \;\;{\text{and}}\;\;s = 0,1,2, \ldots ,\infty $$
(23)

where (i + s) indicates the number of models and BIC(i) identifies the best model compared to the others.
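The two criteria and the minimum-value selection rule can be sketched as follows. The code uses the AIC of Eq. (20) and the standard Schwarz penalty of (number of parameters) × log n for the BIC; the function names, the model labels and the log-likelihood values in the usage note are hypothetical.

```python
import math


def aic(loglik, k):
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Schwarz criterion: -2 * (maximum log-likelihood) + k * log(n)."""
    return -2.0 * loglik + k * math.log(n)


def best_model(scores):
    """Return the label whose criterion value is smallest (MAICE or MBICE)."""
    return min(scores, key=scores.get)
```

For example, with made-up values `scores = {"weekly": aic(-100.0, 3), "fortnightly": aic(-104.0, 3)}`, `best_model(scores)` returns `"weekly"`. Such a comparison is meaningful only for models fitted to the same response data.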

3 Data

Bangladesh is situated in the subtropical region and is characterized by wide seasonal variations due to heavy rainfall, high temperature and high humidity. Considering these variations, the Bangladesh Bureau of Statistics (BBS) divides the year into three seasons (March to June for summer, July to October for monsoon, and November to February for winter). In this country, more than 90 percent of annual rainfall occurs in the summer and monsoon seasons, and the maximum rainfall occurs in the monsoon season [2, 29]. Bangladesh is an agro-based country; crop plantation, crop growth and harvesting depend largely on the country's annual rainfall patterns. To analyze the rainfall occurrence patterns, the daily rainfall recording stations of the country have been grouped into five regions: the southeast region (Feni station), northeast region (Sylhet station), southwest region (Khulna station), northwest region (Rajshahi station) and mid-region (Dhaka station). Daily rainfall occurrences depend largely on the behavior of climatic variables such as temperature (maximum and minimum) and relative humidity [38]; these variables are used as covariates in the study. The study collected data on daily rainfall, temperature and relative humidity from January 1975 to December 2014 (40 years) from the Bangladesh Meteorological Department (BMD) for each rainfall region. Missing values in these data were substituted using the Single Best Estimator (SBE) method based on reference stations (within 100 km of the regional rainfall station) [24].

3.1 Count data of daily rainfall occurrences

To transform daily rainfall data into count data, the study defines a binary variable: when the rainfall within a day is greater than or equal to 1 mm, the observation is recorded as 1, otherwise 0. The summer and monsoon seasons each consist of four months. Each month contains four weeks plus two or three additional days, so the summer season of each year contains 17 weeks plus 3 additional days. These three additional days are excluded from the summer season and included in the monsoon season (to keep the time series continuous); the monsoon season of each year then consists of 18 weeks. Similarly, each month in these seasons contains two fortnights or two fortnights plus one additional day, so the summer season contains 8 fortnights plus two additional days in each year. These two additional days are excluded from the summer season and added to the monsoon season, which then consists of 8 fortnights plus 5 additional days in each year. For the analysis, the monsoon season is therefore taken as 8 fortnights, and the last 5 days of October are excluded in each year. Thus, for weekly count data of daily rainfall occurrences, the study considers (17 × 7 × 40) = 4760 days and (17 × 40) = 680 weeks for the summer season, and (18 × 7 × 40) = 5040 days and (18 × 40) = 720 weeks for the monsoon season. Similarly, for fortnightly count data, the study considers (8 × 15 × 40) = 4800 days and (8 × 40) = 320 fortnights for each of the summer and monsoon seasons.
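The construction of the count variable can be sketched as follows. This is only an illustration of the rule described above (wet day if rainfall ≥ 1 mm, then summed over complete weeks or fortnights); the function names and the rainfall values in the usage note are hypothetical.

```python
def wet_day_indicator(rain_mm, threshold=1.0):
    """Return 1 for a day with rainfall >= threshold (in mm), otherwise 0."""
    return [1 if r >= threshold else 0 for r in rain_mm]


def period_counts(indicators, period_len):
    """Count wet days in consecutive complete periods; leftover days are dropped."""
    n_full = len(indicators) // period_len
    return [sum(indicators[p * period_len:(p + 1) * period_len])
            for p in range(n_full)]


# Season lengths in days: March to June and July to October
SUMMER_DAYS = 31 + 30 + 31 + 30   # 122
MONSOON_DAYS = 31 + 31 + 30 + 31  # 123
```

For a made-up series of 15 daily values, `period_counts(wet_day_indicator(rain), 7)` returns two weekly counts and ignores the fifteenth day. The season arithmetic in the text follows from `divmod(SUMMER_DAYS, 7)`, which gives 17 weeks and 3 days, and `divmod(MONSOON_DAYS + 3, 7)`, which gives exactly 18 weeks.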

4 Results and discussions

To explain the influence of climatic variables on the patterns of rainfall occurrences in the summer and monsoon seasons, GLMs for the Poisson distribution are applied to weekly and fortnightly count data of rainfall occurrences. The deviance of the fitted models is used to assess their goodness of fit, and the AIC and BIC are used to select the most effective model among them. Different test statistics are used to identify significant effects of the climatic variables on the count data of daily rainfall occurrences. To formulate GLMs for weekly and fortnightly count data of daily rainfall occurrences for the summer and monsoon seasons, the study uses maximum temperature, minimum temperature and relative humidity as the covariates. Each covariate observation is the weekly or fortnightly mean, i.e., for weekly data the ith observation of the jth covariate is \(x_{i} :\overline{x}_{i1} ,\overline{x}_{i2} , \ldots ,\overline{x}_{ik}\); here \(i = 1,2, \ldots ,n\) and \(j = 1,2, \ldots ,k\) [10]. The results of the GLMs under the Poisson distribution for weekly and fortnightly daily rainfall occurrences in the summer and monsoon seasons are discussed below.

4.1 Analysis of weekly and fortnightly count data of daily rainfall occurrences for the summer and monsoon seasons

The frequency distributions of weekly and fortnightly count data of daily rainfall occurrences for the summer and monsoon seasons in the five regions of Bangladesh are shown in Appendices 1, 2 and 3. The distribution of weekly rainfall occurrences for the summer season indicates that no rainfall occurred in 26.2 percent, 9.6 percent, 27.9 percent, 28.2 percent and 21.6 percent of weeks at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Similarly, the distribution of weekly rainfall occurrences for the monsoon season indicates that no rainfall occurred in 8.6 percent, 7.6 percent, 8.8 percent, 11.5 percent and 7.1 percent of weeks at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Further, the distribution of fortnightly rainfall occurrences for the summer season shows that no rainfall occurred in 15.3 percent, 2.8 percent, 12.5 percent, 12.5 percent and 7.2 percent of fortnights at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Similarly, the distribution of fortnightly rainfall occurrences for the monsoon season indicates that no rainfall occurred in 1.9 percent, 1.3 percent, 1.6 percent, 2.2 percent and 1.3 percent of fortnights at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively.

In summer season of Feni station within the highest week (18.1 percent), one-day rainfall is occurred, and in the highest fortnight (9.4 percent), six days of rainfall are occurred. Further, in this season, within the lowest week (3.5 percent) seven days of rainfall are occurred, and within the lowest fortnight (0.3 percent) fourteen days or fifteen days of rainfall are occurred. In monsoon season of this station within the highest week (18.1 percent), four days of rainfall are occurred, and within the highest fortnight (12.8 percent), ten days of rainfall are occurred. Again, in this season, within the lowest week (6.5 percent) and within the lowest fortnight (1.5 percent) one-day rainfall is occurred.

In summer season of Sylhet station, within highest week (16.3 percent), five days of rainfall are occurred, and in the highest fortnight (9.7 percent) eleven days of rainfall are occurred. In monsoon season of the station, within highest week (28.8 percent) seven days of rainfall are occurred, and in the highest fortnight (15.6 percent), twelve days of rainfall are occurred. Further, in this season, within the lowest week (5.1 percent) one-day rainfall is occurred, and in the lowest fortnight (0.6 percent) two-day rainfall is occurred.

In the summer season at Khulna station, setting aside rain-free periods, the most frequent count is one rainy day, both within a week (20.6 percent) and within a fortnight (12.8 percent). In the monsoon season at this station, the most frequent weekly count is six rainy days (19.6 percent) and the most frequent fortnightly count is twelve rainy days (12.8 percent). The least frequent count in this season is one rainy day, both within a week (6.3 percent) and within a fortnight (1.9 percent).

In the summer season at Rajshahi station, setting aside rain-free periods, the most frequent weekly count is two rainy days (18.5 percent of weeks) and the most frequent fortnightly count is one rainy day (14.1 percent of fortnights). In the monsoon season at this station, the most frequent weekly count is four rainy days (18.1 percent) and the most frequent fortnightly count is eight rainy days (13.8 percent).

In the summer season at Dhaka station, setting aside rain-free periods, the most frequent weekly count is two rainy days (15.6 percent of weeks) and the most frequent fortnightly count is three rainy days (11.3 percent of fortnights). In the monsoon season at this station, the most frequent weekly count is five rainy days (20.8 percent) and the most frequent fortnightly count is eleven rainy days (14.7 percent). In this season, the least frequent weekly count is one rainy day (6.8 percent) and the least frequent fortnightly count is fifteen rainy days (0.9 percent).
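The frequency distributions summarized above can be reproduced from a daily wet/dry record by summing over consecutive blocks of days and tabulating the resulting counts. The following is only a minimal sketch: the function names, the 0/1 coding of the daily record and the toy data are illustrative assumptions, and the block length is left as a parameter because a fortnight may be taken as 14 or 15 days.

```python
from collections import Counter

def block_counts(daily_wet, block_len):
    """Sum a 0/1 daily wet-day series over consecutive blocks of block_len days."""
    n_blocks = len(daily_wet) // block_len  # an incomplete trailing block is dropped
    return [sum(daily_wet[b * block_len:(b + 1) * block_len])
            for b in range(n_blocks)]

def percent_distribution(counts):
    """Percentage of blocks observed at each rainy-day count."""
    freq = Counter(counts)
    total = len(counts)
    return {k: 100.0 * freq[k] / total for k in sorted(freq)}

# Hypothetical 28-day record (1 = rain observed on that day, 0 = dry day).
daily_wet = [0, 0, 1, 0, 0, 0, 0,
             1, 1, 0, 1, 0, 0, 1,
             0, 0, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0]

weekly = block_counts(daily_wet, 7)        # rainy days per week
fortnightly = block_counts(daily_wet, 14)  # rainy days per 14-day fortnight
weekly_pct = percent_distribution(weekly)  # share of weeks at each count

print(weekly, fortnightly, weekly_pct)
```

The entry `weekly_pct[0]` corresponds to the share of weeks without any rainfall, which is the quantity reported for each station in the first paragraph of this subsection.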

4.2 GLMs for Poisson distribution for analyzing count data of rainfall occurrences

To identify the effect of climatic variables on the weekly and fortnightly counts of rainfall occurrences in five regions of Bangladesh (southeast (Feni station), northeast (Sylhet station), southwest (Khulna station), northwest (Rajshahi station) and mid-region (Dhaka station)), the study formulates generalized linear models (GLMs) for the Poisson distribution with maximum temperature (\(x_{i1}\)), minimum temperature (\(x_{i2}\)) and relative humidity (\(x_{i3}\)) as covariates. Under the log link, the mean of the GLMs can be stated as \(\lambda_{i} = \exp \left( {\beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} } \right), i = 1, 2, \ldots ,n\). Here \(\lambda_{i}\) is the mean of the Poisson distribution and each \(\beta\) is the parameter of the corresponding variable, which measures the relative change (semielasticity) of the weekly or fortnightly rainfall occurrences. To analyze the effect of climatic variables on the count data of rainfall occurrences, the study formulates four models: two for the weekly data (summer and monsoon seasons) and two for the fortnightly data (summer and monsoon seasons). In these models, \(y_{i}\) denotes the weekly or fortnightly count of rainfall occurrences. The parameters are estimated by maximum likelihood (MLE) using the Newton–Raphson iteration procedure, and the tz statistic is employed to test the significance of each parameter. From the parameter of each climatic variable, the study also obtains the average marginal effect on the number of rainfall occurrences within a week or fortnight. For each region, the estimated parameters of the summer and monsoon season models for the weekly and fortnightly count data are reported in Table 1, together with the estimated average marginal effect of each climatic variable.
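As an illustration of the estimation step, the sketch below fits the Poisson mean \(\lambda_{i} = \exp(\beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3})\) by Newton–Raphson iterations in plain Python. It is not the study's own implementation: the data are hypothetical, the function names are invented for this example, the starting values (least squares on \(\log(y_i + 0.5)\)) are an assumed choice, and the average marginal effect is computed under the usual log-link definition \(\hat{\beta}_{j}\) times the mean fitted count.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def xtwz(X, w, z):
    """Return (X' W X, X' W z) for diagonal weights w."""
    p = len(X[0])
    A = [[sum(wi * r[j] * r[k] for r, wi in zip(X, w)) for k in range(p)]
         for j in range(p)]
    b = [sum(wi * r[j] * zi for r, wi, zi in zip(X, w, z)) for j in range(p)]
    return A, b

def fitted_means(X, beta):
    """Poisson means lambda_i = exp(x_i' beta) under the log link."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]

def fit_poisson(X, y, max_iter=100, tol=1e-8):
    """Maximum likelihood estimate of beta by Newton-Raphson."""
    n = len(y)
    # Assumed starting point: least squares of log(y + 0.5) on X.
    A, b = xtwz(X, [1.0] * n, [math.log(yi + 0.5) for yi in y])
    beta = solve(A, b)
    for _ in range(max_iter):
        lam = fitted_means(X, beta)
        # Newton step: (X' diag(lam) X)^{-1} X' (y - lam)
        A, b = xtwz(X, lam, [(yi - li) / li for yi, li in zip(y, lam)])
        step = solve(A, b)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical weekly records: [max temp (deg C), min temp (deg C), relative humidity (%)]
X = [[34, 24, 70], [33, 26, 82], [32, 25, 85], [35, 25, 75],
     [31, 26, 88], [33, 24, 72], [34, 26, 78], [32, 25, 80]]
y = [1, 4, 4, 1, 6, 2, 2, 4]  # hypothetical rainy days per week

beta = fit_poisson(X, y)
lam_hat = fitted_means(X, beta)
mean_lam = sum(lam_hat) / len(lam_hat)
ame = [b * mean_lam for b in beta]                 # average marginal effects
pct = [100.0 * (math.exp(b) - 1.0) for b in beta]  # percent change per unit increase
print(beta, ame, pct)
```

For a canonical-link Poisson model the Newton–Raphson update coincides with iteratively reweighted least squares, which is why the step is written as a weighted least-squares solve.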

Table 1 Estimated parameters of GLMs for summer and monsoon seasons for weekly and fortnightly count data of rainfall occurrences

Table 1 indicates that minimum temperature and relative humidity have a positive effect on the weekly and fortnightly counts of daily rainfall occurrences in the summer and monsoon seasons in all the regions, whereas maximum temperature has a negative effect in both seasons. The influence of these climatic variables on the number of daily rainfall occurrences within a week or fortnight is found to be significant in all the regions.
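The percentage and marginal-effect readings of the coefficients follow directly from the stated model form. As a brief sketch, assuming the average marginal effect is defined in the usual way for a log-link model, with \(\hat{\lambda}_{i}\) denoting the fitted mean:

\[
\frac{\partial \log \lambda_{i}}{\partial x_{ij}} = \beta_{j},
\qquad
\frac{\lambda_{i}(x_{ij} + 1)}{\lambda_{i}(x_{ij})} = \exp(\beta_{j}),
\qquad
100\left( \exp(\beta_{j}) - 1 \right) \approx 100\,\beta_{j} \ \text{for small } \beta_{j},
\]

\[
\frac{\partial \lambda_{i}}{\partial x_{ij}} = \beta_{j} \lambda_{i},
\qquad
\mathrm{AME}_{j} = \frac{1}{n} \sum_{i=1}^{n} \hat{\beta}_{j} \hat{\lambda}_{i} = \hat{\beta}_{j}\, \bar{\hat{\lambda}} .
\]

The first line gives the relative (percentage) change in the expected number of rainy days per one-unit increase in covariate \(j\), holding the other covariates fixed. The second line gives the corresponding change in the expected number of rainy days itself, averaged over the sample.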

In the southeast region (Feni station) in the summer season, relative humidity has the largest influence among the covariates on the weekly and fortnightly counts of daily rainfall occurrences. That is, a one-unit increase in relative humidity raises the expected number of rainy days within a week or fortnight by 12 percent, holding the other variables fixed. For this season, a one-unit increase in maximum temperature lowers the expected number of rainy days by 7 percent within a week and 9 percent within a fortnight. Further, the average marginal effect for this season indicates that a one-unit increase in relative humidity increases the expected number of rainy days by 0.30 within a week and 0.59 within a fortnight. In the monsoon season in this region, a one-unit increase in minimum temperature raises the expected number of rainy days within a week or fortnight by 22 percent. The average marginal effect indicates that a one-unit increase in minimum temperature increases the expected number of rainy days by 0.86 within a week and 1.90 within a fortnight, whereas a one-unit increase in maximum temperature reduces it by 0.27 within a week and 0.88 within a fortnight.

In the summer season in the northeast region (Sylhet station), a one-unit increase in relative humidity raises the expected number of rainy days by 8 percent within a week and 6 percent within a fortnight, holding the other covariates fixed. In the monsoon season in this region, a one-unit increase in minimum temperature raises the expected number of rainy days by 18 percent within a week and 19 percent within a fortnight. Further, a one-unit increase in maximum temperature reduces the expected number of rainy days by 4 percent within a week and 3 percent within a fortnight.

In the summer season in the southwest region (Khulna station), the GLMs indicate that a one-unit increase in relative humidity raises the expected number of rainy days by 9 percent within a week. Again, a one-unit increase in minimum temperature raises the expected number of rainy days by 10 percent within a fortnight, holding the other climatic variables fixed. For the monsoon season in this region, the GLMs indicate that a one-unit increase in minimum temperature raises the expected number of rainy days by 23 percent within a week and 25 percent within a fortnight. Further, a one-unit increase in maximum temperature reduces the expected number of rainy days by 8 percent within a week and 10 percent within a fortnight. In this season, the marginal effect indicates that a one-unit increase in minimum temperature increases the expected number of rainy days by 0.43 within a week and 2.23 within a fortnight.

For the summer season in the northwest region (Rajshahi station), the GLMs indicate that a one-unit increase in relative humidity raises the expected number of rainy days by 8 percent within a week and 6 percent within a fortnight. For this region and season, a one-unit increase in maximum temperature reduces the expected number of rainy days by 3 percent within a week and 4 percent within a fortnight. In the monsoon season in this region, the GLMs indicate that, holding the other climatic variables fixed, a one-unit increase in minimum temperature raises the expected number of rainy days by 17 percent within both a week and a fortnight. Further, for this region and season, a one-unit increase in maximum temperature reduces the expected number of rainy days by 5 percent within a week and 6 percent within a fortnight.

In the summer season in the mid-region (Dhaka station), the GLMs indicate that a one-unit increase in relative humidity raises the expected number of rainy days by 9 percent within a week and 7 percent within a fortnight. In the monsoon season in this region, a one-unit increase in minimum temperature raises the expected number of rainy days by 17 percent within a week and 19 percent within a fortnight. For this season, the marginal effect of minimum temperature indicates that a one-unit increase in the variable increases the expected number of rainy days by 0.10 within a week and 1.66 within a fortnight. Similarly, the marginal effect of maximum temperature indicates that a one-unit increase in the variable reduces the expected number of rainy days by 0.04 within a week and 0.68 within a fortnight.

To test the joint significance of the influence of all climatic variables on the weekly and fortnightly rainfall occurrences in the summer and monsoon seasons in the five regions, the study uses the likelihood ratio test (LRT). According to Eq. (15), the null hypothesis is defined as \(H_{0} : \beta_{1} = \beta_{2} = \beta_{3} = 0\). Further, the goodness of fit of the models is assessed with the deviance, which is discussed in Eq. (19). In addition, to select the appropriate model for explaining the weekly and fortnightly rainfall occurrences in the summer and monsoon seasons in the five regions, the study uses the AIC and BIC procedures (Sect. 2.2.5). The results of the LRT, the deviance of the models, AIC and BIC are shown in Table 2.
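The quantities reported in Table 2 can be computed from the observed counts and the fitted Poisson means. The sketch below uses the standard textbook definitions, which are assumed here to agree with Eqs. (15) and (19) and Sect. 2.2.5; the function names and the toy data are hypothetical, and the fitted means under \(H_{0}\) are passed in explicitly because their form depends on how the null model is specified.

```python
import math

def poisson_loglik(y, lam):
    """Poisson log-likelihood: sum of y*log(lam) - lam - log(y!)."""
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1)
               for yi, li in zip(y, lam))

def poisson_deviance(y, lam):
    """Deviance: 2 * sum[y*log(y/lam) - (y - lam)], taking y*log(y) = 0 when y = 0."""
    dev = 0.0
    for yi, li in zip(y, lam):
        term = yi * math.log(yi / li) if yi > 0 else 0.0
        dev += 2.0 * (term - (yi - li))
    return dev

def lrt_statistic(y, lam_full, lam_null):
    """Likelihood ratio statistic: 2 * (loglik of fitted model - loglik under H0)."""
    return 2.0 * (poisson_loglik(y, lam_full) - poisson_loglik(y, lam_null))

def aic_bic(y, lam, k):
    """AIC = -2*loglik + 2k and BIC = -2*loglik + k*log(n) for k estimated parameters."""
    ll = poisson_loglik(y, lam)
    n = len(y)
    return -2.0 * ll + 2.0 * k, -2.0 * ll + k * math.log(n)

# Hypothetical toy example: four counts, a fitted model with two group means
# (k = 2 parameters) and a null model with a single common mean.
y = [1, 3, 4, 6]
lam_full = [2.0, 2.0, 5.0, 5.0]
lam_null = [3.5, 3.5, 3.5, 3.5]
k = 2

lrt = lrt_statistic(y, lam_full, lam_null)
dev = poisson_deviance(y, lam_full)
aic, bic = aic_bic(y, lam_full, k)
print(lrt, dev, aic, bic)
```

Under \(H_{0}\), the LRT statistic is compared with a Chi-square distribution whose degrees of freedom equal the number of restricted parameters, which is three for the hypothesis stated above.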

Table 2 Value of Chi-square statistic for LRT, deviance of the models, AIC and BIC

From Table 2, the LRT indicates that the influence of the climatic variables (maximum and minimum temperature, and relative humidity) on the weekly and fortnightly rainfall occurrences in the summer and monsoon seasons is jointly significant in all the regions. The value of the LRT statistic is lower for the fortnightly data than for the weekly data for the respective season and region. Further, in all the regions the deviance of the models for the fortnightly data in the summer and monsoon seasons is much lower than that for the weekly data. Hence, for all the regions, the GLMs for the Poisson distribution provide a better fit for the fortnightly count data than for the weekly count data of rainfall occurrences in the summer and monsoon seasons.

Further, for all the regions the values of AIC and BIC for the fortnightly data in the summer and monsoon seasons are lower than those for the weekly data. Therefore, with maximum temperature, minimum temperature and relative humidity as climatic variables, the GLMs for the Poisson distribution explain the fortnightly rainfall occurrences in the summer and monsoon seasons satisfactorily in all the regions of the country.

5 Conclusions

Bangladesh is an agro-based country that frequently suffers from devastating natural disasters, e.g., flash floods, droughts and heavy rainfall. To protect the country's people from these sufferings, its economists and agriculturists need appropriate knowledge of the patterns of daily rainfall occurrences. For this purpose, the study develops GLMs for the Poisson distribution for count data of rainfall occurrences. In these models, maximum temperature, minimum temperature and relative humidity are the explanatory variables, and the weekly and fortnightly counts of daily rainfall occurrences in the summer and monsoon seasons in five regions of the country are the response variables. The GLMs indicate that the effect of the explanatory variables on the count data of daily rainfall occurrences is significant. For both seasons, the influence of maximum temperature on the weekly and fortnightly occurrences of rainfall is negative, whereas the influence of minimum temperature and relative humidity is positive in all the regions. According to AIC and BIC, the GLMs for the Poisson distribution give satisfactory results for the fortnightly occurrences of daily rainfall for both seasons and all stations.

In the summer season, holding the other climatic variables fixed, a one-unit increase in relative humidity raises the expected number of rainy days within a fortnight by 12 percent at Feni station, 6 percent at the Sylhet, Khulna and Rajshahi stations, and 7 percent at Dhaka station. Further, within a fortnight in this season, a one-unit increase in maximum temperature lowers the expected number of rainy days by 9 percent, 3 percent, 14 percent, 4 percent and 0.7 percent at the Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. For the same season and period, the average marginal effect indicates that a one-unit increase in relative humidity increases the expected number of rainy days by 0.59 at Feni station. Similarly, a one-unit increase in minimum temperature increases the expected number of rainy days by 0.86, 0.94, 0.57 and 0.69 at the Sylhet, Khulna, Rajshahi and Dhaka stations, respectively.

In the monsoon season, within a fortnight, a one-unit increase in minimum temperature raises the expected number of rainy days by 22 percent, 19 percent, 25 percent, 17 percent and 19 percent at the Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Further, in this season a one-unit increase in maximum temperature lowers the expected number of rainy days by 10 percent, 2 percent, 10 percent, 6 percent and 8 percent at the Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. For this season and period, the average marginal effect indicates that a one-unit increase in minimum temperature increases the expected number of rainy days by 1.90, 2.06, 2.23, 1.29 and 1.66 at the Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Therefore, relative humidity in the summer season and minimum temperature in the monsoon season play a key role in changing the fortnightly number of rainy days in all the regions of the country. Accordingly, for the development of the rural economy, these GLMs may be helpful in formulating effective plans for cultivating agricultural crops, vegetables, etc., and in protecting fisheries, wildlife, human assets, etc.