Generalized linear models for analyzing count data of rainfall occurrences

Sumi, Sharmin Nahar; Sinha, Narayan Chandra; Islam, M. Ataharul

doi:10.1007/s42452-021-04467-x


Research Article
Open access
Published: 20 March 2021

Volume 3, article number 481, (2021)

SN Applied Sciences


Sharmin Nahar Sumi¹,
Narayan Chandra Sinha ORCID: orcid.org/0000-0001-9610-360X² &
M. Ataharul Islam³


Abstract

Economists and agriculturists need adequate knowledge of how climatic variables affect the occurrence of rainfall in order to protect the country’s people from devastating natural hazards such as flash floods, drought and heavy rainfall. This study therefore sets out to identify the influence of climatic variables on the occurrence of rainfall. The study develops generalized linear models (GLMs) for the Poisson distribution for weekly and fortnightly count data of daily rainfall occurrences in the summer and monsoon seasons at five regional rainfall stations of Bangladesh. In these models, minimum temperature, maximum temperature and relative humidity are the explanatory variables. For the five regional rainfall stations, the model selection criteria AIC and BIC indicate that the Poisson GLMs satisfactorily explain the influence of climatic variables on the fortnightly occurrences of rainfall in the summer and monsoon seasons. The GLMs for fortnightly rainfall occurrences in the summer season indicate that a one-unit increase in relative humidity raises the expected number of rainy days by 12 percent at Feni station, 6 percent at Sylhet, Khulna and Rajshahi stations, and 7 percent at Dhaka station. The GLMs for fortnightly rainfall occurrences in the monsoon season indicate that a one-unit increase in minimum temperature raises the expected number of rainy days by 22 percent, 19 percent, 24 percent, 17 percent and 19 percent at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Further, maximum temperature has a negative influence on the occurrence of rainfall at all stations and in both seasons. The study indicates that relative humidity in the summer season and minimum temperature in the monsoon season play a notable role in changing the fortnightly occurrences of rainfall in all regions of the country.


1 Introduction

Count data for a response variable (Y) consist of nonnegative integers and generally arise in two ways, namely as simple counts and as categorical data. Simple count data record how many times an event occurs within a specific period of time, such as the number of days on which rainfall occurs within a week or fortnight over several years. Categorical data, in contrast, are grouped according to different criteria for each variable within a specific period of time. The distribution of count data usually belongs to the exponential family under the assumption that the events are independently and identically distributed [25, 26]. Nelder and Wedderburn [33], and McCullagh and Nelder [31, 32] developed generalized linear models (GLMs) for analyzing the effect of explanatory variables on count data, relating the parameter of the count data distribution to the covariates through a link function. Count data models therefore explain the rate of occurrence of an event per unit of time or time interval under different covariates [11, 18, 20, 47].

To explain count data using GLMs, several authors have suggested Poisson, hurdle Poisson, negative binomial (NB) and zero-inflated Poisson (ZIP) models as the most suitable for analyzing violations of safety regulations, turnover in child care arrangements, labor mobility, market entry, recruitment rates, fish catching behavior, factors influencing TV channel viewing, etc. [9, 15, 19, 20, 34, 46, 48, 49]. Vivekanandan [45] conducted a heavy rainfall study for Tohana, Maharashtra, India, and showed that the log Pearson type-3 distribution gives satisfactory results compared with all other distributions considered. Srikanthan and McMahon [42] described Markov chain, hidden Markov chain, log normal distribution and other models for explaining annual, monthly and daily rainfall data. These studies did not consider any atmospheric variables related to rainfall occurrence when analyzing the patterns of rainfall data.

To explore the causes of extreme floods and climate change scenarios in western Ireland, Chandler and Wheater [14] obtained satisfactory results by applying GLMs to daily rainfall occurrences. Segond et al. [37] demonstrated Poisson cluster processes using GLMs for analyzing simulated hourly and daily rainfall data of the Thames region, UK. Buishand et al. [10] employed GLMs with the logistic transformation as the link function for analyzing daily and monthly rainfall occurrences at Bern, Switzerland. Considering Fourier series functions of time in the link function, Coe and Stern [16] developed GLMs for a Markov model to analyze the patterns of daily rainfall data of Kharja, Jordan; Lunuwila, Sri Lanka; etc. A similar analysis is described by Stern and Coe [41] for a non-stationary Markov model of rainfall data from Morogoro, Tanzania.

Sinha et al. [38,39,40] and Islam and Chowdhury [22] explained the patterns of daily rainfall occurrences in Bangladesh using logistic regression for a Markov chain model. Thyer and Kuczera [43, 44] formulated a hidden Markov chain model for identifying long-term persistence of annual rainfall and dry spells, and drought risk, for Sydney, Australia. These studies did not analyze count data of daily rainfall occurrences as the response variable. The influence of climatic variables on count data of daily rainfall occurrences is very useful for identifying the patterns of rainfall occurrences. For agriculturalists and hydrologists, results on rainfall patterns are helpful for formulating year-wise or season-wise crop cultivation plans and activities; they are also important for planning and implementing the country’s development programs. In this context, this study develops generalized linear models (GLMs) for the Poisson distribution using count data of daily rainfall occurrences.

The study is organized into five sections. Following this section, Sect. 2 discusses the methodology of the study: the derivation of the Poisson distribution from the exponential family, the formulation of the generalized linear model (GLM) for the Poisson distribution, the interpretation and estimation of the GLM parameters, hypothesis tests for the significance of the parameters, the deviance of the models and the model selection criteria (AIC and BIC). Sect. 3 describes the data and their collection. The results are discussed in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Methodology

2.1 Poisson distribution

Count data of a response variable (Y) generally follow an exponential family distribution with parameter θ, which is written as [7, 18, 23, 32]

$$ f\left( {y;\theta , \phi } \right) = \exp \left\{ {\frac{{y_{i} \theta_{i} - b\left( {\theta_{i} } \right)}}{{\alpha_{i} \left( \phi \right)}} + c\left( {y_{i} ;\, \phi } \right)} \right\},\;\;y \in Z^{ + } $$

(1)

where $\alpha \left( \cdot \right),b\left( \cdot \right)\;{\text{and}}\;c\left( \cdot \right) $ are specific functions and Z⁺ denotes the set of nonnegative integers. Here θ_i is the canonical parameter (which determines the canonical link function), $b\left( {\theta_{i} } \right)$ is the cumulant function, $\alpha \left( \phi \right)$ is the scale (dispersion) parameter and $c\left( {y_{i } ;\phi } \right)$ is the normalization factor. The mean and variance are $E\left( y \right) = b^{\prime}\left( \theta \right)$ and $v\left( y \right) = \alpha \left( \phi \right)b^{^{\prime\prime}} \left( \theta \right)$, where $b^{^{\prime\prime}} \left( \theta \right)$ is the variance function.

To formulate the distribution of count data of daily rainfall occurrences using Eq. (1), y_i denotes the count, i.e., the number of days within a week or fortnight on which rainfall occurred. The canonical parameter is θ_i = log λ, and the cumulant function is b(θ_i) = exp(θ_i) = λ, where λ is the parameter of the distribution. For this distribution, the normalizing factor $c\left( {y_{i} ,\phi } \right)$ is ${-}\log y!$, which is well defined because y is a nonnegative integer. With $\alpha \left( \phi \right) = 1$, these expressions give the specific distributional form for the response variable (y).

Therefore, for the count data of daily rainfall occurrences over a given time interval, Eq. (1) becomes

$$ f\left( {y;\lambda } \right) = \exp \left\{ {y\log \lambda - \lambda - \log y!} \right\},\;\;y \in Z^{ + } $$

(2)

Here $E\left( y \right) = \lambda$ and ${\text{var}} \left( y \right) = \lambda$, i.e., the mean and variance of y are equal. Equation (2) can be written as

$$ f\left( {y;\lambda } \right) = \lambda^{y} e^{ - \lambda } \frac{1}{y!} = \frac{{\lambda^{y} e^{ - \lambda } }}{y!},\;\;y = 0,1,2, \ldots \;\;{\text{and}}\;\;\lambda \in R^{ + } $$

(3)

which is the Poisson distribution of the count data of rainfall occurrences with parameter λ, under the assumption that rainfall occurs randomly over time. If 0 < λ < 1, the probability function is strictly decreasing in y; if λ > 1, it first increases up to a mode near λ and then decreases.
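The properties of Eq. (3) can be checked numerically. The following is a minimal Python sketch, not part of the original study; the function names and the rate of 3.5 rainy days per week are hypothetical and chosen only for illustration. It evaluates the probability function on the log scale as in Eq. (2) and approximates the mean and variance by a truncated sum.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of y events for rate lam, as in Eq. (3)."""
    # Evaluated as exp{y log(lam) - lam - log(y!)}, the form of Eq. (2).
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def mean_and_variance(lam: float, y_max: int = 200) -> tuple[float, float]:
    """Approximate E(y) and var(y) by summing over y = 0..y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

def mode(lam: float, y_max: int = 200) -> int:
    """Smallest y at which the probability function is largest."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    return probs.index(max(probs))

# Hypothetical rate: 3.5 rainy days per week (illustration only).
mean, var = mean_and_variance(3.5)
```

Under these assumptions the computed mean and variance both agree with λ, and `mode` returns 0 for a rate below one and a value near λ for a rate above one.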

2.2 GLM for Poisson distribution

The mean and variance of the response variable generally describe the behavior of the data, because the covariates influence the mean of the response and, through it, the variance. The count data of daily rainfall occurrences for the specified time period are modeled by a Poisson distribution with parameter λ. This parameter indicates the intensity of the data, i.e., the instantaneous rate of rainfall occurrences. For count data, consider a response y_i observed on n independent units, with the ith observation given by $\left( {y_{i} ,x_{i} } \right)$. Here x_i is the vector of explanatory variables associated with the occurrence of rainfall within the specified period of time, i.e., $x_{i}^{^{\prime}} = \left[ {x_{i1} , x_{i2} , \ldots ,x_{ik} } \right]$, with components indexed by $j = 1,2, \ldots ,k$. Then the Poisson distribution of y_i conditional on x_i is defined as

$$ f\left( {y_{i} |x_{i} } \right) = \frac{{\lambda_{i}^{{y_{i} }} e^{{ - \lambda_{i} }} }}{{y_{i} !}},\;\;y_{i} = 0,1,2, \ldots $$

(4)

with $E\left( {y_{i} |x_{i} } \right) = {\text{Var}}\left( {y_{i} |x_{i} } \right) = \lambda_{i}$, which indicates that the variance is not constant across observations. The GLM for the Poisson distribution thus explains the influence of covariates on the count data of daily rainfall occurrences. The mean function is expressed on the logarithmic scale through the explanatory variables, i.e., $\log \left( {\lambda_{i} } \right) = x_{i}^{^{\prime}} \beta$ $\Rightarrow$ $\lambda_{i} = \exp \left( {x_{i}^{^{\prime}} \beta } \right)$, where the logarithm is the canonical link function of the GLM for the Poisson distribution [33] and β is the parameter vector for the explanatory variables. Under this transformation, log λ_i is the canonical parameter of the distribution of y_i, and the model implies a constant semielasticity of the mean function with respect to each covariate [47].

2.2.1 Interpretation of parameters for GLM under Poisson distribution

To formulate the GLM for count data of rainfall occurrences, climatic variables such as temperature and humidity are employed as covariates. In a GLM, the influence of the covariates on y_i enters through the link function, and inverting the link gives the mean function of the rainfall counts in exponential form, i.e., $E\left( {y_{i} |x_{i} } \right) = \lambda_{i} = \exp \left( {x_{i}^{^{\prime}} \beta } \right)$. The influence of a covariate on the occurrence of rainfall is measured by its marginal effect, which shows how much the expected number of rainfall occurrences changes for a one-unit change in an individual climatic variable. More precisely, the marginal effect is the instantaneous rate of change of the expected count with respect to the specified climatic variable, holding the other variables unchanged. For a count data model, the marginal effects of the explanatory variables are therefore effects on the average predicted count. For binary explanatory variables, the average marginal effect is the partial effect on the count variable [21]. The marginal effect of the jth covariate on the count data under the GLM is stated as

$$ \frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }} = \frac{{\partial \lambda_{i} }}{{\partial x_{ij} }} = \beta_{j} \exp \left( {x_{i}^{^{\prime}} \beta } \right) $$

(5)

Then the expected or average marginal effect is defined as

$$ E\left[ {\frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }}} \right] = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \beta_{j} \exp \left( {x_{i}^{^{\prime}} \beta } \right) $$

(6)

If the GLM includes an intercept term, then the average marginal effect is expressed as

$$ E\left[ {\frac{{\partial E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }}} \right] = \beta_{j} \overline{y} $$

(7)

because the first-order condition implies $\mathop \sum \limits_{i = 1}^{n} \exp \left( {x_{i}^{^{\prime}} \beta } \right) = \mathop \sum \limits_{i = 1}^{n} y_{i}$.

The coefficient β_j is the relative change in $E\left( {y_{i} |x_{i} } \right)$ associated with a one-unit change in x_ij, that is,

$$ \frac{{\partial E\left( {y_{i} |x_{i} } \right)/E\left( {y_{i} |x_{i} } \right)}}{{\partial x_{ij} }} = \beta_{j} . $$

(8)

This parameter is also a semielasticity, i.e., 100β_j is approximately the percentage change in rainfall occurrences in a week or fortnight for a one-unit change in an individual climatic variable. If x_ij is expressed in logarithmic form, then β_j is the elasticity of the rainfall occurrence count with respect to that climatic variable. Further, to assess the effect of a discrete covariate on the count data, i.e., the influence of a one-unit change in x_ij on the expected value of y_i, one compares the expected values of y_i at x_ij and at x_ij + 1. Let $\tilde{x}_{i} = \left( {x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{ij} + 1, \ldots ,x_{ik} } \right)^{^{\prime}}$; then the relative change is defined as [47]

$$ \frac{{E\left( {y_{i} |\tilde{x}_{i}^{^{\prime}} \beta } \right) - E\left( {y_{i} |x_{i}^{^{\prime}} \beta } \right)}}{{E\left( {y_{i} |x_{i}^{^{\prime}} \beta } \right)}} = \frac{{\exp \left( {x_{i}^{^{\prime}} \beta + \beta_{j} } \right) - \exp \left( {x_{i}^{^{\prime}} \beta } \right)}}{{\exp \left( {x_{i}^{^{\prime}} \beta } \right)}} = \exp \left( {\beta_{j} } \right) - 1 $$

(9)

This is the exact relative change in $E\left( {y_{i} |x_{i} } \right)$ due to a one-unit change in x_ij. Similarly, the relative influence of a dummy variable (taking the value 0 or 1) on the expected count is $\exp \left( {\beta_{j} } \right) - 1$.
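To make the interpretation in Eqs. (7) and (9) concrete, the short Python sketch below converts a coefficient into a percentage change and an average marginal effect. The coefficient value and the counts are hypothetical, chosen only for illustration, and the function names are not from the original study.

```python
import math

def percent_change_per_unit(beta_j: float) -> float:
    """Exact relative change in E(y|x), in percent, for a one-unit rise in x_j (Eq. 9)."""
    return 100.0 * (math.exp(beta_j) - 1.0)

def average_marginal_effect(beta_j: float, y: list[int]) -> float:
    """Average marginal effect beta_j * ybar (Eq. 7); assumes the model has an intercept."""
    return beta_j * sum(y) / len(y)

# Hypothetical values for illustration only (not estimates from the study).
beta_humidity = 0.06
counts = [0, 2, 5, 3, 7, 4, 1, 6]   # rainy days in eight fortnights

pct = percent_change_per_unit(beta_humidity)
ame = average_marginal_effect(beta_humidity, counts)
```

For small coefficients the exact change 100[exp(β_j) − 1] is close to the semielasticity approximation 100β_j; the two diverge as |β_j| grows.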

2.2.2 Estimation of parameters for GLM

The GLM for the Poisson distribution provides an appropriate basis for analyzing the count data of rainfall occurrences because it is capable of identifying the instantaneous rate of rainfall occurrences. For this purpose, the influence of climatic variables on the occurrence of daily rainfall must be identified, which requires estimating the parameters of the GLMs. The study uses the maximum likelihood estimation (MLE) method [11, 12, 47]. Under the MLE principle, the coefficients of the climatic variables in the Poisson GLM are chosen to maximize the likelihood of the observed rainfall occurrence counts.

2.2.2.1 Maximum likelihood estimation for GLM

Consider n independent observations $\left( {y_{i} ,x_{i} } \right)$, where $y_{i} = 0,1,2, \ldots$ is the count of rainfall occurrences and $x_{i} = \left( {x_{i1} ,x_{i2} , \ldots ,x_{ik} } \right)^{^{\prime}}$, $i = 1,2, \ldots ,n$, is the vector of climatic variables corresponding to y_i. Then the GLM for the Poisson distribution is defined as

$$ f\left( {y_{i} |x_{i} ,\beta } \right) = \frac{{\left[ {\exp \left( {x_{i}^{^{\prime}} \beta } \right)} \right]^{{y_{i} }} \exp \left[ { - \exp \left( {x_{i}^{^{\prime}} \beta } \right)} \right]}}{{y_{i} !}} $$

(10)

For this model, the likelihood function of the conditional distribution is

$$ L\left( {\beta ;y,x} \right) = \mathop \prod \limits_{i = 1}^{n} f\left( {y_{i} |x_{i} ;\beta } \right) $$

Therefore, the log-likelihood function for the model is

$$ \begin{aligned} l\left( {\beta ;y,x} \right) = & \mathop \sum \limits_{i = 1}^{n} \log f\left( {y_{i} |x_{i} ;\beta } \right) \\ = & \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} x_{i}^{^{\prime}} \beta - \exp \left( {x_{i}^{^{\prime}} \beta } \right) - \log \left( {y_{i} !} \right)} \right] \\ \end{aligned} $$

(11)

Because the logarithm is a monotonic transformation, maximizing the likelihood is equivalent to maximizing the log-likelihood $l\left( {\beta ;y, x} \right)$. The estimate $\hat{\beta }$ is obtained by taking the first derivative of $l\left( {\beta ;y, x} \right)$ with respect to β and setting it equal to zero:

$$ \frac{{\partial l\left( {\beta ;y, x} \right)}}{\partial \beta } = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} - \exp \left( {x_{i}^{^{\prime}} \beta } \right)} \right]x_{i} = 0 $$

(12)

The Hessian matrix of the model is

$$ H_{n} \left( {\beta ;y,x} \right) = \frac{{\partial^{2} l\left( {\beta ;y,x} \right)}}{{\partial \beta \partial \beta^{\prime}}} = \frac{{\partial \left[ {\mathop \sum \nolimits_{i = 1}^{n} y_{i} x_{i} - \mathop \sum \nolimits_{i = 1}^{n} e^{{x_{i}^{^{\prime}} \beta }} \cdot x_{i} } \right]}}{{\partial \beta^{\prime}}} = - \mathop \sum \limits_{i = 1}^{n} \exp \left( {x_{i}^{^{\prime}} \beta } \right) \cdot x_{i} x_{i}^{^{\prime}} $$

(13)

That is, H_n is negative definite (provided the covariates are not linearly dependent), so the log-likelihood is globally concave and the Newton–Raphson iteration can be used to find the unique solution for β.
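The Newton–Raphson iteration based on the score in Eq. (12) and the Hessian in Eq. (13) can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the authors' implementation: it handles only an intercept and a single covariate, so the 2 × 2 system is solved by hand, and the data are hypothetical (a centred humidity-like covariate and made-up counts).

```python
import math

def fit_poisson_glm(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for log(lambda_i) = b0 + b1 * x_i; returns (b0, b1)."""
    # Start from the intercept-only fit, log(ybar), with a zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector, Eq. (12): sum of (y_i - lambda_i) * (1, x_i)'
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Negative Hessian, Eq. (13): sum of lambda_i * (1, x_i)'(1, x_i)
        a = sum(lam)
        b = sum(li * xi for xi, li in zip(x, lam))
        c = sum(li * xi * xi for xi, li in zip(x, lam))
        det = a * c - b * b
        # Newton step: (-H)^{-1} * score, with the 2x2 inverse written out.
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: covariate centred at zero, counts of rainy days.
x = [-10, -8, -5, -2, 0, 3, 5, 8]
y = [1, 2, 2, 4, 5, 6, 8, 9]
b0_hat, b1_hat = fit_poisson_glm(x, y)
fitted = [math.exp(b0_hat + b1_hat * xi) for xi in x]
```

At convergence the first component of the score is zero, so the fitted means sum to the observed counts, which is the first-order condition used for Eq. (7). With more covariates the same update applies, with the 2 × 2 solve replaced by a general linear solver.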

2.2.3 Test of hypotheses

To identify the significant effects of the different climatic variables on the count data of rainfall occurrences under the GLM for the Poisson distribution, the study considers the t_z statistic given below:

$$ t_{z} = \frac{{\hat{\beta }_{j} }}{{\sqrt {{\text{var}} \left( {\hat{\beta }_{j} } \right)} }} $$

(14)

where ${\text{var}} \left( {\hat{\beta }_{j} } \right)$ is the jth diagonal element of the inverse of the information matrix $I\left( {\hat{\beta }} \right)^{ - 1} $, with $I\left( {\hat{\beta }} \right) = - E\left[ {H_{n} \left( {\hat{\beta };y,x} \right)} \right] = - H_{n} \left( {\hat{\beta }} \right)$ because the Hessian in Eq. (13) does not depend on y; that is, ${\text{var}} \left( {\hat{\beta }} \right) = I\left( {\hat{\beta }} \right)^{ - 1} = - \left[ {H_{n} \left( {\hat{\beta }} \right)} \right]^{ - 1}$ [47]. Here $H_{0} :\beta_{j} = 0$ is rejected against $H_{a} :\beta_{j} \ne 0$ when $\left| {t_{z} } \right| > z_{\alpha /2}$, where α is the level of significance. The statistic is named by analogy with the t statistic of the normal linear model. In general, t_z does not follow a t distribution in small samples; for nonlinear count data models it is asymptotically standard normal, i.e., a z statistic [11].

To test whether the climatic variables jointly have no effect on the count data of rainfall occurrences, i.e., H₀: β₁= β₂= … =β_k=0, the study considers the likelihood ratio test (LRT) [27]. Using Eq. (11), the test statistic is defined as

$$ - 2\log D = 2\left( {l\left( {\lambda_{i} ;y|\beta \ne 0} \right) - l\left( {\lambda_{i} ;y|\beta = 0} \right)} \right). $$

(15)

Here $- 2\log D$ is asymptotically distributed as χ² with $\vartheta$ degrees of freedom, where $\vartheta = [({\text{number}}\;{\text{of}}\;{\text{parameters}}\;{\text{of}}\;{\text{the}}\;{\text{model}}\;{\text{under}}\;{\text{alternative}}\;{\text{hypothesis}}) - ({\text{number}}\;{\text{of}}\;{\text{parameters}}\;{\text{of}}\;{\text{the}}\;{\text{model}}\;{\text{under}}\;{\text{null}}\;{\text{hypothesis}})] = k{-}0 = k$. For this study, the log-likelihood under the null hypothesis is $l\left( {\lambda_{i} ;y|\beta = 0} \right) = - n - \mathop \sum \limits_{i = 1}^{n} \log y_{i} !$ (every λ_i equals 1 when β = 0), and the log-likelihood under the alternative hypothesis is $l\left( {\lambda_{i} ;y|\beta \ne 0} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} !$. H₀ is rejected when $- 2\log D > \chi_{\alpha ,\vartheta }^{2}$, the upper α point of the χ² distribution with $\vartheta $ degrees of freedom.
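The likelihood ratio statistic in Eq. (15) can be illustrated with a small Python sketch. The counts and fitted means below are hypothetical, and the sketch assumes a model with two parameters, so the statistic is compared with the upper 5 percent point of the χ² distribution with 2 degrees of freedom (5.991) purely as an example.

```python
import math

def poisson_loglik(y, lam):
    """Poisson log-likelihood: sum of y_i log(lam_i) - lam_i - log(y_i!)."""
    return sum(yi * math.log(li) - li - math.lgamma(yi + 1)
               for yi, li in zip(y, lam))

def likelihood_ratio_statistic(y, lam_fitted):
    """2 * [l(fitted) - l(null)]; the null sets every coefficient to zero (lambda_i = 1)."""
    l_alt = poisson_loglik(y, lam_fitted)
    l_null = poisson_loglik(y, [1.0] * len(y))   # equals -n - sum(log y_i!)
    return 2.0 * (l_alt - l_null)

# Hypothetical counts and fitted means (illustration only).
y = [1, 2, 2, 4, 5, 6, 8, 9]
lam_hat = [1.2, 1.6, 2.3, 3.4, 4.4, 6.3, 8.0, 11.5]

lrt = likelihood_ratio_statistic(y, lam_hat)
CHI2_UPPER_5PCT_2DF = 5.991   # assumed setting: alpha = 0.05, 2 degrees of freedom
reject_null = lrt > CHI2_UPPER_5PCT_2DF
```

In practice the fitted means would come from the maximum likelihood estimates, and the critical value would be taken for the actual number of parameters in the model.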

2.2.4 Deviance for the models

Under the normality assumption, the deviance of a linear regression model is its residual sum of squares. By extension, the deviance is used in the GLM framework as a generalization of the residual sum of squares, because it generalizes the analysis of variance for comparing sequences of nested GLMs. The counts of rainfall occurrences y_i are assumed independent with $y_{i} \sim {\text{Poisson}} \left( {\lambda_{i} } \right)$; then the log-likelihood function of the GLM for the Poisson distribution is defined as

$$ l\left( {\beta ;y,x} \right) = \mathop \sum \limits_{i = 1}^{n} \left[ {y_{i} x_{i}^{^{\prime}} \beta - \exp \left( {x_{i}^{^{\prime}} \beta } \right) - \log \left( {y_{i} !} \right)} \right], $$

which, with $\lambda_{i} = \exp \left( {x_{i}^{^{\prime}} \beta } \right)$, can be written as

$$ l\left( {\lambda_{i} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \lambda_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$

(16)

For the saturated model, the estimates $\hat{\lambda }_{i}$ are replaced by the corresponding observed values y_i; the log-likelihood function (16) then becomes

$$ l\left( {\lambda_{i(\max )} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log y_{i} - \mathop \sum \limits_{i = 1}^{n} y_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$

(17)

Further, at the maximum likelihood estimates $\hat{\lambda }_{i}$ of the fitted model, the log-likelihood function (16) is

$$ l\left( {\hat{\lambda }_{i} ;y} \right) = \mathop \sum \limits_{i = 1}^{n} y_{i} \log \hat{\lambda }_{i} - \mathop \sum \limits_{i = 1}^{n} \hat{\lambda }_{i} - \mathop \sum \limits_{i = 1}^{n} \log y_{i} ! $$

(18)

Therefore, the deviance is stated as [30]

$$ \begin{aligned} D = &\, 2\left[ {l\left( {\lambda_{i(\max )} ;y} \right) - l\left( {\hat{\lambda }_{i} ;y} \right)} \right] \\ = &\, 2\left[ {\mathop \sum \limits_{i = 1}^{n} y_{i} \log \left( {\frac{{y_{i} }}{{\hat{\lambda }_{i} }}} \right) - \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{\lambda }_{i} } \right)} \right] \\ \end{aligned} $$

(19)

which is twice the difference between the maximum log-likelihood achievable and the log-likelihood of the fitted model.
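The deviance in Eq. (19) is straightforward to compute once the fitted means are available. The Python sketch below is illustrative only; it adopts the usual convention that the term y_i log(y_i/λ̂_i) is taken as zero when y_i = 0, and the numbers in the example are hypothetical.

```python
import math

def poisson_deviance(y, lam_hat):
    """Poisson deviance, Eq. (19): 2 * sum[y log(y / lam_hat) - (y - lam_hat)]."""
    total = 0.0
    for yi, li in zip(y, lam_hat):
        # y * log(y / lam) is taken as 0 when y = 0 (its limit as y -> 0).
        log_term = yi * math.log(yi / li) if yi > 0 else 0.0
        total += log_term - (yi - li)
    return 2.0 * total

# Hypothetical example: only the first observation is fitted imperfectly.
d = poisson_deviance([0, 1, 2], [0.5, 1.0, 2.0])
```

A fitted model that reproduces every observation exactly, as the saturated model does, has deviance zero; larger values indicate a poorer fit.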

2.2.5 Model selection criteria

To identify the most effective model for explaining the occurrences of daily rainfall, several authors, including McCullagh and Nelder [31] and Agresti [1], have suggested various model selection procedures and have also identified their limitations and drawbacks. These procedures sometimes give almost equal support to several candidate models, and often they cannot single out the best model among the candidates when the alternative hypothesis is true. To overcome these problems, Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC) are employed in this study.

2.2.5.1 Akaike’s information criterion (AIC)

Akaike [5] developed the AIC as an extension of the maximum likelihood principle. The likelihood approach was developed by Bartlett [8] under the assumption that the frequencies of counts are asymptotically normally distributed; it also assumes that the frequency counts depend on a sequence of k explanatory variables, whose parameters are estimated by the MLE method. Further, to develop autoregressive models, Akaike [6] employed the final prediction error (FPE), based on the mean square prediction error of the predictors. Extending the FPE procedure, Akaike [3,4,5] formulated the AIC as

$$ {\text{AIC}} = - 2\left( {{\text{maximum}}\;{\text{log}}\;{\text{likelihood}}} \right) + 2\left( {{\text{number}}\;{\text{of}}\;{\text{estimable}}\;{\text{parameters}}\;{\text{in}}\;{\text{the}}\;{\text{model}}} \right) $$

(20)

Following Kullback–Leibler information [28], this measure indicates the deviation of the fitted model from the true structure. The minimum AIC estimate (MAICE) identifies the most effective model among the fitted models [35]. That is,

$$ {\text{AIC}}\left( i \right) < {\text{AIC}}\left( {i + 1} \right) < \cdots < {\text{AIC}}\left( {i + s} \right),\;\;i = 1,2, \ldots ,\infty \;\;{\text{and}}\;\;s = 0,1,2, \ldots ,\infty $$

(21)

where i + s indexes the candidate models and the model with AIC(i), the smallest value, is the best among them.

2.2.5.2 Bayesian information criterion (BIC)

Schwarz [36] observed that the AIC procedure is neither optimal nor consistent, and argued that a Bayesian procedure (i.e., the Bayesian information criterion, BIC) provides a consistent estimator for determining the most effective model among several models, under the assumption that the observations come from an exponential family distribution. He also showed that the BIC is not only consistent but also asymptotically optimal (i.e., it minimizes the expected loss) for a specified form of the prior distribution and type of loss function. Further, the Bayes estimator is asymptotically equivalent to the MLE [27] for large n. Schwarz [36] therefore defined the BIC as:

$$ {\text{BIC}} = - 2\left( {{\text{maximum}}\;{\text{log}}\;{\text{likelihood}}} \right) + \left( {{\text{number}}\;{\text{of}}\;{\text{estimable}}\;{\text{parameters}}\;{\text{in}}\;{\text{the}}\;{\text{model}}} \right) \times \log n $$

(22)

Here n is the sample size. For selecting the best model, Sakamoto [35] suggested the minimum BIC estimate (MBICE) as the most effective model among the fitted models. That is,

$$ {\text{BIC}}\left( i \right) < {\text{BIC}}\left( {i + 1} \right) < \cdots < {\text{BIC}}\left( {i + s} \right),\;\;i = 1,2, \ldots ,\infty \;\;{\text{and}}\;\;s = 0,1,2, \ldots ,\infty $$

(23)

where i + s indexes the candidate models and the model with BIC(i), the smallest value, is the best among them.
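Selection by minimum AIC and minimum BIC can be sketched as follows. The candidate models, their maximized log-likelihoods and their parameter counts are hypothetical numbers chosen for illustration, not results from the study; the sample size of 320 matches the number of fortnights per season. The BIC is computed here in the standard Schwarz form, −2(maximum log-likelihood) + (number of parameters) × log n, which is an assumption of this sketch.

```python
import math

def aic(loglik: float, n_params: int) -> float:
    """Akaike's information criterion, Eq. (20)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion in the standard Schwarz form."""
    return -2.0 * loglik + n_params * math.log(n_obs)

N_OBS = 320   # fortnights per season over 40 years

# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "humidity": (-720.4, 2),
    "humidity + min temp": (-705.1, 3),
    "humidity + min temp + max temp": (-704.8, 4),
}

aic_scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
bic_scores = {name: bic(ll, p, N_OBS) for name, (ll, p) in candidates.items()}

# The minimum AIC (MAICE) and minimum BIC (MBICE) models.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

Both criteria reward a higher log-likelihood and penalize additional parameters; because log n exceeds 2 once n is larger than about 7, the BIC penalizes extra parameters more heavily than the AIC for samples of this size.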

3 Data

Bangladesh is situated in the subtropical region and is characterized by wide seasonal variations, with heavy rainfall, high temperature and high humidity. In view of these variations, the Bangladesh Bureau of Statistics (BBS) divides the year into three seasons (summer from March to June, monsoon from July to October, and winter from November to February). More than 90 percent of the country’s annual rainfall occurs in the summer and monsoon seasons, and of these the monsoon season receives the most rainfall [2, 29]. Bangladesh is an agro-based country; crop planting, growth and harvesting depend fundamentally on the country’s annual rainfall patterns. To analyze the patterns of rainfall occurrence, the daily rainfall recording stations of the country have been grouped into five regions: the southeast region (Feni station), northeast region (Sylhet station), southwest region (Khulna station), northwest region (Rajshahi station) and mid-region (Dhaka station). Daily rainfall occurrences depend mainly on the behavior of climatic variables such as temperature (maximum and minimum) and relative humidity [38]; these variables are used as covariates in the study. Data on daily rainfall, temperature and relative humidity from January 1975 to December 2014 (40 years) were collected from the Bangladesh Meteorological Department (BMD) for each rainfall region. Missing values were substituted using the Single Best Estimator (SBE) method with reference stations located within 100 km of the regional rainfall station [24].

3.1 Count data of daily rainfall occurrences

To convert daily rainfall data into count data, the study uses a binary variable: a day on which rainfall is greater than or equal to 1 mm is coded as 1, and otherwise as 0. The summer and monsoon seasons each consist of four months, and each month contains four weeks plus two or three additional days. Thus the summer season of each year contains 17 weeks plus 3 additional days. These three additional days are excluded from the summer season and included in the monsoon season (to keep the time series continuous), so that the monsoon season of each year consists of 18 weeks. Similarly, each month contains two fortnights, or two fortnights plus one additional day, so the summer season contains 8 fortnights plus two additional days in each year. These two additional days are excluded from the summer season and included in the monsoon season, which then consists of 8 fortnights plus 5 additional days in each year. For the analysis, the monsoon season is taken as 8 fortnights, and the last 5 days of October are excluded in each year. For weekly count data of daily rainfall occurrences, the study therefore considers (17 × 7 × 40) = 4760 days and (17 × 40) = 680 weeks for the summer season, and (18 × 7 × 40) = 5040 days and (18 × 40) = 720 weeks for the monsoon season. Similarly, for fortnightly count data, the study considers (8 × 15 × 40) = 4800 days and (8 × 40) = 320 fortnights for each of the summer and monsoon seasons.
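The conversion from daily rainfall amounts to weekly or fortnightly counts can be sketched in Python as below. This is an illustrative sketch rather than the authors' procedure: the function name and the synthetic daily series are hypothetical, and leftover days at the end of a season are simply dropped here, whereas the study carries the leftover summer days over into the monsoon season.

```python
def rainy_day_counts(daily_mm, period_len, threshold_mm=1.0):
    """Count days with rainfall >= threshold_mm in consecutive blocks of period_len days.

    Trailing days that do not fill a complete block are dropped.
    """
    wet = [1 if mm >= threshold_mm else 0 for mm in daily_mm]
    n_blocks = len(wet) // period_len
    return [sum(wet[b * period_len:(b + 1) * period_len]) for b in range(n_blocks)]

# Synthetic summer season of 122 days (March to June), hypothetical amounts in mm:
# the pattern 0, 0.6, 1.2, 1.8, 2.4 repeats, so 3 days in every 5 reach 1 mm.
summer_daily = [(d % 5) * 0.6 for d in range(122)]

weekly = rainy_day_counts(summer_daily, period_len=7)        # 17 weeks, 3 days left over
fortnightly = rainy_day_counts(summer_daily, period_len=15)  # 8 fortnights, 2 days left over
```

Over 40 years this yields 17 × 40 = 680 weekly and 8 × 40 = 320 fortnightly summer observations, matching the counts stated above.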

4 Results and discussions

To explain the influence of climatic variables on the patterns of rainfall occurrences in the summer and monsoon seasons, GLMs for the Poisson distribution are fitted to weekly and fortnightly count data of rainfall occurrences. The deviance of the fitted models is used to assess goodness of fit, and AIC and BIC are used to select the most effective model among the candidates. Test statistics are used to identify significant effects of the climatic variables on the count data of daily rainfall occurrences. In the GLMs for weekly and fortnightly count data of daily rainfall occurrences in the summer and monsoon seasons, the covariates are maximum temperature, minimum temperature and relative humidity. Each covariate enters as its weekly or fortnightly mean, i.e., for weekly data the ith observation of the covariates is $x_{i} :\overline{x}_{i1} ,\overline{x}_{i2} , \ldots ,\overline{x}_{ik}$, where $i = 1,2, \ldots ,n$ and $j = 1,2, \ldots ,k$ [10]. The results of the Poisson GLMs for the weekly and fortnightly patterns of daily rainfall occurrences in the summer and monsoon seasons are discussed below.

4.1 Analysis of weekly and fortnightly count data of daily rainfall occurrences for the summer and monsoon seasons

The frequency distributions of weekly and fortnightly count data of daily rainfall occurrences in the summer and monsoon seasons for five regions of Bangladesh are shown in Appendices 1, 2 and 3. In the summer season, no rainfall occurred in 26.2 percent, 9.6 percent, 27.9 percent, 28.2 percent and 21.6 percent of weeks at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. In the monsoon season, no rainfall occurred in 8.6 percent, 7.6 percent, 8.8 percent, 11.5 percent and 7.1 percent of weeks at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. For fortnightly data in the summer season, no rainfall occurred in 15.3 percent, 2.8 percent, 12.5 percent, 12.5 percent and 7.2 percent of fortnights at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Similarly, in the monsoon season, no rainfall occurred in 1.9 percent, 1.3 percent, 1.6 percent, 2.2 percent and 1.3 percent of fortnights at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively.

In the summer season at Feni station, the most frequent nonzero weekly count is one rainy day (18.1 percent of weeks), and the most frequent nonzero fortnightly count is six rainy days (9.4 percent of fortnights). In this season, the least frequent weekly count is seven rainy days (3.5 percent of weeks), and the least frequent fortnightly counts are fourteen and fifteen rainy days (0.3 percent of fortnights each). In the monsoon season at this station, the most frequent weekly count is four rainy days (18.1 percent), and the most frequent fortnightly count is ten rainy days (12.8 percent). In this season, the least frequent nonzero count is one rainy day for both weekly data (6.5 percent) and fortnightly data (1.5 percent).

In the summer season at Sylhet station, the most frequent weekly count is five rainy days (16.3 percent of weeks), and the most frequent fortnightly count is eleven rainy days (9.7 percent of fortnights). In the monsoon season, the most frequent weekly count is seven rainy days (28.8 percent), and the most frequent fortnightly count is twelve rainy days (15.6 percent). In this season, the least frequent weekly count is one rainy day (5.1 percent), and the least frequent fortnightly count is two rainy days (0.6 percent).

In the summer season at Khulna station, the most frequent nonzero count is one rainy day for both weekly data (20.6 percent of weeks) and fortnightly data (12.8 percent of fortnights). In the monsoon season, the most frequent weekly count is six rainy days (19.6 percent), and the most frequent fortnightly count is twelve rainy days (12.8 percent). In this season, the least frequent nonzero count is one rainy day for both weekly data (6.3 percent) and fortnightly data (1.9 percent).

In the summer season at Rajshahi station, the most frequent nonzero weekly count reported in Appendix 2 is one rainy day (22.6 percent of weeks), followed by two rainy days (18.5 percent), and the most frequent nonzero fortnightly count is one rainy day (14.1 percent of fortnights). In the monsoon season, the most frequent weekly count is four rainy days (18.1 percent), and the most frequent fortnightly count is eight rainy days (13.8 percent).

In the summer season at Dhaka station, the most frequent nonzero weekly count is two rainy days (15.6 percent of weeks), and the most frequent fortnightly count is three rainy days (11.3 percent of fortnights). In the monsoon season, the most frequent weekly count is five rainy days (20.8 percent), and the most frequent fortnightly count is eleven rainy days (14.7 percent). In this season, the least frequent weekly count is one rainy day (6.8 percent), and the least frequent fortnightly count is fifteen rainy days (0.9 percent).
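The percentages quoted in this subsection can be reproduced from the frequency tables. The sketch below uses the summer weekly frequencies for Feni station copied from Appendix 1; the helper `frequency_table` is a hypothetical name, included only to show how such a table could be built from a vector of period counts.

```python
import numpy as np

def frequency_table(counts, max_days):
    """Frequencies and percentages of 0..max_days rainy days per period."""
    counts = np.asarray(counts, dtype=int)
    freq = np.bincount(counts, minlength=max_days + 1)
    percent = 100.0 * freq / freq.sum()
    return freq, percent

# Summer weekly frequencies for Feni station (0 to 7 rainy days), from Appendix 1.
feni_summer_weekly = np.array([178, 123, 85, 81, 79, 70, 40, 24])
percent = 100.0 * feni_summer_weekly / feni_summer_weekly.sum()

zero_share = round(float(percent[0]), 1)                 # share of weeks with no rain
modal_nonzero_days = 1 + int(np.argmax(feni_summer_weekly[1:]))  # most frequent nonzero count
```

For these frequencies the share of dry weeks is 26.2 percent and the most frequent nonzero count is one rainy day, in agreement with the text.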

4.2 GLMs for Poisson distribution for analyzing count data of rainfall occurrences

To identify the effect of climatic variables on weekly and fortnightly count data of rainfall occurrences for five regions of Bangladesh (southeast (Feni station), northeast (Sylhet station), southwest (Khulna station), northwest (Rajshahi station) and mid-region (Dhaka station)), the study formulates generalized linear models (GLMs) for the Poisson distribution with maximum temperature ($x_{i1}$), minimum temperature ($x_{i2}$) and relative humidity ($x_{i3}$) as covariates. Under the log link, the mean function of the GLMs can be stated as $\lambda_{i} = \exp \left( {\beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} } \right), i = 1, 2, \ldots ,n$. Here $\lambda_{i}$ is the mean of the Poisson distribution and $\beta_{j}$ is the parameter of the corresponding variable, which measures the relative change (semielasticity) of weekly or fortnightly rainfall occurrences for a one-unit change in that variable. The study formulates four models: two for weekly data (summer and monsoon seasons) and two for fortnightly data (summer and monsoon seasons). In these models, $y_{i}$ denotes the weekly or fortnightly count of rainfall occurrences. The parameters are estimated by maximum likelihood (MLE) using the Newton–Raphson iteration procedure, and the $t_{z}$ statistic is used to test the significance of each parameter. Using the parameter of each climatic variable, the study also computes the average marginal effect on weekly or fortnightly rainfall occurrences. For each region, the estimated parameters of the summer and monsoon season models for weekly and fortnightly count data of rainfall occurrences are stated in Table 1, which also shows the estimated average marginal effect of each climatic variable.
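A Newton–Raphson fit of the model above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `fit_poisson_newton` is hypothetical, the data are simulated with standardized covariates rather than observed temperatures and humidity, and the mean function has no intercept because the formula above is written without one. The ratio of each estimate to its standard error plays the role of the $t_{z}$ statistic.

```python
import numpy as np

def fit_poisson_newton(X, y, max_iter=50, tol=1e-10):
    """Poisson GLM with log link fitted by Newton-Raphson; returns (beta, se)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        lam = np.exp(X @ beta)                  # fitted means lambda_i
        score = X.T @ (y - lam)                 # gradient of the log-likelihood
        info = X.T @ (X * lam[:, None])         # information matrix X' diag(lambda) X
        step = np.linalg.solve(info, score)     # Newton-Raphson update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    lam = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (X * lam[:, None]))   # asymptotic covariance of beta
    return beta, np.sqrt(np.diag(cov))

# Simulated data (assumption): three standardized covariates and known coefficients.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
beta_true = np.array([-0.1, 0.2, 0.1])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat, se = fit_poisson_newton(X, y)
z = beta_hat / se    # Wald-type statistic for each coefficient
```

With the canonical log link the observed and expected information coincide, so this update is the same as Fisher scoring. An intercept could be added by appending a column of ones to `X`.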

Table 1 Estimated parameters of GLMs for summer and monsoon seasons for weekly and fortnightly count data of rainfall occurrences

Table 1 indicates that minimum temperature and relative humidity have a positive effect on weekly and fortnightly rainfall occurrences in the summer and monsoon seasons for all the regions, whereas maximum temperature has a negative influence in both seasons. The influence of these climatic variables on weekly and fortnightly occurrences of daily rainfall is found significant in all the regions.

In the southeast region (Feni station), relative humidity has the largest influence among the covariates on weekly and fortnightly rainfall occurrences in the summer season. A one-unit increase in relative humidity increases the expected number of rainy days within a week or fortnight by about 12 percent, holding the other variables fixed. In this season, a one-unit increase in maximum temperature decreases the expected number of rainy days by 7 percent within a week and 9 percent within a fortnight. Further, the average marginal effect for this season indicates that a one-unit increase in relative humidity raises the expected number of rainy days by 0.30 within a week and 0.59 within a fortnight. In the monsoon season, a one-unit increase in minimum temperature increases the expected number of rainy days within a week or fortnight by 22 percent. The average marginal effect indicates that a one-unit increase in minimum temperature raises the expected number of rainy days by 0.86 within a week and 1.90 within a fortnight, whereas a one-unit increase in maximum temperature reduces it by 0.27 within a week and 0.88 within a fortnight.

In the summer season of the northeast region (Sylhet station), a one-unit increase in relative humidity increases the expected number of rainy days by 8 percent within a week and 6 percent within a fortnight, holding the other covariates fixed. In the monsoon season, a one-unit increase in minimum temperature increases the expected number of rainy days by 18 percent within a week and 19 percent within a fortnight. Further, a one-unit increase in maximum temperature reduces the expected number of rainy days by 4 percent within a week and 3 percent within a fortnight.

In the summer season of the southwest region (Khulna station), the GLMs indicate that a one-unit increase in relative humidity increases the expected number of rainy days by 9 percent within a week, and a one-unit increase in minimum temperature increases it by 10 percent within a fortnight, holding the other climatic variables fixed. In the monsoon season, the GLMs indicate that a one-unit increase in minimum temperature increases the expected number of rainy days by 23 percent within a week and 25 percent within a fortnight. Further, a one-unit increase in maximum temperature reduces the expected number of rainy days by 8 percent within a week and 10 percent within a fortnight. In this season, the marginal effect indicates that a one-unit increase in minimum temperature raises the expected number of rainy days by 0.43 within a week and 2.23 within a fortnight.

In the summer season of the northwest region (Rajshahi station), the GLMs indicate that a one-unit increase in relative humidity increases the expected number of rainy days by 8 percent within a week and 6 percent within a fortnight, and a one-unit increase in maximum temperature reduces it by 3 percent within a week and 4 percent within a fortnight. In the monsoon season, holding the other climatic variables fixed, a one-unit increase in minimum temperature increases the expected number of rainy days by 17 percent within both a week and a fortnight. Further, for this region and season, a one-unit increase in maximum temperature reduces the expected number of rainy days by 5 percent within a week and 6 percent within a fortnight.

In the summer season of the mid-region (Dhaka station), the GLMs indicate that a one-unit increase in relative humidity increases the expected number of rainy days by 9 percent within a week and 7 percent within a fortnight. In the monsoon season, a one-unit increase in minimum temperature increases the expected number of rainy days by 17 percent within a week and 19 percent within a fortnight. For this season, the marginal effect of minimum temperature indicates that a one-unit increase raises the expected number of rainy days by 0.10 within a week and 1.66 within a fortnight. Similarly, the marginal effect of maximum temperature indicates that a one-unit increase reduces the expected number of rainy days by 0.04 within a week and 0.68 within a fortnight.
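The two kinds of effect reported in this subsection follow from the exponential mean function. Since $\partial \log \lambda_{i} / \partial x_{ij} = \beta_{j}$, a one-unit increase in $x_{ij}$ multiplies the mean by $\exp(\beta_{j})$, a change of approximately $100\beta_{j}$ percent, and since $\partial \lambda_{i} / \partial x_{ij} = \beta_{j}\lambda_{i}$, the average marginal effect is $\beta_{j}$ times the average fitted mean. The sketch below illustrates both quantities. It is a rough consistency check under assumptions, not a reproduction of Table 1: the coefficient 0.12 is the rounded percent effect from the text read as a coefficient, and the sample mean from Appendix 1 stands in for the average fitted mean, which need not match it exactly.

```python
import numpy as np

def percent_change(beta):
    """Exact percent change in the Poisson mean for a one-unit covariate increase."""
    return 100.0 * (np.exp(beta) - 1.0)

def average_marginal_effect(beta_j, lam_hat):
    """Average marginal effect under an exponential mean: beta_j * mean(lambda_i)."""
    return float(beta_j * np.mean(lam_hat))

# Summer fortnightly frequencies for Feni station (0 to 15 rainy days), from Appendix 1.
days = np.arange(16)
feni_freq = np.array([49, 24, 29, 24, 28, 22, 30, 22, 20, 21, 14, 20, 7, 8, 1, 1])
mean_count = float((days * feni_freq).sum() / feni_freq.sum())   # rainy days per fortnight

beta_rh = 0.12                        # assumption: rounded relative-humidity effect
approx_ame = beta_rh * mean_count     # rough average marginal effect
```

The sample mean is about 5.11 rainy days per fortnight, so the rough average marginal effect is about 0.61, close to the 0.59 reported for relative humidity at Feni station in the summer season.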

To test the joint significance of the climatic variables on weekly and fortnightly rainfall occurrences in the summer and monsoon seasons of the five regions, the study uses the likelihood ratio test (LRT). According to Eq. (15), the null hypothesis is defined as $H_{0} : \beta_{1} = \beta_{2} = \beta_{3} = 0$. The goodness of fit of the models is assessed with the deviance, which is discussed in Eq. (19). To select the appropriate model for explaining weekly and fortnightly rainfall occurrences in the summer and monsoon seasons of the five regions, the study uses the AIC and BIC procedures (Sect. 2.2.5). The results of the LRT, the deviance of the models, AIC and BIC are shown in Table 2.
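The quantities collected in Table 2 can be computed from the fitted means with the standard Poisson formulas. The sketch below is illustrative only: Eqs. (15) and (19) are not reproduced in this section, so the usual textbook definitions are assumed, the function names are hypothetical, and the counts and fitted means are made-up numbers. Under $H_{0}$ and the mean function without an intercept, every $\lambda_{i}$ equals $\exp(0) = 1$.

```python
import numpy as np
from math import lgamma, log

def poisson_loglik(y, lam):
    """Poisson log-likelihood: sum of y*log(lam) - lam - log(y!)."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(lam) - lam - log_fact))

def poisson_deviance(y, lam):
    """Deviance: 2 * sum of y*log(y/lam) - (y - lam), with 0*log(0) taken as 0."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    ratio = np.where(y > 0, y / lam, 1.0)
    return float(2.0 * np.sum(y * np.log(ratio) - (y - lam)))

def lr_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic, compared with a chi-square distribution."""
    return 2.0 * (loglik_full - loglik_null)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * log(n)

# Made-up counts and fitted means, for illustration only.
y = np.array([0, 1, 2, 4, 3])
lam_full = np.array([0.5, 1.2, 2.1, 3.6, 2.9])
lam_null = np.ones_like(lam_full)     # beta_1 = beta_2 = beta_3 = 0

ll_full = poisson_loglik(y, lam_full)
ll_null = poisson_loglik(y, lam_null)
lr = lr_statistic(ll_full, ll_null)   # 3 degrees of freedom under H0
dev = poisson_deviance(y, lam_full)
```

With three restrictions, the LRT statistic is compared with the chi-square distribution with 3 degrees of freedom, whose 5 percent critical value is 7.815. Smaller deviance, AIC and BIC values indicate the preferred model.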

Table 2 Value of Chi-square statistic for LRT, deviance of the models, AIC and BIC

From Table 2, the LRT indicates that the influence of the climatic variables (maximum and minimum temperatures and relative humidity) on weekly and fortnightly rainfall occurrences is jointly significant in the summer and monsoon seasons for all the regions. The LRT statistic is lower for fortnightly data than for weekly data for the respective season and region. The table also shows that, in all the regions, the deviance of the models for fortnightly data in the summer and monsoon seasons is much lower than that for weekly data. For all the regions, the GLMs for the Poisson distribution therefore fit the fortnightly count data of rainfall occurrences better than the weekly count data in both seasons.

Further, for all the regions, the AIC and BIC values for the fortnightly data in the summer and monsoon seasons are smaller than those for the weekly data. Therefore, with maximum and minimum temperatures and relative humidity as climatic variables, the GLMs for the Poisson distribution satisfactorily explain the fortnightly rainfall occurrences in the summer and monsoon seasons for all the regions of the country.

5 Conclusions

Bangladesh is an agro-based country that frequently suffers from devastating natural disasters such as flash floods, droughts and heavy rainfall. To protect people from these hazards, the country's economists and agriculturists need adequate knowledge of the patterns of daily rainfall occurrences. For this purpose, the study develops GLMs for the Poisson distribution using count data of rainfall occurrences, with maximum and minimum temperatures and relative humidity as explanatory variables. The response variables are weekly and fortnightly counts of daily rainfall occurrences in the summer and monsoon seasons for five regions of the country. The GLMs indicate that the effect of the explanatory variables on the count data of daily rainfall occurrences is significant. In both seasons, the influence of maximum temperature on weekly and fortnightly occurrences of rainfall is negative, whereas the influence of minimum temperature and relative humidity is positive for all the regions. According to AIC and BIC, the GLMs for the Poisson distribution give satisfactory results for fortnightly occurrences of daily rainfall in both seasons and at all stations.

In the summer season, holding the other climatic variables fixed, a one-unit increase in relative humidity increases the expected number of rainy days within a fortnight by 12 percent at Feni station, 6 percent at Sylhet, Khulna and Rajshahi stations, and 7 percent at Dhaka station. In this season, a one-unit increase in maximum temperature decreases the expected number of rainy days within a fortnight by 9 percent, 3 percent, 14 percent, 4 percent and 0.7 percent at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. For the same season and period, the average marginal effect indicates that a one-unit increase in relative humidity raises the expected number of rainy days by 0.59 at Feni station. Similarly, a one-unit increase in minimum temperature raises the expected number of rainy days by 0.86, 0.94, 0.57 and 0.69 at Sylhet, Khulna, Rajshahi and Dhaka stations, respectively.

In the monsoon season, a one-unit increase in minimum temperature increases the expected number of rainy days within a fortnight by 22 percent, 19 percent, 25 percent, 17 percent and 19 percent at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. In this season, a one-unit increase in maximum temperature decreases the expected number of rainy days by 10 percent, 2 percent, 10 percent, 6 percent and 8 percent at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. For this season and period, the average marginal effect indicates that a one-unit increase in minimum temperature raises the expected number of rainy days by 1.90, 2.06, 2.23, 1.29 and 1.66 at Feni, Sylhet, Khulna, Rajshahi and Dhaka stations, respectively. Thus, relative humidity in the summer season and minimum temperature in the monsoon season play a key role in changing the number of rainy days per fortnight in all the regions of the country. For the development of the rural economy, these GLMs may therefore help to formulate effective plans for cultivating agricultural crops and vegetables and for protecting fisheries, wildlife and human assets.

References

Agresti A (1984) Analysis of ordinal categorical data. Wiley, New York
Ahasan MN, Chowdhary AM, Quadir DA (2010) Variability and trends of summer monsoon rainfall over Bangladesh. J Hydrol Meteorol 7(1):1–17. https://doi.org/10.3126/jhm.v7i1.5612
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723. https://doi.org/10.1109/TAC.1974.1100705
Akaike H (1972) Use of information theoretic quantity for statistical model identification. In: Proceedings of the 5th Hawaii international conference system sciences, Western Periodicals, pp 249–250
Akaike H (1972) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
Akaike H (1970) Statistical predictor identification. Ann Inst Statist Math 22:203–217
Andersen EB (1970) Asymptotic properties of conditional maximum likelihood estimators. J R Stat Soc Ser B 32:283–301
Bartlett MS (1951) The frequency goodness of fit test for probability chains. Proc Cambridge Philos Soc 47:86–95
Blau DM, Robins PK (1991) Turnover in child care arrangements. Rev Econ Stat 73:152–157
Buishand TA, Shabalova MV, Brandsma T (2004) On the choice of the temporal aggregation level for statistical downscaling of precipitation. J Clim Am Meteorol Soc 17:1816–1827
Cameron AC, Trivedi PK (1998) Regression analysis of count data. Cambridge University Press, Cambridge
Cameron AC, Trivedi PK (1986) Econometric models based on count data: comparisons and applications of some estimators. J Appl Econom 1:29–53
Caskey JE (1963) A Markov chain for the probability of precipitation occurrence in intervals of various lengths. Month Weath Rev 91:298–301
Chandler RE, Wheater HS (2002) Analysis of rainfall variability using generalized linear models: a case study from the west of Ireland. Water Resour Res. https://doi.org/10.1029/2001WR000906
Chappell WF, Kimenyi MS, Mayer WJ (1990) A Poisson probability model of entry and market structure with an application to US industries during 1972–77. South Econ J 56:918–927
Coe R, Stern RD (1982) Fitting models to rainfall data. J Appl Meteorol Soc 21:1024–1031
Cox DR, Isham V (1994) Stochastic models of precipitation. In: Barnett V, Turkman KF (eds) Statistics for the environment 2, water issues. Wiley, New York, pp 3–18
Dobson AJ (2002) An introduction to generalized linear models, 2nd edn. Chapman & Hall, New York
Feinstein JS (1989) The safety regulations of US nuclear power plants: violations, inspections, and abnormal occurrences. J Polit Econ 97:115–154
Hausman J, Hall BH, Griliches Z (1984) Econometric models for count data with an application to the patents-R&D relationship. Econometrica 52:909–938
Hilbe JM (2011) Negative binomial regression, 2nd edn. Cambridge University Press, Cambridge
Islam MA, Chowdhury RI (2006) A higher order Markov model of analyzing covariate dependence. Appl Math Model 30:477–488
Islam MA, Chowdhury RI (2017) Analysis of repeated measures data. Springer Nature Singapore Pte Ltd, Singapore. https://doi.org/10.1007/978-981-10-3794-8
Jahan F, Sinha NC, Rahman MM, Rahman MM, Mondal SHM, Islam MA (2019) Comparison of missing value estimation techniques in rainfall data of Bangladesh. Theor Appl Climatol 136:1115–1131. https://doi.org/10.1007/s00704-018-2537-y
Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd edn. Wiley, New York
Johnson NL, Kotz S (1969) Distributions in statistics: discrete distributions. Wiley, New York
Kendall MG, Stuart A (1973) The advanced theory of statistics, vol II. Griffin, London
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Mannan MA, Chowdhury MAM, Karmakar S (2016) Prediction of rainfall over southeastern part of Bangladesh during monsoon season. Int J Integ Sci Technol 2:73–82
McCullagh P (1986) The conditional distribution of goodness-of-fit statistics for discrete data. J Am Stat Assoc 81:104–107
McCullagh P, Nelder JA (1983) Generalized linear models. Chapman and Hall, London
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, New York
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135:370–384
Saffari SE, Adnan R (2010) Zero-inflated Poisson regression models with right censored count data. MATEMATIKA 27(1):21–29
Sakamoto Y (1991) Categorical data analysis by AIC. Kluwer Academic Publishers, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Segond ML, Onof C, Wheater HS (2006) Spatial-temporal disaggregation of daily rainfall from a generalized linear model. J Hydrol 331:674–689
Sinha NC, Islam MA, Ahmed KS (2011) Logistic regression models for higher order transition probabilities of Markov chain for analyzing the occurrences of daily rainfall data. J Mod Appl Stat Methods 10(1):337–348
Sinha NC, Islam MA, Ahmed KS (2010) Order determination for the transition probabilities Markov chain based on logistic regression model. J Appl Stat Sci 17(4):519–540
Sinha NC, Islam MA, Ahmed KS (2006) Chain dependence and stationarity test for transition probabilities of Markov chain under logistic regression model. J Korean Stat Soc 35(4):355–376
Stern RD, Coe R (1984) A model fitting analysis of daily rainfall data. J R Stat Soc Ser A 147:1–34
Srikanthan R, McMahon TA (2001) Stochastic generation of annual, monthly and daily climate data: a review. Hydrol Earth Syst Sci 5(4):653–670
Thyer MA, Kuczera G (2000) Modelling long-term persistence in hydro-climatic time series using a hidden state Markov model. Water Resour Res 36:3301–3310
Thyer MA, Kuczera G (1999) Modelling long-term persistence in rainfall time series, Sydney rainfall case study. Hydrol Water Resour Symp Instit Eng Aust, pp 550–555
Vivekanandan N (2016) Intercomparison of probability distributions for extreme value analysis of rainfall under missing data scenario. Develop Earth Sci 4:1–7. https://doi.org/10.14355/des.2016.04.001
Wedel M, Böckenholt U, Kamakura WA (2003) Factor models for multivariate count data. J Multivar Anal 87:356–369
Winkelmann R (2008) Econometric analysis of count data, 5th edn. Springer, New York
Winkelmann R, Zimmermann KF (1995) Recent developments in count data modeling: theory and application. J Econ Surv 9(1):1–24
Winkelmann R (1994) Count data models: econometric theory and an application to labour mobility. Lecture notes in economics and mathematical systems, vol 410. Springer, Heidelberg

Author information

Authors and Affiliations

Department of Statistics, East West University, Dhaka, Bangladesh
Sharmin Nahar Sumi
Dhaka School of Economics, Dhaka, Bangladesh
Narayan Chandra Sinha
Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
M. Ataharul Islam

Corresponding author

Correspondence to Narayan Chandra Sinha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest of any type (financial, personal, etc.) in preparing this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Professor M. Ataharul Islam died in December 2020.

Appendices

Appendix 1: Frequency distribution of weekly and fortnightly count data of rainfall occurrences for summer and monsoon seasons of Feni and Sylhet stations

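Number of rainy days	Summer weekly frequency (Feni)	Summer weekly frequency (Sylhet)	Summer weekly percent (Feni)	Summer weekly percent (Sylhet)	Summer fortnightly frequency (Feni)	Summer fortnightly frequency (Sylhet)	Summer fortnightly percent (Feni)	Summer fortnightly percent (Sylhet)	Monsoon weekly frequency (Feni)	Monsoon weekly frequency (Sylhet)	Monsoon weekly percent (Feni)	Monsoon weekly percent (Sylhet)	Monsoon fortnightly frequency (Feni)	Monsoon fortnightly frequency (Sylhet)	Monsoon fortnightly percent (Feni)	Monsoon fortnightly percent (Sylhet)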
0	178	65	26.2	9.6	49	9	15.3	2.8	62	55	8.6	7.6	6	4	1.9	1.3
1	123	61	18.1	9.0	24	12	7.5	3.8	47	37	6.5	5.1	5	7	1.5	2.2
2	85	73	12.5	10.7	29	11	9.0	3.4	73	40	10.1	5.6	11	2	3.4	0.6
3	81	88	11.9	12.9	24	17	7.5	5.3	107	51	14.9	7.1	13	9	4.1	2.8
4	79	85	11.6	12.5	28	16	8.7	5.0	130	65	18.1	9.0	17	11	5.3	3.4
5	70	111	10.3	16.3	22	15	6.9	4.7	108	118	15.0	16.4	13	11	4.1	3.4
6	40	106	5.9	15.6	30	29	9.4	9.1	110	147	15.3	20.4	19	14	5.9	4.4
7	24	91	3.5	13.4	22	16	6.9	5.0	83	207	11.5	28.8	28	13	8.7	4.1
8	–	–	–	–	20	26	6.2	8.1	–	–	–	–	26	15	8.1	4.7
9	–	–	–	–	21	29	6.6	9.1	–	–	–	–	29	19	9.1	5.9
10	–	–	–	–	14	29	4.4	9.1	–	–	–	–	41	17	12.8	5.3
11	–	–	–	–	20	31	6.3	9.7	–	–	–	–	37	18	11.5	5.6
12	–	–	–	–	7	20	2.2	6.3	–	–	–	–	29	50	9.1	15.6
13	–	–	–	–	8	26	2.5	8.1	–	–	–	–	28	47	8.8	14.7
14	–	–	–	–	1	22	0.3	6.9	–	–	–	–	13	41	4.1	12.8
15	–	–	–	–	1	12	0.3	3.8	–	–	–	–	5	42	1.6	13.1
Total	680	680	100	100	320	320	100	100	720	720	100	100	320	320	100	100.0

Appendix 2: Frequency distribution of weekly and fortnightly count data of rainfall occurrences for summer and monsoon seasons of Khulna and Rajshahi stations

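Number of rainy days	Summer weekly frequency (Khulna)	Summer weekly frequency (Rajshahi)	Summer weekly percent (Khulna)	Summer weekly percent (Rajshahi)	Summer fortnightly frequency (Khulna)	Summer fortnightly frequency (Rajshahi)	Summer fortnightly percent (Khulna)	Summer fortnightly percent (Rajshahi)	Monsoon weekly frequency (Khulna)	Monsoon weekly frequency (Rajshahi)	Monsoon weekly percent (Khulna)	Monsoon weekly percent (Rajshahi)	Monsoon fortnightly frequency (Khulna)	Monsoon fortnightly frequency (Rajshahi)	Monsoon fortnightly percent (Khulna)	Monsoon fortnightly percent (Rajshahi)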
0	190	192	27.9	28.2	40	40	12.5	12.5	63	83	8.8	11.5	5	7	1.6	2.2
1	140	154	20.6	22.6	41	45	12.8	14.1	45	57	6.3	7.9	6	4	1.9	1.3
2	96	126	14.1	18.5	36	44	11.3	13.8	64	98	8.9	13.6	7	16	2.2	5.0
3	96	107	14.1	15.7	31	38	9.7	11.9	92	116	12.8	16.1	13	15	4.1	4.7
4	62	47	9.1	6.9	31	34	9.7	10.6	108	130	15.0	18.1	14	15	4.4	4.7
5	60	37	8.8	5.4	32	31	10.0	9.7	129	108	17.9	15.0	16	26	5.0	8.1
6	24	11	3.5	1.6	26	24	8.1	7.5	141	97	19.6	13.5	16	22	5.0	6.9
7	12	6	1.8	0.9	19	20	5.9	6.3	78	31	10.8	4.3	21	34	6.6	10.6
8	–	–	–	–	15	18	4.7	5.6	–	–	–	–	27	44	8.4	13.8
9	–	–	–	–	12	10	3.8	3.1	–	–	–	–	31	39	9.7	12.2
10	–	–	–	–	14	8	4.4	2.5	–	–	–	–	37	37	11.6	11.6
11	–	–	–	–	10	5	3.1	1.6	–	–	–	–	37	33	11.6	10.3
12	–	–	–	–	8	3	2.5	0.9	–	–	–	–	41	18	12.8	5.6
13	–	–	–	–	4	0	1.3	0	–	–	–	–	28	6	8.8	1.9
14	–	–	–	–	0	0	0.0	0	–	–	–	–	14	4	4.4	1.3
15	–	–	–	–	1	0	0.3	0	–	–	–	–	7	0	2.2	0
Total	680	680	100	100	320	320	100	100	720	720	100	100	320	320	100	100.0

Appendix 3: Frequency distribution of weekly and fortnightly count data of rainfall occurrences for summer and monsoon seasons of Dhaka station

Number of rainy days	Summer weekly frequency	Summer weekly percent	Summer fortnightly frequency	Summer fortnightly percent	Monsoon weekly frequency	Monsoon weekly percent	Monsoon fortnightly frequency	Monsoon fortnightly percent
0	147	21.6	23	7.2	51	7.1	4	1.3
1	103	15.1	28	8.8	49	6.8	4	1.3
2	106	15.6	24	7.5	63	8.8	4	1.3
3	96	14.1	36	11.3	98	13.6	12	3.8
4	87	12.8	26	8.1	140	19.4	11	3.4
5	90	13.2	29	9.1	150	20.8	13	4.1
6	35	5.1	30	9.4	112	15.6	22	6.9
7	16	2.4	28	8.8	57	7.9	23	7.2
8	–	–	26	8.1	–	–	42	13.1
9	–	–	26	8.1	–	–	41	12.8
10	–	–	14	4.4	–	–	40	12.5
11	–	–	12	3.8	–	–	47	14.7
12	–	–	10	3.1	–	–	28	8.8
13	–	–	6	1.9	–	–	17	5.3
14	–	–	1	0.3	–	–	9	2.8
15	–	–	1	0.3	–	–	3	0.9
Total	680	100	320	100	720	100	320	100

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sumi, S.N., Sinha, N.C. & Islam, M.A. Generalized linear models for analyzing count data of rainfall occurrences. SN Appl. Sci. 3, 481 (2021). https://doi.org/10.1007/s42452-021-04467-x

Received: 03 June 2020
Accepted: 04 March 2021
Published: 20 March 2021
DOI: https://doi.org/10.1007/s42452-021-04467-x
