# A new model for over-dispersed count data: Poisson quasi-Lindley regression model


## Abstract

In this paper, a new regression model for a count response variable is proposed via a re-parametrization of the Poisson quasi-Lindley distribution. Maximum likelihood and method of moments estimation are considered for the unknown parameters of the re-parametrized Poisson quasi-Lindley distribution. A simulation study is conducted to evaluate the efficiency of the estimation methods. A real data set is analyzed to demonstrate the usefulness of the proposed model against well-known regression models for count data, such as the Poisson and negative-binomial regression models. Empirical results show that when the response variable is over-dispersed, the proposed model provides better results than the competing models.

## Keywords

Count data · Poisson regression · Negative-binomial regression · Maximum likelihood · Method of moments · Over-dispersion

## Mathematics Subject Classification

62E15 · 62J05

## Introduction

Interest in count data modeling has increased greatly in the last decade. The most widely used distribution for modeling count data sets is the Poisson distribution. A well-known property of the Poisson distribution is that its mean and variance are equal. Therefore, the Poisson distribution does not work in the case of over-dispersion or under-dispersion. In spite of this weakness, the Poisson distribution is widely used in many research fields such as actuarial, environmental and economic sciences, because of its simple form, easy implementation and software support. To remove this drawback, researchers have shown great interest in introducing mixed-Poisson distributions for modeling over-dispersed or under-dispersed count data sets, such as Bhati et al. [1], Imoto et al. [7], Mahmoudi and Zakerzadeh [9], Gencturk and Yigiter [5], Wongrin and Bodhisuwan [15], Déniz [3], Cheng et al. [2], Lord and Geedipally [8], Zamani et al. [16], Sáez-Castillo and Conde-Sánchez [12], Rodríguez-Avi et al. [10], Shmueli et al. [11], Shoukri et al. [13].

As mentioned above, the Poisson distribution is insufficient to model over-dispersed count data sets. The main motivation of this study is to introduce an alternative regression model for over-dispersed count data sets. To this end, a re-parametrization of the Poisson quasi-Lindley (PQL) distribution, proposed by Grine and Zeghdoudi [6], is introduced, and its statistical properties, such as the mean, the variance and the estimation of the model parameters, are studied comprehensively. The maximum likelihood (ML) and method of moments (MM) estimation methods are considered to estimate the unknown parameters of the re-parametrized PQL distribution. The efficiencies of the estimation methods are compared in an extensive simulation study. Using the re-parametrized PQL distribution, a new regression model for over-dispersed count data sets is introduced. To demonstrate the effectiveness of the proposed regression model, a real data set on the days of absence of high school students is analyzed with the Poisson, negative-binomial and PQL regression models.

The rest of the paper is organized as follows: In “Re-parametrization of Poisson quasi-Lindley distribution” section, the statistical properties of the re-parametrized Poisson quasi-Lindley distribution are obtained. In “Estimation” section, the ML and MM estimation methods are considered to estimate the unknown model parameters. In “Simulation” section, the finite sample performance of the estimation methods is compared via a Monte Carlo simulation study. In “Poisson quasi-Lindley regression model” section, a new regression model is introduced. In “Empirical study” section, a real data set is analyzed to demonstrate the usefulness of the proposed model against the Poisson and negative-binomial regression models. “Conclusion” section contains the concluding remarks.

## Re-parametrization of Poisson quasi-Lindley distribution

Let the random variable *X* follow a Poisson distribution with parameter \(\lambda > 0\). The probability mass function (pmf) is

$$P\left( X = x \right) = \frac{e^{-\lambda }\lambda ^{x}}{x!},\quad x = 0,1,2,\ldots$$

The dispersion index, *DI*, for Poisson distribution is \(DI = \mathrm{Var}\left( X \right) / E\left( X \right) = \lambda / \lambda = 1\). As seen from the dispersion index of the Poisson distribution, over-dispersed or under-dispersed data sets cannot be modeled by the Poisson distribution. Note that over-dispersion occurs when the variance is greater than the mean, and under-dispersion occurs when the variance is smaller than the mean. Grine and Zeghdoudi [6] introduced a new mixed-Poisson distribution, called Poisson quasi-Lindley (PQL), by compounding the Poisson distribution with the quasi-Lindley distribution introduced by Shanker and Mishra [14]. The pmf of PQL distribution is given by

Hereafter, a random variable *Y* with this pmf will be denoted as \({\text {PQL}}\left( \theta ,\alpha \right)\). The corresponding cumulative distribution function (cdf) to (1) is

### Proposition 1

*Let* \(\theta = \left( 2 + \alpha \right) / \left[ \left( 1 + \alpha \right) \mu \right]\), *then the pmf of PQL distribution is*

*where* \(\alpha >0\) *and* \(\mu >0\). *The mean and variance of* (5) *are given by, respectively,*

Note that the parameter \(\alpha\) should be greater than zero to ensure a positive variance. The other statistical properties of the PQL distribution under the above re-parametrization, such as the probability and moment generating functions, the mode and the cdf, can be obtained following the results in Grine and Zeghdoudi [6]. As seen from (6), since the second part of the variance equation of the PQL distribution is greater than zero for all values of the parameters \(\alpha\) and \(\mu\), the variance of the PQL distribution is always greater than its mean. Therefore, the PQL distribution can be a good choice for modeling over-dispersed data sets.
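The quantities in Proposition 1 can be re-derived from the mixture construction. As a sketch, assume that the quasi-Lindley density of Shanker and Mishra [14] is \(f(\lambda ;\theta ,\alpha ) = \theta (\alpha + \theta \lambda )e^{-\theta \lambda }/(1+\alpha )\) for \(\lambda > 0\). This form is an assumption taken from the usual statement of that distribution, and the layout below need not match equations (1), (5) and (6) exactly. Integrating the Poisson pmf against this density, and using \(E(Y)=E(\lambda )\) and \(\mathrm{Var}(Y)=E(\lambda )+\mathrm{Var}(\lambda )\), gives

$$\begin{aligned} P\left( Y = y \mid \theta ,\alpha \right) &= \int _0^\infty \frac{e^{-\lambda }\lambda ^{y}}{y!}\, f(\lambda ;\theta ,\alpha )\,d\lambda = \frac{\theta \left[ \theta \left( y + 1 + \alpha \right) + \alpha \right] }{\left( 1+\alpha \right) \left( 1+\theta \right) ^{y+2}},\quad y = 0,1,2,\ldots , \\ E\left( Y \right) &= \frac{2+\alpha }{\left( 1+\alpha \right) \theta },\qquad \mathrm{Var}\left( Y \right) = \frac{2+\alpha }{\left( 1+\alpha \right) \theta } + \frac{\alpha ^2 + 4\alpha + 2}{\left( 1+\alpha \right) ^2\theta ^2}. \end{aligned}$$

Substituting \(\theta = (2+\alpha )/[(1+\alpha )\mu ]\) then yields

$$\begin{aligned} E\left( Y \right) = \mu ,\qquad \mathrm{Var}\left( Y \right) = \mu + \frac{\alpha ^2 + 4\alpha + 2}{\left( 2+\alpha \right) ^2}\,\mu ^2 , \end{aligned}$$

so that the dispersion index equals \(1 + (\alpha ^2+4\alpha +2)\mu /(2+\alpha )^2 > 1\). For instance, \(\alpha = 2\) and \(\mu = 3\) give \(\mathrm{Var}(Y) = 3 + 9 \cdot 14/16 = 10.875\).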

### Generating random variables from Poisson-xgamma distribution

Here, a general algorithm and the corresponding code written in R software are given to generate random variables from the PQL distribution. The code below can be used for any discrete distribution, such as the Poisson, Poisson–Lindley and negative-binomial distributions.
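A minimal Python sketch of such a general-purpose generator, based on inversion of the cdf, is given next. It is not the authors' R code. The closed-form pmf in `pql_pmf` is an assumption, obtained by mixing the Poisson pmf with a quasi-Lindley density of the form \(\theta (\alpha +\theta \lambda )e^{-\theta \lambda }/(1+\alpha )\) and setting \(\theta = (2+\alpha )/[(1+\alpha )\mu ]\). The function names, the seed and the parameter values are illustrative.

```python
import math
import random


def pql_pmf(y, alpha, mu):
    """pmf of PQL(alpha, mu); the closed form is an assumption (see lead-in)."""
    theta = (2.0 + alpha) / ((1.0 + alpha) * mu)
    log_p = (math.log(theta) + math.log(theta * (y + 1.0 + alpha) + alpha)
             - math.log1p(alpha) - (y + 2.0) * math.log1p(theta))
    return math.exp(log_p)  # working on the log scale avoids overflow for large y


def r_discrete(n, pmf, rng, max_y=10_000):
    """Inverse-transform sampling for any pmf supported on 0, 1, 2, ...

    Each draw returns the smallest y whose cdf reaches a uniform number u.
    max_y caps the search in case round-off keeps the cdf just below u.
    """
    draws = []
    for _ in range(n):
        u = rng.random()
        y, cdf = 0, pmf(0)
        while cdf < u and y < max_y:
            y += 1
            cdf += pmf(y)
        draws.append(y)
    return draws


rng = random.Random(2024)            # fixed seed so the run is repeatable
alpha, mu = 2.0, 3.0                 # illustrative parameter values
sample = r_discrete(20_000, lambda y: pql_pmf(y, alpha, mu), rng)

mean = sum(sample) / len(sample)
var = sum((y - mean) ** 2 for y in sample) / (len(sample) - 1)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

Because `r_discrete` only needs a function returning \(P(Y=y)\), the same routine applies to other discrete distributions by passing a different pmf. Under the assumed pmf, the sample mean should be close to \(\mu = 3\) and the sample variance should clearly exceed it.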

## Estimation

In this section, ML and MM estimation methods are considered to estimate the unknown parameters of PQL distribution.

### Maximum likelihood estimation

The log-likelihood function is maximized numerically; the **nlm** function of R software is used for this purpose. The corresponding interval estimates of the parameters are obtained by means of the observed information matrix, and the resulting asymptotic confidence intervals are based on the *p*/2 quantile of the standard normal distribution.
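The ML procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the log-pmf is the closed form obtained by mixing the Poisson pmf with a quasi-Lindley density \(\theta (\alpha +\theta \lambda )e^{-\theta \lambda }/(1+\alpha )\), a simplified Nelder–Mead search stands in for R's `nlm`, the observed information matrix is approximated by a finite-difference Hessian, and the data are simulated with illustrative values \(\alpha = 1\), \(\mu = 5\).

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def pql_logpmf(y, alpha, mu):
    """Log-pmf of PQL(alpha, mu); the closed form is an assumption."""
    theta = (2.0 + alpha) / ((1.0 + alpha) * mu)
    return (math.log(theta) + math.log(theta * (y + 1.0 + alpha) + alpha)
            - math.log1p(alpha) - (y + 2.0) * math.log1p(theta))


def neg_loglik(alpha, mu, counts):
    """Negative log-likelihood; counts maps each observed value to its frequency."""
    return -sum(c * pql_logpmf(y, alpha, mu) for y, c in counts.items())


def nelder_mead(f, x0, step=0.5, iters=200):
    """Simplified Nelder-Mead simplex search (a stand-in for R's nlm)."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centre = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [2 * centre[j] - worst[j] for j in range(n)]
        if f(refl) < f(best):                        # try to expand further
            expd = [3 * centre[j] - 2 * worst[j] for j in range(n)]
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):               # accept plain reflection
            simplex[-1] = refl
        else:                                        # contract, else shrink
            contr = [(centre[j] + worst[j]) / 2 for j in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                simplex = [best] + [
                    [(best[j] + p[j]) / 2 for j in range(n)] for p in simplex[1:]]
    return min(simplex, key=f)


def hessian(f, x, h=1e-3):
    """Central-difference Hessian of f at x."""
    n = len(x)

    def shifted(i, j, di, dj):
        z = list(x)
        z[i] += di * h
        z[j] += dj * h
        return f(z)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def positive(t):
    """Map an unconstrained value to a positive one; clamped to avoid overflow."""
    return math.exp(max(-20.0, min(20.0, t)))


rng = random.Random(7)


def draw(alpha, mu):
    """One PQL variate by inversion of the cdf."""
    u, y = rng.random(), 0
    cdf = math.exp(pql_logpmf(0, alpha, mu))
    while cdf < u and y < 10_000:
        y += 1
        cdf += math.exp(pql_logpmf(y, alpha, mu))
    return y


# Simulated data with illustrative true values alpha = 1, mu = 5.
counts = Counter(draw(1.0, 5.0) for _ in range(10_000))
ybar = sum(y * c for y, c in counts.items()) / sum(counts.values())


def objective(t):
    """Negative log-likelihood in (log alpha, log mu), keeping both positive."""
    return neg_loglik(positive(t[0]), positive(t[1]), counts)


start = [0.0, math.log(ybar)]
t_hat = nelder_mead(objective, start)
alpha_hat, mu_hat = positive(t_hat[0]), positive(t_hat[1])

# Observed information: Hessian of the negative log-likelihood at the estimate.
H = hessian(lambda p: neg_loglik(p[0], p[1], counts), [alpha_hat, mu_hat])
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
se_alpha, se_mu = math.sqrt(H[1][1] / det), math.sqrt(H[0][0] / det)
z = NormalDist().inv_cdf(1 - 0.05 / 2)   # normal quantile for a 95% interval
for name, est, se in (("alpha", alpha_hat, se_alpha), ("mu", mu_hat, se_mu)):
    print(f"{name}: {est:.3f}  95% CI ({est - z * se:.3f}, {est + z * se:.3f})")
```

The optimization is carried out on the log scale so that both parameters stay positive, while the Hessian is evaluated on the original scale so that the standard errors refer to \(\alpha\) and \(\mu\) directly.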

### Method of moments

### Theorem 1

*For fixed values of* \(\mu\), *the MM estimator* \({\hat{\alpha }}_{MM}\) *of* \(\alpha\) *is consistent and asymptotically normally distributed:*

*where*

Detailed information about the asymptotic properties of MM estimators can be found in Farbod and Arzideh [4].
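A possible form of the MM estimators can be sketched by matching the first two moments. The sketch assumes \(E(Y)=\mu\) and \(\mathrm{Var}(Y)=\mu + (\alpha ^2+4\alpha +2)\mu ^2/(2+\alpha )^2\); the variance expression is an assumption derived from a quasi-Lindley mixing density \(\theta (\alpha +\theta \lambda )e^{-\theta \lambda }/(1+\alpha )\) and may be arranged differently in the original derivation. Writing \(c = (s^2 - {\bar{y}})/{\bar{y}}^2\) and noting that \(\alpha ^2+4\alpha +2 = (\alpha +2)^2 - 2\), moment matching gives \({\hat{\mu }} = {\bar{y}}\) and \({\hat{\alpha }} = \sqrt{2/(1-c)} - 2\), which is admissible only when \(1/2< c < 1\). The function names are hypothetical.

```python
import math


def mm_from_moments(mean, var):
    """Moment-matching estimates (mu_hat, alpha_hat) from a mean and a variance.

    alpha_hat is None when no alpha > 0 reproduces the two moments.
    """
    c = (var - mean) / mean ** 2          # excess variance relative to mean^2
    if not 0.5 < c < 1.0:
        return mean, None
    return mean, math.sqrt(2.0 / (1.0 - c)) - 2.0


def pql_mm(data):
    """Apply mm_from_moments to the sample mean and unbiased sample variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((y - mean) ** 2 for y in data) / (n - 1)
    return mm_from_moments(mean, var)


# Population moments for alpha = 2, mu = 3 under the assumed variance formula.
print(mm_from_moments(3.0, 10.875))   # prints (3.0, 2.0)
```

The admissibility check matters in small samples: when the sample variance falls outside the range implied by \(1/2< c < 1\), the moment equations have no solution with \(\alpha > 0\).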

## Simulation

1. Set the sample size *n* and the vector of parameters \(\pmb {\theta }=\left( \alpha ,\mu \right) ^T\);
2. Generate random observations of size *n* from the \({\text {PQL}}\left( \alpha ,\mu \right)\) distribution, using the algorithm given in “Generating random variables from Poisson-xgamma distribution” section;
3. Use the random observations generated in Step 2 to estimate \(\pmb {\theta }\) by means of the ML and MM estimation methods;
4. Repeat Steps 2 and 3 *N* times;
5. Use \(\hat{\pmb {\theta }}\) and \(\pmb {\theta }\) to calculate the biases, mean relative estimates (MREs) and mean square errors (MSEs) from the following equations:

   $$\begin{aligned} {\mathrm{Bias}} &= \sum \limits _{j = 1}^N \frac{\hat{\pmb {\theta }}_{i,j} - \pmb {\theta }_i}{N},\quad {\mathrm{MRE}} = \sum \limits _{j = 1}^N \frac{\hat{\pmb {\theta }}_{i,j} / \pmb {\theta }_i}{N}, \\ {\mathrm{MSE}} &= \sum \limits _{j = 1}^N \frac{\left( \hat{\pmb {\theta }}_{i,j} - \pmb {\theta }_i \right) ^2}{N},\quad i=1,2. \end{aligned}$$

When the sample size *n* is sufficiently large, the MREs should be close to one and the MSEs and biases should be close to zero. As seen from Fig. 2, when the sample size *n* increases, the MSEs and biases approach zero and the MREs approach one for both estimation methods. The MM and ML estimation methods yield similar results for the parameter \(\mu\) in terms of the estimated MSE, bias and MRE. However, the ML estimation method provides more satisfactory results for the parameter \(\alpha\), especially for small sample sizes. Therefore, we suggest using the ML estimation method when the sample size is small.
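The simulation steps can be sketched in Python for the parameter \(\mu\), whose MM estimator is the sample mean because \(E(Y)=\mu\). This is a reduced illustration and not the study reported above: the pmf is the closed form obtained under an assumed quasi-Lindley mixing density \(\theta (\alpha +\theta \lambda )e^{-\theta \lambda }/(1+\alpha )\), and the number of replications, the sample sizes, the seed and the parameter values are chosen only to keep the run short.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def pql_pmf(y, alpha, mu):
    """pmf of PQL(alpha, mu); the closed form is an assumption (see lead-in)."""
    theta = (2.0 + alpha) / ((1.0 + alpha) * mu)
    return math.exp(math.log(theta) + math.log(theta * (y + 1.0 + alpha) + alpha)
                    - math.log1p(alpha) - (y + 2.0) * math.log1p(theta))


def pql_cdf_table(alpha, mu, y_max=500):
    """Tabulated cdf on 0..y_max (mass beyond y_max is negligible here)."""
    return list(accumulate(pql_pmf(y, alpha, mu) for y in range(y_max + 1)))


def rpql(n, cdf, rng):
    """Inversion: each draw is the smallest y with cdf[y] >= u."""
    return [bisect_left(cdf, rng.random()) for _ in range(n)]


def mc_metrics(estimates, truth):
    """Bias, mean relative estimate and mean square error over N replications."""
    N = len(estimates)
    bias = sum(est - truth for est in estimates) / N
    mre = sum(est / truth for est in estimates) / N
    mse = sum((est - truth) ** 2 for est in estimates) / N
    return bias, mre, mse


rng = random.Random(1)
alpha, mu, N = 1.0, 3.0, 300          # Step 1 (illustrative values)
cdf = pql_cdf_table(alpha, mu)
results = {}
for n in (25, 100, 400):
    # Steps 2-4: N samples of size n, each reduced to the estimate of mu
    mu_hats = [sum(rpql(n, cdf, rng)) / n for _ in range(N)]
    # Step 5: summarize the N estimates
    results[n] = mc_metrics(mu_hats, mu)
    bias, mre, mse = results[n]
    print(f"n = {n:3d}: bias = {bias:+.4f}, MRE = {mre:.4f}, MSE = {mse:.4f}")
```

Note that the MRE equals one plus the bias divided by the true value, so the two criteria carry the same information on different scales.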

## Poisson quasi-Lindley regression model

The Poisson and negative-binomial regression models are the two most commonly used models for count data. When the response variable is over-dispersed rather than equi-dispersed, the negative-binomial regression model is preferable. Here, an alternative regression model is introduced for an over-dispersed response variable.

Let *Y* follow a PQL distribution, given in (5). The mean of *Y* is \(E\left( Y|\alpha ,\mu \right) =\mu\). Therefore, the covariates can be linked to the mean of the response variable, *y*, by means of the log-link function, given by

The unknown parameters are estimated by maximizing the log-likelihood function with **the nlm** function of R software. Under standard regularity conditions, the asymptotic distribution of \((\widehat{\pmb {\tau }}-\pmb {\tau })\) is multivariate normal \(N_{k+2}(0,J(\pmb {\tau })^{-1})\), where \(J(\pmb {\tau })\) is the expected information matrix. The asymptotic covariance matrix \(J(\pmb {\tau })^{-1}\) of \(\widehat{\pmb {\tau }}\) can be approximated by the inverse of the \((k+2)\times (k+2)\) observed information matrix \({I}(\pmb {\tau })\), whose elements can be evaluated numerically by most statistical packages. The approximate multivariate normal distribution \(N_{k+2}(0,{I }(\pmb {\tau })^{-1})\) for \(\widehat{\pmb {\tau }}\) can be used to construct asymptotic confidence intervals for the vector of parameters \(\pmb {\tau }\).
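The objective function of the regression model can be sketched in Python as follows. The sketch assumes the log link \(\mu _i = \exp (\pmb {x}_i^T\pmb {\beta })\), where the symbol \(\pmb {\beta }\) for the regression coefficients is a notational assumption, and the closed-form log-pmf obtained under a quasi-Lindley mixing density \(\theta (\alpha +\theta \lambda )e^{-\theta \lambda }/(1+\alpha )\). The design matrix, the coefficient values and the function names are made up for illustration.

```python
import math
import random


def pql_logpmf(y, alpha, mu):
    """Log-pmf of PQL(alpha, mu); the closed form is an assumption."""
    theta = (2.0 + alpha) / ((1.0 + alpha) * mu)
    return (math.log(theta) + math.log(theta * (y + 1.0 + alpha) + alpha)
            - math.log1p(alpha) - (y + 2.0) * math.log1p(theta))


def pql_reg_negloglik(beta, alpha, X, y):
    """Negative log-likelihood with log link: mu_i = exp(x_i' beta)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))    # linear predictor
        total -= pql_logpmf(y_i, alpha, math.exp(eta))
    return total


# Illustrative check on simulated data (all values below are made up).
rng = random.Random(11)
beta_true, alpha_true = [1.0, 0.5], 1.0
X = [[1.0, float(i % 2)] for i in range(400)]          # intercept + binary covariate


def draw(mu):
    """One PQL(alpha_true, mu) variate by inversion of the cdf."""
    u, k = rng.random(), 0
    cdf = math.exp(pql_logpmf(0, alpha_true, mu))
    while cdf < u and k < 10_000:
        k += 1
        cdf += math.exp(pql_logpmf(k, alpha_true, mu))
    return k


y = [draw(math.exp(sum(b * x for b, x in zip(beta_true, row)))) for row in X]

nll_true = pql_reg_negloglik(beta_true, alpha_true, X, y)
nll_null = pql_reg_negloglik([0.0, 0.0], alpha_true, X, y)
print(f"-loglik at true beta = {nll_true:.2f}, at beta = 0: {nll_null:.2f}")
```

Passing this function to a numerical minimizer, with \(\alpha\) optimized on the log scale to keep it positive, would give the ML estimates; the Hessian at the minimum then plays the role of the observed information matrix.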

## Empirical study

The estimated parameters of the models and goodness-of-fit statistics

Covariates | Poisson Estimate | Poisson SE | Poisson *p* value | NB Estimate | NB SE | NB *p* value | PQL Estimate | PQL SE | PQL *p* value |
---|---|---|---|---|---|---|---|---|---|
Intercept | 1.323 | 0.089 | \(<0.001\) | 1.271 | 0.214 | \(<0.001\) | 1.273 | 0.215 | \(<0.001\) |
Gender | \(-\, 0.234\) | 0.047 | \(<0.001\) | \(-\, 0.193\) | 0.123 | 0.118 | \(-\, 0.191\) | 0.122 | 0.119 |
General | 1.374 | 0.076 | \(<0.001\) | 1.362 | 0.199 | \(<0.001\) | 1.348 | 0.198 | \(<0.001\) |
Academic | 0.957 | 0.066 | \(<0.001\) | 0.949 | 0.140 | \(<0.001\) | 0.945 | 0.138 | \(<0.001\) |
Dispersion | – | – | – | 1.017 | 0.104 | – | 4.999 | 5.139 | – |
\(-\ell\) | 1343.250 | | | 869.423 | | | 867.436 | | |
AIC | 2694.500 | | | 1748.846 | | | 1744.872 | | |
BIC | 2709.498 | | | 1767.593 | | | 1763.619 | | |
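The information criteria in the table can be checked from the reported \(-\ell\) values, assuming the usual definitions \(\mathrm{AIC} = 2(-\ell ) + 2k\) and \(\mathrm{BIC} = 2(-\ell ) + k\log n\), with \(k = 4\) parameters for the Poisson model and \(k = 5\) for the NB and PQL models. The sample size is not stated in this section; \(n = 314\) is an assumption, adopted here only because it reproduces the BIC row.

```python
import math


def aic(neg_loglik, k):
    """Akaike information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k


def bic(neg_loglik, k, n):
    """Bayesian information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + k * math.log(n)


n_obs = 314   # assumption: inferred from the BIC row, not stated in the text
models = {    # model: (reported -loglik, number of estimated parameters)
    "Poisson": (1343.250, 4),
    "NB": (869.423, 5),
    "PQL": (867.436, 5),
}
for name, (nll, k) in models.items():
    print(f"{name:8s} AIC = {aic(nll, k):.3f}  BIC = {bic(nll, k, n_obs):.3f}")
```

Since the NB and PQL models have the same number of parameters, their ranking by AIC and BIC is determined by \(-\ell\) alone; the difference in \(-\ell\) between them is about 1.99.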

## Conclusion

A re-parametrization of the Poisson quasi-Lindley distribution is introduced and studied comprehensively. The parameter estimation problem of the Poisson quasi-Lindley distribution is discussed via an extensive simulation study. A new regression model for count data is proposed and compared with the Poisson and negative-binomial regression models on a real data set. We conclude that the Poisson quasi-Lindley regression model exhibits better fitting performance than the Poisson and negative-binomial regression models when the response variable is over-dispersed. We hope that the results given in this study will be helpful for researchers working in this field.

## References

1. Bhati, D., Kumawat, P., Gómez-Déniz, E.: A new count model generated from mixed Poisson transmuted exponential family with an application to health care data. Commun. Stat. Theory Methods **46**(22), 11060–11076 (2017)
2. Cheng, L., Geedipally, S.R., Lord, D.: The Poisson–Weibull generalized linear model for analyzing motor vehicle crash data. Saf. Sci. **54**, 38–42 (2013)
3. Déniz, E.G.: A new discrete distribution: properties and applications in medical care. J. Appl. Stat. **40**(12), 2760–2770 (2013)
4. Farbod, D., Arzideh, K.: Asymptotic properties of moment estimators for distributions generated by Levy’s law. Int. J. Appl. Math. Stat. **20**(11), 55–59 (2010)
5. Gencturk, Y., Yigiter, A.: Modelling claim number using a new mixture model: negative binomial gamma distribution. J. Stat. Comput. Simul. **86**(10), 1829–1839 (2016)
6. Grine, R., Zeghdoudi, H.: On Poisson quasi-Lindley distribution and its applications. J. Mod. Appl. Stat. Methods **16**(2), 21 (2017)
7. Imoto, T., Ng, C.M., Ong, S.H., Chakraborty, S.: A modified Conway–Maxwell–Poisson type binomial distribution and its applications. Commun. Stat. Theory Methods **46**(24), 12210–12225 (2017)
8. Lord, D., Geedipally, S.R.: The negative binomial-Lindley distribution as a tool for analyzing crash data characterized by a large amount of zeros. Accid. Anal. Prev. **43**(5), 1738–1742 (2011)
9. Mahmoudi, E., Zakerzadeh, H.: Generalized Poisson–Lindley distribution. Commun. Stat. Theory Methods **39**(10), 1785–1798 (2010)
10. Rodríguez-Avi, J., Conde-Sánchez, A., Sáez-Castillo, A.J., Olmo-Jiménez, M.J., Martínez-Rodríguez, A.M.: A generalized Waring regression model for count data. Comput. Stat. Data Anal. **53**(10), 3717–3725 (2009)
11. Shmueli, G., Minka, T.P., Kadane, J.B., Borle, S., Boatwright, P.: A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution. J. R. Stat. Soc. Ser. C Appl. Stat. **54**(1), 127–142 (2005)
12. Sáez-Castillo, A.J., Conde-Sánchez, A.: A hyper-Poisson regression model for overdispersed and underdispersed count data. Comput. Stat. Data Anal. **61**, 148–157 (2013)
13. Shoukri, M.M., Asyali, M.H., VanDorp, R., Kelton, D.: The Poisson inverse Gaussian regression model in the analysis of clustered counts data. J. Data Sci. **2**(1), 17–32 (2004)
14. Shanker, R., Mishra, A.: A quasi Lindley distribution. Afr. J. Math. Comput. Sci. Res. **6**(4), 64–71 (2013)
15. Wongrin, W., Bodhisuwan, W.: Generalized Poisson–Lindley linear model for count data. J. Appl. Stat. **44**(15), 2659–2671 (2017)
16. Zamani, H., Ismail, N., Faroughi, P.: Poisson-weighted exponential univariate version and regression model with applications. J. Math. Stat. **10**(2), 148–154 (2014)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.