
A regression model for overdispersed data without too many zeros

  • Regular Article
Statistical Papers

Abstract

A regression model for overdispersed count data based on the complex biparametric Pearson (CBP) distribution is developed. It is compared with the generalized Poisson regression model, the negative binomial regression model and the zero-inflated Poisson regression model, which are based on the generalized Poisson (GP), negative binomial (NB) and zero-inflated Poisson (ZIP) distributions, respectively. It is shown that the CBP distribution is more adequate than the GP, NB and ZIP distributions when the overdispersion is related not to a higher frequency of 0 but to other low values greater than 0, so it may be appropriate for overdispersed cases in which external reasons raise the number of low nonzero values. First, we study the shape and the parameters of the CBP distribution and compare it with the Poisson, GP, NB and ZIP distributions by means of the probability of 0, the skewness and kurtosis coefficients and the Kullback–Leibler divergence. Furthermore, we present an application example in which this behaviour is illustrated by the number of public educational facilities per municipality in Andalusia (Spain). Second, we describe two regression models based on the CBP distribution and the estimation method for their parameters. Third, we carry out a simulation study that assesses the performance of the proposed regression models. Finally, an application in the field of sport illustrates that these models can provide more accurate fits than other usual regression models for count data.
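
To make the role of the probability of 0 concrete, the following minimal sketch computes P(X = 0) under the Poisson, GP, NB and ZIP distributions for a given mean and variance. It is an illustration under stated assumptions, not the authors' code: the function name `zero_probabilities` is hypothetical, the GP, NB and ZIP parameters are obtained by matching the first two moments (using Consul's parameterization for the GP), the Poisson matches the mean only, and the CBP distribution itself is not implemented here.

```python
import math


def zero_probabilities(mu: float, var: float) -> dict:
    """P(X = 0) for Poisson, GP, NB and ZIP with mean mu and variance var.

    The Poisson matches the mean only; the other three match mean and variance.
    Moment matching is an assumption of this sketch.
    """
    if not var > mu > 0:
        raise ValueError("requires var > mu > 0 (overdispersion)")
    ratio = var / mu  # variance-to-mean ratio, greater than 1 here

    # Poisson: P(0) = exp(-mu)
    p0_poisson = math.exp(-mu)

    # GP (Consul): mean theta/(1-lam), variance theta/(1-lam)^3, P(0) = exp(-theta)
    theta = mu / math.sqrt(ratio)
    p0_gp = math.exp(-theta)

    # NB: success probability p = mu/var, size r = mu^2/(var - mu), P(0) = p^r
    p = mu / var
    r = mu ** 2 / (var - mu)
    p0_nb = p ** r

    # ZIP: extra-zero weight w and Poisson rate lam, with mean (1-w)*lam
    # and variance/mean = 1 + w*lam
    lam = mu + ratio - 1
    w = (ratio - 1) / lam
    p0_zip = w + (1 - w) * math.exp(-lam)

    return {"Poisson": p0_poisson, "GP": p0_gp, "NB": p0_nb, "ZIP": p0_zip}


if __name__ == "__main__":
    for name, value in zero_probabilities(mu=2.0, var=4.0).items():
        print(f"{name:8s} P(0) = {value:.4f}")
```

In this example all three overdispersed alternatives assign more mass to 0 than the Poisson with the same mean, which is the behaviour the CBP distribution is meant to avoid when the excess variability comes from other low values.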


Notes

  1. \(g_1^{NB}/g_1^P=\mu /(2\sigma ^2-\mu )>1.\)

  2. http://www.juntadeandalucia.es/institutodeestadisticaycartografia/.


Author information

Corresponding author

Correspondence to José Rodríguez-Avi.

Appendix

It is easy to prove that in the limit case (\(\mu =\sigma ^2\)) the quotient given in (10) is equal to 1. In general, if we solve the equation \(Q_1=1\), we have

$$\begin{aligned} 2\sigma ^4+2\mu (\mu -2)\sigma ^2-2\mu ^2(\mu -1)=0 \end{aligned}$$
(15)

whose solutions are \(\sigma ^2=-\mu ^2+\mu \) (which is impossible, since \(\sigma ^2>\mu \)) and \(\sigma ^2=\mu \) (as we already knew). Since the left-hand side of (15) is a parabola in \(\sigma ^2\) with positive leading coefficient and no root greater than \(\mu \), it follows that \(Q_1>1\) when \(\sigma ^2>\mu \).
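
As a short check, using only the symbols already present in (15), the two roots can be read off by factoring the left-hand side:

$$\begin{aligned} 2\sigma ^4+2\mu (\mu -2)\sigma ^2-2\mu ^2(\mu -1)=2\,(\sigma ^2-\mu )\,(\sigma ^2+\mu ^2-\mu ). \end{aligned}$$

When \(\sigma ^2>\mu >0\), the first factor is positive and the second exceeds \(\mu ^2\), so the left-hand side of (15) is strictly positive in that region.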

In relation to (11), the quotient

$$\begin{aligned} \frac{g_1^{CBP}}{g_1^{GP}}=\frac{(4\mu +1)AI-3\mu }{(\mu +2-AI)(3AI-2\sqrt{AI})} \end{aligned}$$

is greater than 1 if and only if

$$\begin{aligned}&(4\mu +1)AI-3\mu >(\mu +2-AI)(3AI-2\sqrt{AI})\nonumber \\&\quad \Leftrightarrow \mu AI-5AI-3\mu +3AI^2+2\mu \sqrt{AI}+4\sqrt{AI}-2AI\sqrt{AI}>0. \end{aligned}$$
(16)
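
Before turning to the analytic argument, inequality (16) can be sanity-checked numerically. The sketch below evaluates the left-hand side of (16) on a grid and compares it with the \(\mu \)-free lower bound used in the next paragraph. The function names and the grid ranges are hypothetical choices made for illustration, and \(\mu >0\) is assumed; a grid check of this kind supports the inequality but does not prove it.

```python
import math


def lhs_16(mu: float, ai: float) -> float:
    """Left-hand side of inequality (16) as a function of mu and AI."""
    root = math.sqrt(ai)
    return (mu * ai - 5 * ai - 3 * mu + 3 * ai ** 2
            + 2 * mu * root + 4 * root - 2 * ai * root)


def lower_bound(ai: float) -> float:
    """The mu-free part of (16): 3*AI^2 - 2*AI*sqrt(AI) - 5*AI + 4*sqrt(AI)."""
    root = math.sqrt(ai)
    return 3 * ai ** 2 - 2 * ai * root - 5 * ai + 4 * root


def check_grid(mus, ais) -> bool:
    """True if lhs_16 > lower_bound > 0 at every grid point."""
    return all(lhs_16(m, a) > lower_bound(a) > 0 for m in mus for a in ais)


if __name__ == "__main__":
    mus = [0.1 * k for k in range(1, 201)]       # mu in (0, 20], assumed range
    ais = [1 + 0.05 * k for k in range(1, 201)]  # AI in (1, 11], assumed range
    print(check_grid(mus, ais))
```

The difference between the two functions is \(\mu (AI+2\sqrt{AI}-3)=\mu (\sqrt{AI}-1)(\sqrt{AI}+3)\), which is why the bound holds whenever \(AI>1\) and \(\mu >0\).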

Since \(AI>1\) (and \(\mu >0\)), the left-hand side of (16) is greater than \(3AI^2-2AI\sqrt{AI}-5AI+4\sqrt{AI}\), which is a polynomial of degree 4 in \(\sqrt{AI}\). Writing \(x=\sqrt{AI}\), this polynomial equals \(x(3x^3-2x^2-5x+4)\), and it is greater than 0 when \(x>1\) because the cubic factor \(3x^3-2x^2-5x+4\) is positive there, as Fig. 6 shows.

Fig. 6 Graph of the polynomial \(3x^3-2x^2-5x+4\)
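
The positivity of the cubic for \(x>1\), with \(x=\sqrt{AI}\), can also be confirmed without the graph. As a sketch, since \(x=1\) is a root, the cubic factors as

$$\begin{aligned} 3x^3-2x^2-5x+4=(x-1)(3x^2+x-4)=(x-1)^2(3x+4), \end{aligned}$$

which is strictly positive for every \(x>1\) and touches 0 only at the double root \(x=1\), that is, in the limit case \(AI=1\).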


Cite this article

Rodríguez-Avi, J., Olmo-Jiménez, M.J. A regression model for overdispersed data without too many zeros. Stat Papers 58, 749–773 (2017). https://doi.org/10.1007/s00362-015-0724-9


  • DOI: https://doi.org/10.1007/s00362-015-0724-9
