Abstract
A regression model for overdispersed count data based on the complex biparametric Pearson (CBP) distribution is developed. It is compared with the generalized Poisson, negative binomial and zero-inflated Poisson regression models, which are based on the generalized Poisson (GP), negative binomial (NB) and zero-inflated Poisson (ZIP) distributions, respectively. It is shown that the CBP distribution is more adequate than the GP, NB and ZIP distributions when the overdispersion is related not to a higher frequency of 0 but to a higher frequency of other low values greater than 0, so it may be appropriate for overdispersed data in which external factors raise the number of low nonzero values. Firstly, we study the shape and the parameters of the CBP distribution and compare it with the Poisson, GP, NB and ZIP distributions by means of the probability of 0, the skewness and kurtosis coefficients and the Kullback–Leibler divergence. Furthermore, we present an application example, the number of public educational facilities by municipality in Andalusia (Spain), in which this behaviour is observed. Secondly, we describe two regression models based on the CBP distribution and the estimation method for their parameters. Thirdly, we carry out a simulation study that assesses the performance of the proposed regression models. Finally, an application in the field of sport illustrates that these models can provide more accurate fits than other usual regression models for count data.
Notes
\(g_1^{NB}/g_1^P=\mu /(2\sigma ^2-\mu )>1.\)
Appendix
It is easy to prove that in the limit case (\(\mu =\sigma ^2\)) the quotient given in (10) is equal to 1. In general, if we solve the equation \(Q_1=1\), we have
whose solutions are \(\sigma ^2=-\mu ^2+\mu \) (which is impossible, since \(\sigma ^2>\mu \) and \(-\mu ^2+\mu <\mu \)) or \(\sigma ^2=\mu \) (as we already knew). Given that the expression (15) is a parabola in \(\sigma ^2\) with positive leading coefficient and that neither root exceeds \(\mu \), it is positive beyond its larger root, so \(Q_1>1\) when \(\sigma ^2>\mu \).
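Equation (15) is not reproduced in this text. As a sketch only, assuming it is normalized so that its leading coefficient is 1 (an assumption; the original may differ by a positive factor), a quadratic in \(\sigma ^2\) with the two roots stated above is

\[
(\sigma ^2-\mu )(\sigma ^2+\mu ^2-\mu )=\sigma ^4+(\mu ^2-2\mu )\sigma ^2+\mu ^2-\mu ^3.
\]

For \(\mu >0\) and \(\sigma ^2>\mu \), the first factor is positive and the second satisfies \(\sigma ^2+\mu ^2-\mu >\mu ^2>0\), so the product is positive there, in line with the sign argument.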
In relation to (11), the quotient
is greater than 1 if and only if
Since \(AI>1\), (16) is greater than \(3AI^2-2AI\sqrt{AI}-5AI+4\sqrt{AI}\), which is a polynomial of degree 4 in \(\sqrt{AI}\). Writing \(x=\sqrt{AI}\), this polynomial equals \(x(3x^3-2x^2-5x+4)\), so it is greater than 0 whenever \(\sqrt{AI}>1\), because the cubic \(3x^3-2x^2-5x+4\) is positive for \(x>1\), as shown in Fig. 6.
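The positivity of the cubic can also be checked without the figure. One possible route, offered as a sketch and not as the authors' argument, is the factorisation \(3x^3-2x^2-5x+4=(x-1)^2(3x+4)\), which is clearly positive for \(x>1\). The short script below verifies this factorisation at integer points and evaluates the degree-4 bound on an arbitrary grid of \(AI\) values; the function names and the grid are illustrative choices, not taken from the paper.

```python
# Sanity check for the sign argument, with x standing for sqrt(AI).
# Function names and the evaluation grid are illustrative assumptions.

def cubic(x):
    """The cubic quoted in the text: 3x^3 - 2x^2 - 5x + 4."""
    return 3 * x**3 - 2 * x**2 - 5 * x + 4


def factored(x):
    """Candidate factorisation (x - 1)^2 (3x + 4)."""
    return (x - 1) ** 2 * (3 * x + 4)


def lower_bound(ai):
    """3 AI^2 - 2 AI sqrt(AI) - 5 AI + 4 sqrt(AI), i.e. x * cubic(x)."""
    x = ai ** 0.5
    return 3 * ai**2 - 2 * ai * x - 5 * ai + 4 * x


# Two cubics that agree at more than four points are identical;
# integer arithmetic keeps this comparison exact.
assert all(cubic(k) == factored(k) for k in range(-5, 6))

# Arbitrary grid of AI values above 1 on which the bound is evaluated.
grid = [1.01 + 0.25 * i for i in range(40)]
assert all(lower_bound(ai) > 0 for ai in grid)

print("smallest value of the bound on the grid:", min(lower_bound(ai) for ai in grid))
```

The factorisation also shows that \(x=1\) is a double root, which is why the bound approaches 0 as \(AI\) approaches 1.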
Cite this article
Rodríguez-Avi, J., Olmo-Jiménez, M.J. A regression model for overdispersed data without too many zeros. Stat Papers 58, 749–773 (2017). https://doi.org/10.1007/s00362-015-0724-9
DOI: https://doi.org/10.1007/s00362-015-0724-9