A regression model for overdispersed data without too many zeros

Rodríguez-Avi, José; Olmo-Jiménez, María José

doi:10.1007/s00362-015-0724-9

Regular Article
Published: 07 November 2015

Statistical Papers, Volume 58, pages 749–773 (2017)

Abstract

A regression model for overdispersed count data based on the complex biparametric Pearson (CBP) distribution is developed. It is compared with the generalized Poisson, negative binomial and zero-inflated Poisson regression models, which are based on the generalized Poisson (GP), negative binomial (NB) and zero-inflated Poisson (ZIP) distributions, respectively. It is shown that the CBP distribution is more adequate than the GP, NB and ZIP distributions when the overdispersion is related not to a higher frequency of 0 but to a higher frequency of other low values greater than 0, so it may be appropriate for overdispersed cases in which external reasons raise the number of low nonzero values. Firstly, we study the shape and the parameters of the CBP distribution and compare it with the Poisson, GP, NB and ZIP distributions by means of the probability of 0, the skewness and kurtosis coefficients and the Kullback–Leibler divergence. We also present an application example, the number of public educational facilities by municipality in Andalusia (Spain), that shows this behaviour. Secondly, we describe two regression models based on the CBP distribution and the estimation method for their parameters. Thirdly, we carry out a simulation study that reveals the performance of the proposed regression models. Finally, an application in the field of sport illustrates that these models can provide more accurate fits than other usual regression models for count data.
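
To make the "probability of 0" comparison concrete, here is a minimal Python sketch that is not taken from the article. It matches the NB, GP and ZIP distributions to a common mean and variance through their standard moment formulas (the Poisson can only match the mean) and reports P(X = 0) for each. The GP is assumed to be in Consul's parameterization, with mean θ/(1 − λ) and variance θ/(1 − λ)³. The CBP distribution is left out because its probability function does not appear in this excerpt, and the function name `zero_probabilities` is an illustrative choice.

```python
import math


def zero_probabilities(mean, variance):
    """Return P(X = 0) for Poisson, NB, GP and ZIP laws sharing a mean and variance.

    Illustrative helper (not from the article). The Poisson matches the mean only.
    """
    if not variance > mean > 0:
        raise ValueError("requires variance > mean > 0 (overdispersion)")

    ratio = mean / variance  # inverse of the dispersion index, lies in (0, 1)

    # Poisson with the same mean.
    p_poisson = math.exp(-mean)

    # NB with success probability p = mean/variance and size r = mean^2/(variance - mean).
    size = mean ** 2 / (variance - mean)
    p_nb = ratio ** size

    # GP (Consul): mean = theta/(1 - lam), variance = theta/(1 - lam)^3,
    # hence (1 - lam)^2 = mean/variance and theta = mean * sqrt(mean/variance).
    theta = mean * math.sqrt(ratio)
    p_gp = math.exp(-theta)

    # ZIP: mean = (1 - pi) * lam, variance = mean * (1 + pi * lam).
    excess = variance / mean - 1.0  # equals pi * lam
    lam = mean + excess
    pi = excess / lam
    p_zip = pi + (1.0 - pi) * math.exp(-lam)

    return {"Poisson": p_poisson, "NB": p_nb, "GP": p_gp, "ZIP": p_zip}


if __name__ == "__main__":
    for name, p0 in zero_probabilities(mean=2.0, variance=4.0).items():
        print(f"{name:8s} P(X=0) = {p0:.4f}")
```

Under these assumptions, all three overdispersed alternatives place more mass at 0 than the Poisson with the same mean, which is the behaviour the abstract contrasts with the CBP distribution.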

Notes

$g_1^{NB}/g_1^P=\mu /(2\sigma ^2-\mu )>1.$
http://www.juntadeandalucia.es/institutodeestadisticaycartografia/.

Author information

Authors and Affiliations

Department of Statistics and Operations Research, University of Jaén, Jaén, Spain
José Rodríguez-Avi & María José Olmo-Jiménez

Corresponding author

Correspondence to José Rodríguez-Avi.

Appendix

It is easy to prove that in the limit case ($\mu =\sigma ^2$) the quotient given in (10) is equal to 1. In general, if we solve the equation $Q_1=1$, we have

$$\begin{aligned} 2\sigma ^4+2\mu (\mu -2)\sigma ^2-2\mu ^2(\mu -1)=0 \end{aligned} \tag{15}$$

whose solutions are $\sigma ^2=-\mu ^2+\mu $ (which is impossible since $\sigma ^2>\mu $) or $\sigma ^2=\mu $ (as we already knew). Since the left-hand side of (15) is a parabola in $\sigma ^2$ with positive leading coefficient and $\sigma ^2=\mu $ is its larger root, it follows that $Q_1>1$ when $\sigma ^2>\mu $.
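
As a quick numerical sanity check of the two roots of (15), the following Python sketch solves the quadratic in $\sigma ^2$ for a given $\mu $ and evaluates its left-hand side. It is an illustration only; the names `eq15_lhs` and `roots_of_eq15` are hypothetical and do not come from the article.

```python
import math


def eq15_lhs(mu, sigma2):
    """Left-hand side of (15), viewed as a quadratic in sigma2 = sigma^2."""
    return 2.0 * sigma2 ** 2 + 2.0 * mu * (mu - 2.0) * sigma2 - 2.0 * mu ** 2 * (mu - 1.0)


def roots_of_eq15(mu):
    """Solve (15) for sigma^2 with the quadratic formula; returns (smaller, larger)."""
    a = 2.0
    b = 2.0 * mu * (mu - 2.0)
    c = -2.0 * mu ** 2 * (mu - 1.0)
    disc = b * b - 4.0 * a * c  # simplifies to 4 * mu^4, so it is never negative
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


if __name__ == "__main__":
    mu = 3.0
    print(roots_of_eq15(mu))       # roots -mu^2 + mu and mu
    print(eq15_lhs(mu, mu + 0.5))  # positive, since sigma^2 > mu
```

For $\mu =3$ the roots are $-6$ and $3$, that is, $-\mu ^2+\mu $ and $\mu $, and the left-hand side is positive for any $\sigma ^2>\mu $.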

In relation to (11), the quotient

$$\begin{aligned} \frac{g_1^{CBP}}{g_1^{GP}}=\frac{(4\mu +1)AI-3\mu }{(\mu +2-AI)(3AI-2\sqrt{AI})} \end{aligned}$$

is greater than 1 if and only if

$$\begin{aligned}&(4\mu +1)AI-3\mu >(\mu +2-AI)(3AI-2\sqrt{AI})\\&\quad \Leftrightarrow \mu AI-5AI-3\mu +3AI^2+2\mu \sqrt{AI}+4\sqrt{AI}-2AI\sqrt{AI}>0. \end{aligned} \tag{16}$$

Since $AI>1$, the terms in $\mu $ on the left-hand side of (16), namely $\mu (AI+2\sqrt{AI}-3)$, are positive, so the left-hand side of (16) is greater than $3AI^2-2AI\sqrt{AI}-5AI+4\sqrt{AI}$, which is a polynomial of degree 4 in $\sqrt{AI}$. This polynomial is always greater than 0 when $\sqrt{AI}>1$, since the polynomial $3x^3-2x^2-5x+4$ has the form shown in Fig. 6.
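
The positivity of this cubic for $x>1$ can also be checked algebraically, without relying on the figure. The factorization below is a supplementary worked step, not taken from the article; it follows from noting that $x=1$ is a double root:

$$\begin{aligned} 3x^3-2x^2-5x+4=(x-1)^2(3x+4), \end{aligned}$$

so that, writing $x=\sqrt{AI}$,

$$\begin{aligned} 3AI^2-2AI\sqrt{AI}-5AI+4\sqrt{AI}=x(x-1)^2(3x+4)>0\quad \text{for } x>1. \end{aligned}$$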

About this article

Cite this article

Rodríguez-Avi, J., Olmo-Jiménez, M.J. A regression model for overdispersed data without too many zeros. Stat Papers 58, 749–773 (2017). https://doi.org/10.1007/s00362-015-0724-9

Received: 29 December 2014
Revised: 20 July 2015
Published: 07 November 2015
Issue Date: September 2017
DOI: https://doi.org/10.1007/s00362-015-0724-9
