Abstract
The zero inflated hyper-Poisson regression model permits the analysis of count data that present structural zeros, due to the existence of a non-users group, and whose covariates determine different levels of dispersion. A simulation study demonstrates the capability of the model to detect over- and under-dispersion in the potential users group in relation to the values of the covariates, and to estimate the proportion of structural zeros with great accuracy. An application of the model to the number of children per family, in relation to several covariates, confirms the presence of structural zeros in fertility data and, at the same time, detects under-dispersion in most of the levels determined by the covariates.
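As an illustration only, the following Python sketch shows how structural zeros can mask under-dispersion in the users group. It assumes the usual two-parameter hyper-Poisson form, P(Y = y) = λ^y / ((γ)_y · 1F1(1; γ; λ)), in which γ < 1 gives under-dispersion, γ = 1 the Poisson case and γ > 1 over-dispersion. The abstract does not state the parameterization, and the function names, the truncation point y_max and the parameter values are hypothetical choices, not taken from the paper. No covariates are included.

```python
def hyper_poisson_pmf(gamma, lam, y_max=200):
    """Return [P(Y=0), ..., P(Y=y_max)] for the assumed hyper-Poisson form.

    Unnormalised weights are lam**y / (gamma)_y, where (gamma)_y is the
    rising factorial. Their truncated sum approximates 1F1(1; gamma; lam).
    """
    weights = [1.0]
    for y in range(1, y_max + 1):
        # Recurrence: (gamma)_y = (gamma)_{y-1} * (gamma + y - 1)
        weights.append(weights[-1] * lam / (gamma + y - 1))
    total = sum(weights)
    return [w / total for w in weights]


def zero_inflated_pmf(pi, base_pmf):
    """Mix a point mass at zero (weight pi, the non-users) with base_pmf."""
    pmf = [(1.0 - pi) * p for p in base_pmf]
    pmf[0] += pi
    return pmf


def dispersion_index(pmf):
    """Variance-to-mean ratio of a pmf given as a list indexed by count."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return var / mean


# Hypothetical parameter values, chosen only for illustration.
users = hyper_poisson_pmf(gamma=0.5, lam=2.0)    # users group
observed = zero_inflated_pmf(pi=0.3, base_pmf=users)  # users plus non-users

print("users group dispersion index:", dispersion_index(users))
print("zero-inflated dispersion index:", dispersion_index(observed))
```

In this sketch the users group with γ = 0.5 has a variance-to-mean ratio below one, while the mixture that includes structural zeros has a ratio above one, which is why the dispersion of the users group needs to be modelled separately from the zero inflation.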
Acknowledgments
The authors are grateful for the constructive suggestions provided by the reviewers, which improved the paper.
Cite this article
Sáez-Castillo, A.J., Conde-Sánchez, A. Detecting over- and under-dispersion in zero inflated data with the hyper-Poisson regression model. Stat Papers 58, 19–33 (2017). https://doi.org/10.1007/s00362-015-0683-1
DOI: https://doi.org/10.1007/s00362-015-0683-1