Detecting over- and under-dispersion in zero inflated data with the hyper-Poisson regression model

  • Regular Article
  • Statistical Papers

Abstract

The zero inflated hyper-Poisson regression model permits the analysis of count data in which covariates determine different levels of dispersion and in which structural zeros arise from the existence of a group of non-users. A simulation study shows that the model can detect over- and under-dispersion among the potential users in the dataset as a function of the covariate values, and that it estimates the proportion of structural zeros with great accuracy. An application of the model to the number of children per family, in relation to several covariates, confirms the presence of structural zeros in fertility data and, at the same time, detects under-dispersion at most of the levels determined by the covariates.
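
To make the two ingredients of the abstract concrete, the following is a minimal sketch of a zero inflated hyper-Poisson probability mass function, not the authors' implementation. It assumes the standard two-parameter hyper-Poisson form, with weights proportional to λ^y / (γ)_y, mixed with a point mass at zero of probability π for the non-users group. The names `lam`, `gamma`, `pi`, `hp_weights`, `zihp_pmf` and `users_mean_var` are illustrative, and all parameters are fixed scalars here; how the paper links them to covariates is not shown on this page and is not reproduced.

```python
import math


def hp_weights(lam, gamma, tol=1e-14, max_terms=100_000):
    """Unnormalised hyper-Poisson weights lam**k / (gamma)_k for k = 0, 1, ...

    (gamma)_k is the rising factorial, built up iteratively. The list is
    truncated once the terms are decreasing and smaller than `tol`.
    Assumes lam > 0 and gamma > 0.
    """
    weights = [1.0]
    for k in range(max_terms):
        nxt = weights[-1] * lam / (gamma + k)
        weights.append(nxt)
        if gamma + k > lam and nxt < tol:
            break
    return weights


def zihp_pmf(y, lam, gamma, pi):
    """P(Y = y) when a fraction `pi` of the population are structural zeros."""
    w = hp_weights(lam, gamma)
    base = w[y] / sum(w) if y < len(w) else 0.0
    return (pi if y == 0 else 0.0) + (1.0 - pi) * base


def users_mean_var(lam, gamma):
    """Mean and variance of the hyper-Poisson part (potential users only)."""
    w = hp_weights(lam, gamma)
    total = sum(w)
    mean = sum(k * wk for k, wk in enumerate(w)) / total
    second = sum(k * k * wk for k, wk in enumerate(w)) / total
    return mean, second - mean ** 2


if __name__ == "__main__":
    for g in (0.5, 1.0, 2.0):
        m, v = users_mean_var(2.0, g)
        print(f"gamma={g}: mean={m:.3f}, variance={v:.3f}")
```

Under these assumptions, `gamma = 1` recovers the ordinary Poisson distribution, `gamma < 1` yields a variance below the mean (under-dispersion) and `gamma > 1` a variance above it (over-dispersion) within the users group, while `pi` adds extra mass at zero on top of whatever zeros the count part already produces.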



Acknowledgments

The authors are grateful for the constructive suggestions provided by the reviewers, which improved the paper.

Author information

Corresponding author

Correspondence to Antonio J. Sáez-Castillo.

About this article

Cite this article

Sáez-Castillo, A.J., Conde-Sánchez, A. Detecting over- and under-dispersion in zero inflated data with the hyper-Poisson regression model. Stat Papers 58, 19–33 (2017). https://doi.org/10.1007/s00362-015-0683-1

  • DOI: https://doi.org/10.1007/s00362-015-0683-1
