Flexible regression models for counts with high-inflation of zeros

METRON

Abstract

In this paper, we introduce a flexible class of regression models for counts with high-inflation of zeros that cannot be predicted by the Poisson, zero-inflated Poisson, negative binomial, or Poisson-inverse Gaussian regression models. The proposed models are based on a class of zero-inflated mixed Poisson distributions, which contains the zero-inflated negative binomial (ZINB) and zero-inflated Poisson-inverse Gaussian (ZIPIG) distributions, among others, as particular cases. We consider regression structures for the mean, dispersion, and zero-inflation parameters. Consequently, we generalize existing models, such as the ZINB regression (with non-varying dispersion), and also open the possibility of introducing new models, such as the ZIPIG and the zero-inflated generalized hyperbolic secant regressions. We propose an Expectation-Maximization (EM) algorithm for estimating the parameters and the associated information matrix. Simulation results are presented to compare the finite-sample performance of the proposed EM algorithm with direct maximization of the log-likelihood function based on the GAMLSS approach. These results show some advantages of the EM algorithm over the GAMLSS approach. We also discuss a measure of influence based on the EM approach and propose simulated envelopes for checking the adequacy of the zero-inflated regression models. An empirical application to the number of roots produced by 270 micropropagated shoots of the columnar apple cultivar Trajan illustrates the usefulness of the proposed class of regression models for count data with high-inflation of zeros, and shows that the GAMLSS approach cannot be used in some practical situations because of numerical problems.
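To make the structure described in the abstract concrete, the following minimal Python sketch covers one member of the class, the ZINB distribution. It is an illustration under stated assumptions, not the authors' implementation. It assumes a negative binomial with mean mu and dispersion phi (variance mu + mu^2/phi), a zero-inflation probability pi, log links for mu and phi, and a logit link for pi. The function names and link choices are hypothetical. The last function computes the conditional probability that an observed zero is a structural zero, which is the kind of quantity an EM E-step for zero-inflated models typically uses.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi (assumed parametrization)."""
    log_p = (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + y * math.log(mu / (mu + phi))
        + phi * math.log(phi / (mu + phi))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, phi, pi):
    """Zero-inflated NB pmf: extra mass pi at zero, NB scaled by (1 - pi) elsewhere."""
    base = (1.0 - pi) * nb_pmf(y, mu, phi)
    return pi + base if y == 0 else base


def structural_zero_weight(y, mu, phi, pi):
    """P(observation comes from the zero-inflation component | Y = y)."""
    if y > 0:
        return 0.0  # a positive count cannot be a structural zero
    return pi / zinb_pmf(0, mu, phi, pi)


def linear_predictors(x, beta, alpha, gamma):
    """Map covariates x to (mu, phi, pi) using assumed log, log and logit links."""
    def dot(coef):
        return sum(c * xi for c, xi in zip(coef, x))

    mu = math.exp(dot(beta))                   # mean regression
    phi = math.exp(dot(alpha))                 # dispersion regression
    pi = 1.0 / (1.0 + math.exp(-dot(gamma)))   # zero-inflation regression
    return mu, phi, pi


if __name__ == "__main__":
    # Hypothetical coefficients and covariates, chosen only for illustration.
    mu, phi, pi = linear_predictors([1.0, 0.5], [0.7, 0.2], [0.4, 0.0], [-0.8, 0.3])
    print(zinb_pmf(0, mu, phi, pi), structural_zero_weight(0, mu, phi, pi))
```

Giving each of the three parameters its own linear predictor mirrors the regression structures for the mean, dispersion, and zero-inflation parameters mentioned in the abstract. Other members of the class, such as the ZIPIG, would need a different base pmf in place of `nb_pmf`.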

Acknowledgements

We would like to thank the referee for the useful comments and suggestions that led to a substantial improvement of the paper. We also acknowledge the financial support from CAPES (Brazil), CNPq (Brazil) and FAPEMIG (Brazil).

Author information

Corresponding author

Correspondence to Wagner Barreto-Souza.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 517 KB)

About this article

Cite this article

Gonçalves, J.N., Barreto-Souza, W. Flexible regression models for counts with high-inflation of zeros. METRON 78, 71–95 (2020). https://doi.org/10.1007/s40300-020-00163-9

  • DOI: https://doi.org/10.1007/s40300-020-00163-9
