Abstract
We introduce a new regression model for count data in which the response variable belongs mainly to the class of inflated-parameter generalized power series (IGPS) distributions, which automatically account for both dispersion and zero-inflation phenomena. We use an original parameterization of these distributions that is indexed by mean and variance parameters, which are in general not functionally related to each other. An advantage of our approach is the straightforward interpretation of the regression coefficients in terms of the mean and variance, in contrast with, for instance, the popular generalized linear models. The methodology is simple and applies to many models. Some new mathematical and practical properties of the IGPS distributions are studied, including the quantile function and the dispersion and zero-inflation indexes. Three basic IGPS models, those based on the geometric, Bernoulli and Poisson distributions, are investigated in detail. For the corresponding count regression models, the model parameters are estimated by maximum likelihood, and simulation studies are conducted to evaluate the finite-sample performance of this method. Finally, we highlight the ability of some reparameterized IGPS regression models to deal with overdispersed and zero-inflated count data, and we compare them with usual models such as the zero-inflated Poisson and negative binomial, which are also reparameterized in terms of mean and variance.
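To make the idea of a mean and variance parameterization concrete, the following is a minimal sketch for the negative binomial distribution, one of the comparison models named above. It is not the authors' IGPS parameterization, which the abstract does not spell out. The function names are hypothetical, and the two indexes are assumptions: the dispersion index is taken to be the usual variance-to-mean ratio, and the zero-inflation index is taken to be 1 + log(p0)/mean, which equals zero for a Poisson distribution; the paper's own definitions may differ.

```python
import math


def nb_from_mean_var(mu, sigma2):
    """Map (mean, variance) to the classical negative binomial (r, p).

    Illustrative helper (hypothetical name). Requires sigma2 > mu > 0,
    because the negative binomial is always overdispersed.
    """
    if not (mu > 0 and sigma2 > mu):
        raise ValueError("need sigma2 > mu > 0")
    p = mu / sigma2                  # success probability
    r = mu * mu / (sigma2 - mu)      # size parameter, may be non-integer
    return r, p


def nb_pmf(k, r, p):
    """P(Y = k) for the negative binomial with size r and probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def dispersion_index(mu, sigma2):
    """Variance-to-mean ratio (assumed definition); equals 1 for Poisson."""
    return sigma2 / mu


def zero_inflation_index(p0, mu):
    """1 + log(p0)/mu (assumed definition); equals 0 for Poisson."""
    return 1.0 + math.log(p0) / mu


# Example: a count response with mean 2 and variance 5.
mu, sigma2 = 2.0, 5.0
r, p = nb_from_mean_var(mu, sigma2)
p0 = nb_pmf(0, r, p)
di = dispersion_index(mu, sigma2)
zi = zero_inflation_index(p0, mu)
print(f"r={r:.4f}, p={p:.4f}, P(Y=0)={p0:.4f}, DI={di:.2f}, ZI={zi:.4f}")
```

In this example the dispersion index is 2.5, indicating overdispersion, and the zero-inflation index is positive, indicating more zeros than a Poisson distribution with the same mean. In a regression setting, the mean and the variance (or a dispersion parameter) would each be linked to covariates, so that the coefficients are read directly on those scales.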
Acknowledgements
We sincerely thank the Associate Editor and two anonymous referees for their valuable comments. This work was performed while the third author was at the LmB of the Université de Franche-Comté (UBFC) as a visiting professor, partly funded by FeProMath of UBFC. The LmB receives support from the EIPHI Graduate School (contract ANR-17-EURE-0002). Marcelo Bourguignon gratefully acknowledges partial financial support of the Brazilian agency Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq: grant 304140/2021-0).
Ethics declarations
Disclosure statement
No potential conflict of interest was reported by the author(s).
About this article
Cite this article
Kokonendji, C.C., de Medeiros, R.M.R. & Bourguignon, M. Mean and Variance for Count Regression Models Based on Reparameterized Distributions. Sankhya B 86, 280–310 (2024). https://doi.org/10.1007/s13571-024-00325-z
DOI: https://doi.org/10.1007/s13571-024-00325-z
Keywords
- Dispersion phenomenon
- Generalized power series distribution
- Inflated parameter
- Maximum likelihood estimation
- Zero-inflation measure