Abstract
We present a novel regression model for count data in which the response variable follows a BerG distribution under a new parameterization indexed by mean and dispersion parameters. An attractive feature of this model is its ability to fit count data that exhibit overdispersion, equidispersion, underdispersion, or zero inflation (or deflation). The advantage of the new parameterization is that the regression coefficients can be interpreted directly in terms of the mean and dispersion, as in generalized linear models. The model parameters are estimated by maximum likelihood. We also conduct hypothesis tests for the dispersion parameter and consider residual analysis. Simulation studies assess the finite-sample properties of the estimators, the test statistics, and the residuals. The proposed model is applied to two real datasets, on wildlife habitat and road traffic accidents, which illustrate its capability to accommodate both over- and underdispersed count data. This paper contains Supplementary Material.
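The abstract does not state the probability mass function, so the following is only a minimal sketch of what a mean and dispersion parameterization of the BerG distribution can look like. It assumes the form P(Y = 0) = (1 - mu + phi) / (1 + mu + phi) and P(Y = y) = 4 mu (mu + phi - 1)^(y-1) / (mu + phi + 1)^(y+1) for y = 1, 2, ..., with mu > 0 and phi > |mu - 1|. The function names `berg_pmf` and `berg_moments` are hypothetical and are not taken from the paper. Under this assumed form the mean is mu and the variance is mu * phi, so phi < 1, phi = 1, and phi > 1 correspond to under-, equi-, and overdispersion.

```python
def berg_pmf(y, mu, phi):
    """Assumed BerG pmf with mean mu and dispersion index phi (illustrative only)."""
    if mu <= 0 or phi <= abs(mu - 1.0):
        raise ValueError("requires mu > 0 and phi > |mu - 1|")
    if y < 0:
        return 0.0
    a = mu + phi + 1.0
    if y == 0:
        return (1.0 - mu + phi) / a
    # Written with the ratio r to avoid overflow for large y.
    r = (mu + phi - 1.0) / a
    return 4.0 * mu / (a * a) * r ** (y - 1)


def berg_moments(mu, phi, n_terms=5000):
    """Numerically sum the truncated pmf; returns (total mass, mean, variance)."""
    probs = [berg_pmf(y, mu, phi) for y in range(n_terms)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    # One overdispersed, one equidispersed, one underdispersed setting.
    for mu, phi in [(2.0, 3.0), (1.5, 1.0), (1.0, 0.5)]:
        total, mean, var = berg_moments(mu, phi)
        print(f"mu={mu}, phi={phi}: mass={total:.6f}, "
              f"mean={mean:.6f}, var/mean={var / mean:.6f}")
```

The numerical check only confirms that, under the assumed pmf, the probabilities sum to one and the variance-to-mean ratio equals phi. It says nothing about the regression structure or the estimation procedure described in the paper.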
Supplementary Information
Below is the link to the electronic supplementary material.
11749_2022_801_MOESM1_ESM.pdf
The Supplementary Material includes additional details and calculations of the score function, the Hessian matrix, and the Fisher information matrix of the BerG regression model. (151KB)
A. Chi-square Q–Q plots of the score, Wald, likelihood ratio, and gradient statistics in the second simulation study
Cite this article
Bourguignon, M., de Medeiros, R.M.R. A simple and useful regression model for fitting count data. TEST 31, 790–827 (2022). https://doi.org/10.1007/s11749-022-00801-6
DOI: https://doi.org/10.1007/s11749-022-00801-6