A simple and useful regression model for fitting count data

  • Original Paper
  • Journal: TEST

Abstract

We present a novel regression model for count data in which the response variable is BerG-distributed, using a new parameterization of this distribution indexed by mean and dispersion parameters. An attractive feature of the model is its ability to fit count data that exhibit overdispersion, equidispersion, underdispersion, or zero inflation (or deflation). The new parameterization allows the regression coefficients to be interpreted directly in terms of the mean and dispersion, as in generalized linear models. The model parameters are estimated by maximum likelihood. We also conduct hypothesis tests for the dispersion parameter and consider residual analysis. Simulation studies assess the finite-sample properties of the estimators, the test statistics, and the residuals. The proposed model is applied to two real datasets, on wildlife habitat and road traffic accidents, illustrating its ability to accommodate both over- and underdispersed count data. This paper contains Supplementary Material.
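
To make the mean and dispersion indexing concrete, the following Python sketch evaluates a probability mass function under two assumptions that the abstract does not state: that a BerG variable is the sum of independent Bernoulli and geometric variables, as in Bourguignon and Weiß (2017), and that the dispersion parameter \(\phi\) is the variance-to-mean ratio. The closed form, the constraint \(\phi > |\mu - 1|\), and the names `berg_pmf` and `berg_moments` are illustrative; they follow from those assumptions rather than from the text above.

```python
# Illustrative sketch only. Assumes Y = B + G with B ~ Bernoulli(pi) and
# G ~ geometric with mean theta, reparameterized (our assumption) by
#   mu  = pi + theta        (mean)
#   phi = 1 + theta - pi    (variance / mean)

def berg_pmf(y, mu, phi):
    """P(Y = y) for y = 0, 1, 2, ... under the assumed (mu, phi) form."""
    if not (mu > 0 and phi > abs(mu - 1)):
        raise ValueError("assumed parameter space: mu > 0 and phi > |mu - 1|")
    a = mu + phi - 1.0
    b = mu + phi + 1.0
    if y == 0:
        return (1.0 - mu + phi) / b
    # Written with the ratio a / b so that large y underflows to 0
    # instead of overflowing.
    return 4.0 * mu / (b * b) * (a / b) ** (y - 1)


def berg_moments(mu, phi, upper=2000):
    """Total mass, mean, and variance from a truncated sum over 0..upper-1."""
    probs = [berg_pmf(y, mu, phi) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return total, mean, second - mean ** 2


if __name__ == "__main__":
    for mu, phi in [(1.0, 0.5), (1.0, 1.0), (2.0, 3.0)]:
        total, mean, var = berg_moments(mu, phi)
        print(f"mu={mu}, phi={phi}: mass={total:.6f}, "
              f"mean={mean:.6f}, var/mean={var / mean:.6f}")
```

Under these assumptions, \(\phi < 1\), \(\phi = 1\), and \(\phi > 1\) correspond to underdispersion, equidispersion, and overdispersion respectively, which is the sense in which one family could cover all three cases.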

References

  • Aoyama K, Shimizu K, Ong S (2008) A first-passage time random walk distribution with five transition probabilities: a generalization of the shifted inverse trinomial. Ann Inst Stat Math 60:1–20

  • Atkinson AC (1985) Plots, transformations, and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford statistical science series. Clarendon Press, New York

  • Bonat WH, Jørgensen B, Kokonendji CC, Hinde J, Demétrio CG (2017) Extended Poisson–Tweedie: properties and regression models for count data. Stat Model 18:24–49

  • Bourguignon M, Weiß CH (2017) An INAR(1) process for modeling count time series with equidispersion, underdispersion and overdispersion. TEST 26:847–868

  • Choo-Wosoba H, Levy SM, Datta S (2016) Marginal regression models for clustered count data based on zero-inflated Conway–Maxwell–Poisson distribution with applications. Biometrics 72:606–618

  • Cox DR, Snell EJ (1968) A general definition of residuals. J R Stat Soc Ser B (Methodol) 30:248–275

  • Dobbie MJ, Welsh AH (2001) Models for zero-inflated count data using the Neyman type A distribution. Stat Model 1:65–80

  • Dunn PK, Smyth GK (1996) Randomized quantile residuals. J Comput Graph Stat 5:236–244

  • Efron B (1986) Double exponential families and their use in generalized linear regression. J Am Stat Assoc 81:709–721

  • Famoye F (1993) Restricted generalized Poisson regression model. Commun Stat Theory Methods 22:1335–1354

  • Ferrari S, Cribari-Neto F (2004) Beta regression for modelling rates and proportions. J Appl Stat 31:799–815

  • Graham RL, Knuth DE, Patashnik O (1989) Concrete mathematics: a foundation for computer science, 2nd edn. Addison-Wesley, Reading

  • Griva I, Nash SG, Sofer A (2009) Linear and nonlinear optimization, vol 108, 2nd edn. SIAM, Philadelphia

  • Guo Z, Small DS, Gansky SA, Cheng J (2018) Mediation analysis for count and zero-inflated count data without sequential ignorability and its application in dental studies. J R Stat Soc Ser C (Appl Stat) 67:371–394

  • Howes AL, Maron M, McAlpine CA (2010) Bayesian networks and adaptive management of wildlife habitat. Conserv Biol 24:974–983

  • Hubert M, Vandervieren E (2008) An adjusted boxplot for skewed distributions. Comput Stat Data Anal 52:5186–5201

  • Kleiber C, Zeileis A (2016) Visualizing count data regressions using rootograms. Am Stat 70:296–303

  • Luenberger DG, Ye Y (2008) Linear and nonlinear programming, 3rd edn. Springer, New York

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London

  • Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York

  • Petterle RR, Bonat WH, Kokonendji CC, Seganfredo JC, Moraes A, da Silva MG (2019) Double Poisson–Tweedie regression models. Int J Biostat 15(1). https://doi.org/10.1515/ijb-2018-0119

  • Puig P, Valero J (2006) Count data distributions: some characterizations with applications. J Am Stat Assoc 101:332–340

  • R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  • Ribeiro Jr EE (2019) Contributions to the analysis of dispersed count data. Master’s thesis, Universidade de São Paulo, São Paulo

  • Ribeiro Jr EE, Zeviani WM, Bonat WH, Demetrio CG, Hinde J (2020) Reparametrization of COM–Poisson regression models with applications in the analysis of experimental data. Stat Model 5:443–466

  • Ridout MS, Besbeas P (2004) An empirical model for underdispersed count data. Stat Model 4:77–89

  • Sáez-Castillo A, Conde-Sánchez A (2013) A hyper-Poisson regression model for overdispersed and underdispersed count data. Comput Stat Data Anal 61:148–157

  • Sellers KF, Shmueli G et al (2010) A flexible regression model for count data. Ann Appl Stat 4:943–961

  • Shmueli G, Minka TP, Kadane JB, Borle S, Boatwright P (2005) A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution. J R Stat Soc Ser C (Appl Stat) 54:127–142

  • Wedderburn RWM (1974) Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika 61:439–447

Supplementary Information

Below is the link to the electronic supplementary material.

11749_2022_801_MOESM1_ESM.pdf

The Supplementary Material includes additional details and calculations of the score function, the Hessian matrix, and the Fisher information matrix of the BerG regression model. (151KB)

A. Chi-square Q–Q plots of the score, Wald, likelihood ratio, and gradient statistics in the second simulation study

Fig. 12 Q–Q plots of the score statistic according to each sample size (row) and number of parameters tested (column)

Fig. 13 Q–Q plots of the Wald statistic according to each sample size (row) and number of parameters tested (column)

Fig. 14 Q–Q plots of the likelihood ratio statistic according to each sample size (row) and number of parameters tested (column), where the first column corresponds to \(q = 1\) and the second to \(q = 3\)

Fig. 15 Q–Q plots of the gradient statistic according to each sample size (row) and number of parameters tested (column), where the first column corresponds to \(q = 1\) and the second to \(q = 3\)

Cite this article

Bourguignon, M., de Medeiros, R.M.R. A simple and useful regression model for fitting count data. TEST 31, 790–827 (2022). https://doi.org/10.1007/s11749-022-00801-6

  • DOI: https://doi.org/10.1007/s11749-022-00801-6
