A simple and useful regression model for fitting count data

Bourguignon, Marcelo; de Medeiros, Rodrigo M. R.

doi:10.1007/s11749-022-00801-6

Original Paper
Published: 21 February 2022

Volume 31, pages 790–827 (2022)
TEST

Abstract

We present a novel regression model for count data in which the response variable is BerG-distributed, using a new parameterization of this distribution indexed by mean and dispersion parameters. An attractive feature of the model is its ability to fit count data that exhibit overdispersion, equidispersion, underdispersion, or zero inflation (or deflation). The advantage of the new parameterization is that the regression coefficients can be interpreted directly in terms of the mean and dispersion, as in generalized linear models. The model parameters are estimated by maximum likelihood. We also conduct hypothesis tests for the dispersion parameter and consider residual analysis. Simulation studies empirically assess the properties of the estimators, the test statistics, and the residuals in finite samples. The proposed model is applied to two real datasets, on wildlife habitat and road traffic accidents, which illustrate its capability to accommodate both over- and underdispersed count data. This paper contains Supplementary Material.
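To make the idea of a count distribution indexed by mean and dispersion concrete, the following is a minimal sketch, not the authors' code. It assumes a probability mass function of the form P(Y = 0) = (1 − μ + φ)/(1 + μ + φ) and P(Y = y) = 4μ(μ + φ − 1)^(y−1)/(μ + φ + 1)^(y+1) for y ≥ 1, with μ > 0 and φ ≥ |μ − 1|. This form is not stated on this page and should be checked against the full paper. The function names `berg_pmf` and `berg_moments` are hypothetical. Under the assumed form, the mean is μ and the variance is μφ, so φ < 1, φ = 1, and φ > 1 correspond to under-, equi-, and overdispersion.

```python
def berg_pmf(y, mu, phi):
    """P(Y = y) under the ASSUMED mean/dispersion form described above."""
    if mu <= 0 or phi < abs(mu - 1):
        raise ValueError("need mu > 0 and phi >= |mu - 1|")
    if y < 0:
        return 0.0
    s = mu + phi + 1.0
    if y == 0:
        return (1.0 - mu + phi) / s
    # Written as a geometric tail with ratio r < 1 to avoid overflow for large y.
    r = (mu + phi - 1.0) / s
    return 4.0 * mu / (s * s) * r ** (y - 1)


def berg_moments(mu, phi, upper=2000):
    """Total mass, mean, and variance from a truncated sum over 0..upper-1."""
    probs = [berg_pmf(y, mu, phi) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    # Overdispersed (phi > 1) and underdispersed (phi < 1) illustrations.
    for mu, phi in [(2.0, 3.0), (0.8, 0.5)]:
        total, mean, var = berg_moments(mu, phi)
        print(f"mu={mu}, phi={phi}: mass={total:.6f}, mean={mean:.6f}, var={var:.6f}")
```

The numerical check confirms that, for the assumed form, the probabilities sum to one and the variance-to-mean ratio equals φ, which is why coefficients in a regression on μ and φ would be interpretable in terms of mean and dispersion.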

Author information

Authors and Affiliations

Departamento de Estatística, Universidade Federal do Rio Grande do Norte, Natal, RN, 59078-970, Brazil
Marcelo Bourguignon & Rodrigo M. R. de Medeiros

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

11749_2022_801_MOESM1_ESM.pdf

The Supplementary Material includes additional details and calculations of the score function, the Hessian matrix, and the Fisher information matrix of the BerG regression model. (151KB)

A. Chi-square Q–Q plots of the score, Wald, likelihood ratio, and gradient statistics in the second simulation study

About this article

Cite this article

Bourguignon, M., de Medeiros, R.M.R. A simple and useful regression model for fitting count data. TEST 31, 790–827 (2022). https://doi.org/10.1007/s11749-022-00801-6

Received: 29 March 2021
Accepted: 21 January 2022
Published: 21 February 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11749-022-00801-6
