Abstract
Count data often exhibit more zeros than predicted by common count distributions such as the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have drawbacks, including regression coefficients that are difficult to interpret and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it provides a basis for directly comparing the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations that compare these model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to toxicological data on the effect of emissions on cardiac arrhythmias.
Acknowledgments
The authors would like to acknowledge the following NIH Grants: ES07142, CA134294, ES012044, and ES00002.
Ethics declarations
Conflicts of interest
No conflict of interest.
Appendices
Appendix 1: Calculation of \(\Delta _{ij}\)
To solve for \(\Delta _{ij}\), we need to solve the convolution equation that links the marginal mean, \(\mu _{ij}^{Y} = \hbox {E}(Y_{ij}|X_{ij}) = \hbox {exp} (\beta X_{ij})\), to the conditional mean, where, assuming the MZAP setting,
Estimates of \(\Delta _{ij}\) can be obtained using a Newton–Raphson algorithm, such that
where \(\Delta _{ij} = \Delta _{ij} (\beta , \gamma _{1}, \gamma _{2}, \sigma )\) and \(f(\Delta _{ij})\) refers to the convolution equation above. The derivative needed for the Newton–Raphson algorithm is as follows
After using the chain rule, \(\diamond = \left\{ \frac{\partial }{\partial \Delta _{ij}} \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \right\} \) results in
Gauss–Hermite quadrature can be used to evaluate this one-dimensional integral.
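The calculation described in this appendix can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' SAS implementation. It assumes the random intercept is normal, \(b_{i} \sim N(0, \sigma ^{2})\), and that the convolution equation sets \(\hbox {exp}(\beta X_{ij})\) equal to the expectation over \(b_{i}\) of the expression differentiated in \(\diamond \), namely \(\frac{1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]}{1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]} \hbox {exp}(\Delta _{ij} + b_{i})\). The function names, the starting value, and the default number of quadrature nodes are hypothetical choices made for this example.

```python
import numpy as np


def gauss_hermite_normal(n_nodes, sigma):
    """Nodes and weights approximating E[h(b)] for b ~ N(0, sigma^2) (assumed)."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)  # rule for weight exp(-x^2)
    return np.sqrt(2.0) * sigma * x, w / np.sqrt(np.pi)


def conditional_mean_and_slope(delta, b, gamma1, gamma2):
    """Integrand of the assumed convolution equation and its derivative in delta."""
    eta = delta + b
    lam = np.exp(eta)
    zeta = gamma1 + gamma2 * eta
    a = -np.expm1(-np.exp(zeta))   # 1 - exp(-exp(gamma1 + gamma2 * eta))
    trunc = -np.expm1(-lam)        # 1 - exp(-exp(eta))
    g = a * lam / trunc            # conditional mean given b
    # Chain rule: dg/d(delta) = g * (a'/a + 1 - trunc'/trunc)
    d_a = gamma2 * np.exp(zeta - np.exp(zeta))
    d_trunc = lam * np.exp(-lam)
    slope = g * (d_a / a + 1.0 - d_trunc / trunc)
    return g, slope


def solve_delta(marginal_mean, gamma1, gamma2, sigma,
                n_nodes=30, tol=1e-10, max_iter=50):
    """Newton-Raphson solve of E_b[g(delta, b)] = marginal_mean for delta."""
    b, w = gauss_hermite_normal(n_nodes, sigma)
    delta = np.log(marginal_mean)  # hypothetical starting value
    for _ in range(max_iter):
        g, slope = conditional_mean_and_slope(delta, b, gamma1, gamma2)
        f = np.dot(w, g) - marginal_mean      # convolution equation residual
        step = f / np.dot(w, slope)           # f / f'
        delta -= step
        if abs(step) < tol:
            return float(delta)
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the article.
    print(solve_delta(marginal_mean=2.0, gamma1=0.5, gamma2=0.8, sigma=0.7))
```

In a full fit, a solve of this kind would be repeated for every observation at each evaluation of the likelihood, because \(\Delta _{ij}\) depends on \(\beta \), \(\gamma _{1}\), \(\gamma _{2}\), and \(\sigma \). The sketch does not guard against extreme parameter values, where the exponentials can overflow or the ratios can become indeterminate.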
Appendix 2: SAS Execution via PROC NLMIXED
Marginalized zero-altered Poisson (MZAP) and marginalized zero-altered negative binomial (MZANB) models
Cite this article
Tabb, L.P., Tchetgen, E.J.T., Wellenius, G.A. et al. Marginalized Zero-Altered Models for Longitudinal Count Data. Stat Biosci 8, 181–203 (2016). https://doi.org/10.1007/s12561-015-9136-6