Marginalized Zero-Altered Models for Longitudinal Count Data

Statistics in Biosciences

Abstract

Count data often exhibit more zeros than predicted by common count distributions such as the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have drawbacks, including regression coefficients that are difficult to interpret and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. A main advantage of this marginal model is that it provides a basis for directly comparing the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations comparing these model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to toxicological data on the effect of emissions on cardiac arrhythmias.

Acknowledgments

The authors would like to acknowledge the following NIH Grants: ES07142, CA134294, ES012044, and ES00002.

Author information

Corresponding author

Correspondence to Loni Philip Tabb.

Ethics declarations

Conflicts of interest

No conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 40 KB)

Appendices

Appendix 1: Calculation of \(\Delta _{ij}\)

To obtain \(\Delta _{ij}\), we solve the convolution equation that links the marginal mean, \(\mu _{ij}^{Y} = \hbox {E}(Y_{ij}|X_{ij}) = \hbox {exp} (X_{ij} \beta )\), to the conditional mean, where, assuming the MZAP setting,

$$\begin{aligned} \mu _{ij}^{Y}= & {} \hbox {exp}(X_{ij} \beta ) \\= & {} \int [1-P(Y_{ij}=0|X_{ij},b_{i})] \frac{\mu _{ij}^{b}}{[1 - \hbox {exp}(-\mu _{ij}^{b})]} \phi (b_{i}|\sigma ) \mathrm{d}b_{i} \\= & {} \int \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \phi (b_{i}|\sigma ) \mathrm{d}b_{i} . \end{aligned}$$
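As a rough numerical illustration of the convolution equation, the sketch below evaluates its right-hand side for given parameter values. It assumes that \(\phi (b_{i}|\sigma )\) is a mean-zero normal density with standard deviation \(\sigma \) and approximates the integral by Gauss–Hermite quadrature. The function names, the number of quadrature nodes, and the use of Python with NumPy are choices made for this illustration and are not taken from the article.

```python
import numpy as np


def conditional_mean(u, gamma1, gamma2):
    """Conditional mean E(Y | X, b) of the zero-altered Poisson, with u = Delta + b."""
    mu_b = np.exp(u)
    # 1 - P(Y = 0 | X, b), where P(Y = 0 | X, b) = exp(-exp(gamma1 + gamma2 * u))
    p_nonzero = -np.expm1(-np.exp(gamma1 + gamma2 * u))
    # Mean of the zero-truncated Poisson: mu_b / (1 - exp(-mu_b))
    truncated_mean = mu_b / -np.expm1(-mu_b)
    return p_nonzero * truncated_mean


def marginal_mean(delta, gamma1, gamma2, sigma, n_nodes=40):
    """Integrate the conditional mean over b ~ N(0, sigma^2) (assumed form of phi)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables b = sqrt(2) * sigma * x maps the weight exp(-x^2)
    # onto the normal density, leaving a factor 1 / sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes
    values = conditional_mean(delta + b, gamma1, gamma2)
    return np.sum(weights * values) / np.sqrt(np.pi)
```

When \(\gamma _{1} = 0\) and \(\gamma _{2} = 1\), the zero-altered model collapses to an ordinary Poisson model, so `marginal_mean` should then return approximately \(\hbox {exp}(\Delta _{ij} + \sigma ^{2}/2)\), which gives a convenient sanity check.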

Estimates of \(\Delta _{ij}\) can be obtained using a Newton–Raphson algorithm, such that

$$\begin{aligned} \Delta _{ij}^{(t+1)}= & {} \Delta _{ij}^{(t)} - \left( \frac{\partial f(\Delta _{ij}^{(t)})}{\partial \Delta _{ij}}\right) ^{-1} \times f(\Delta _{ij}^{(t)}) , \end{aligned}$$

where \(\Delta _{ij} = \Delta _{ij} (\beta , \gamma _{1}, \gamma _{2}, \sigma )\) and \(f(\Delta _{ij})\) denotes the convolution equation above written in root form, that is, the integral on its right-hand side minus \(\hbox {exp}(X_{ij} \beta )\). Because \(\hbox {exp}(X_{ij} \beta )\) does not depend on \(\Delta _{ij}\), the derivative needed for the Newton–Raphson algorithm is the derivative of the integral:

$$\begin{aligned} \frac{\partial }{\partial \Delta _{ij}} \mu _{ij}^{Y}= & {} \frac{\partial }{\partial \Delta _{ij}} \int \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \phi (b_{i}|\sigma ) \mathrm{d}b_{i} \\= & {} \int \left\{ \frac{\partial }{\partial \Delta _{ij}} \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \right\} \phi (b_{i}|\sigma ) \mathrm{d}b_{i} . \end{aligned}$$

Applying the product, quotient, and chain rules, the derivative of the integrand, \(\diamond = \left\{ \frac{\partial }{\partial \Delta _{ij}} \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \right\} \), becomes

$$\begin{aligned} \diamond= & {} \{ 1 - e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} \} \times \left\{ \frac{[1 - e^{-e^{\Delta _{ij} + b_{i}}}]e^{\Delta _{ij} + b_{i}} - e^{\Delta _{ij} + b_{i}}[-e^{-e^{\Delta _{ij} + b_{i}}}][-e^{\Delta _{ij} + b_{i}}]}{[1 - e^{-e^{\Delta _{ij} + b_{i}}}]^{2}} \right\} \\&+ \left\{ \frac{e^{\Delta _{ij} + b_{i}}}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} \right\} \times \{ e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} [e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}] \gamma _{2} \} \\= & {} \frac{e^{\Delta _{ij} + b_{i}}}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} \times [ \{ 1 - e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} \} \times \left\{ 1 - \frac{[e^{-e^{\Delta _{ij} + b_{i}}}] [e^{\Delta _{ij} + b_{i}}]}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} \right\} \\&+ \{ e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} [e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}] \gamma _{2} \} ] . \end{aligned}$$
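A closed-form derivative of this length is easy to mistype, so one way to gain confidence in it is to compare it with a central finite difference of the integrand. The sketch below does this at a handful of points, writing \(u = \Delta _{ij} + b_{i}\). The function names, the test grid, the step size, and the parameter values are all illustrative assumptions.

```python
import numpy as np


def integrand(u, gamma1, gamma2):
    """Conditional mean of the zero-altered Poisson as a function of u = Delta + b."""
    hurdle = -np.expm1(-np.exp(gamma1 + gamma2 * u))   # 1 - exp(-exp(gamma1 + gamma2 u))
    truncated = np.exp(u) / -np.expm1(-np.exp(u))      # exp(u) / (1 - exp(-exp(u)))
    return hurdle * truncated


def diamond(u, gamma1, gamma2):
    """Analytic derivative of the integrand in u, following the final expression above."""
    mu = np.exp(u)
    eta = np.exp(gamma1 + gamma2 * u)
    denom = -np.expm1(-mu)                             # 1 - exp(-exp(u))
    hurdle = -np.expm1(-eta)
    first = hurdle * (1.0 - np.exp(-mu) * mu / denom)
    second = np.exp(-eta) * eta * gamma2
    return (mu / denom) * (first + second)


# Hypothetical parameter values and grid, chosen only for the check.
gamma1, gamma2 = -0.5, 0.8
u_grid = np.linspace(-2.0, 2.0, 9)
h = 1e-6
finite_diff = (integrand(u_grid + h, gamma1, gamma2)
               - integrand(u_grid - h, gamma1, gamma2)) / (2.0 * h)
max_abs_error = np.max(np.abs(finite_diff - diamond(u_grid, gamma1, gamma2)))
```

A small value of `max_abs_error` indicates that the analytic expression and the numerical derivative agree on the chosen grid; it is a spot check, not a proof.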

Gauss–Hermite quadrature can be used to evaluate this one-dimensional integral.
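Putting the pieces of this appendix together, the following sketch solves the convolution equation for \(\Delta _{ij}\) by Newton–Raphson, using Gauss–Hermite quadrature for both the integral and its derivative. It is a minimal illustration under stated assumptions, not the authors' implementation: \(\phi (b_{i}|\sigma )\) is taken to be a mean-zero normal density, the starting value \(\log \mu _{ij}^{Y}\), the tolerance, the node count, and all function names are hypothetical choices.

```python
import numpy as np

# Gauss-Hermite nodes and weights for integrals against exp(-x^2).
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(40)


def mean_and_slope(delta, gamma1, gamma2, sigma):
    """Marginal mean and its derivative in Delta, both by quadrature."""
    u = delta + np.sqrt(2.0) * sigma * NODES           # u = Delta + b at each node
    mu, eta = np.exp(u), np.exp(gamma1 + gamma2 * u)
    denom = -np.expm1(-mu)                             # 1 - exp(-exp(u))
    hurdle = -np.expm1(-eta)                           # 1 - P(Y = 0 | b)
    truncated = mu / denom
    # Integrand derivative (the diamond term in the text).
    d_integrand = truncated * (
        hurdle * (1.0 - np.exp(-mu) * mu / denom) + np.exp(-eta) * eta * gamma2
    )
    scale = WEIGHTS / np.sqrt(np.pi)
    return np.sum(scale * hurdle * truncated), np.sum(scale * d_integrand)


def solve_delta(target_mean, gamma1, gamma2, sigma, tol=1e-10, max_iter=50):
    """Find Delta such that the marginal mean equals target_mean = exp(X beta)."""
    delta = np.log(target_mean)                        # assumed starting value
    for _ in range(max_iter):
        mean, slope = mean_and_slope(delta, gamma1, gamma2, sigma)
        step = (mean - target_mean) / slope            # f(Delta) / f'(Delta)
        delta -= step
        if abs(step) < tol:
            return delta
    raise RuntimeError("Newton-Raphson did not converge")
```

In the Poisson special case (\(\gamma _{1} = 0\), \(\gamma _{2} = 1\)) the solution is known in closed form, \(\Delta _{ij} = X_{ij} \beta - \sigma ^{2}/2\), which offers a simple check on the solver. In a full likelihood fit this solve would be repeated for every observation at each parameter update.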

Appendix 2: SAS Execution via PROC NLMIXED

Marginalized zero-altered Poisson (MZAP) and marginalized zero-altered negative binomial (MZANB) models

[Figures a and b (the SAS listings for this appendix) are not reproduced in this text.]

About this article

Cite this article

Tabb, L.P., Tchetgen, E.J.T., Wellenius, G.A. et al. Marginalized Zero-Altered Models for Longitudinal Count Data. Stat Biosci 8, 181–203 (2016). https://doi.org/10.1007/s12561-015-9136-6

  • DOI: https://doi.org/10.1007/s12561-015-9136-6
