Marginalized Zero-Altered Models for Longitudinal Count Data

Tabb, Loni Philip; Tchetgen, Eric J. Tchetgen; Wellenius, Greg A.; Coull, Brent A.

doi:10.1007/s12561-015-9136-6

Published: 22 September 2015

Volume 8, pages 181–203 (2016)

Statistics in Biosciences

Loni Philip Tabb¹, Eric J. Tchetgen Tchetgen²,⁴, Greg A. Wellenius³ & Brent A. Coull⁴

Abstract

Count data often exhibit more zeros than predicted by common count distributions such as the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have drawbacks, including regression coefficients that are difficult to interpret and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. A main advantage of this marginal model is that it provides a basis for directly comparing the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations that evaluate these model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to toxicological data on the effect of emissions on cardiac arrhythmias.

Acknowledgments

The authors would like to acknowledge the following NIH Grants: ES07142, CA134294, ES012044, and ES00002.

Author information

Authors and Affiliations

Department of Epidemiology & Biostatistics, School of Public Health, Drexel University, Philadelphia, PA, USA
Loni Philip Tabb
Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
Eric J. Tchetgen Tchetgen
Department of Community Health, Brown University, Boston, MA, USA
Greg A. Wellenius
Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
Eric J. Tchetgen Tchetgen & Brent A. Coull

Corresponding author

Correspondence to Loni Philip Tabb.

Ethics declarations

Conflicts of interest

No conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 40 KB)

Appendices

Appendix 1: Calculation of $\Delta _{ij}$

To obtain $\Delta _{ij}$, we solve the convolution equation that links the marginal mean, $\mu _{ij}^{Y} = \hbox {E}(Y_{ij}|X_{ij}) = \hbox {exp} (X_{ij} \beta )$, to the conditional mean. Under the MZAP model,

$$\begin{aligned} \mu _{ij}^{Y}= & {} \hbox {exp}(X_{ij} \beta ) \\= & {} \int [1-P(Y_{ij}=0|X_{ij},b_{i})] \frac{\mu _{ij}^{b}}{[1 - \hbox {exp}(-\mu _{ij}^{b})]} \phi (b_{i}|\sigma ) \mathrm{d}b_{i} \\= & {} \int \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \phi (b_{i}|\sigma ) \mathrm{d}b_{i} . \end{aligned}$$

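The value of $\Delta _{ij}$ can be obtained with a Newton–Raphson algorithm, iterating

$$\begin{aligned} \Delta _{ij}^{(t+1)}= & {} \Delta _{ij}^{(t)} - \left( \left. \frac{\partial f(\Delta _{ij})}{\partial \Delta _{ij}} \right| _{\Delta _{ij}^{(t)}} \right) ^{-1} \times f(\Delta _{ij}^{(t)}) , \end{aligned}$$

where $\Delta _{ij} = \Delta _{ij} (\beta , \gamma _{1}, \gamma _{2}, \sigma )$ and $f(\Delta _{ij})$ denotes the difference between the integral on the right-hand side of the convolution equation above and $\hbox {exp}(X_{ij} \beta )$, so that the solution satisfies $f(\Delta _{ij}) = 0$. Because $\hbox {exp}(X_{ij} \beta )$ does not depend on $\Delta _{ij}$, the derivative needed for the Newton–Raphson algorithm is the derivative of the integral: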

$$\begin{aligned} \frac{\partial }{\partial \Delta _{ij}} \mu _{ij}^{Y}= & {} \frac{\partial }{\partial \Delta _{ij}} \int \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \phi (b_{i}|\sigma ) \mathrm{d}b_{i} \\= & {} \int \left\{ \frac{\partial }{\partial \Delta _{ij}} \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \right\} \phi (b_{i}|\sigma ) \mathrm{d}b_{i} . \end{aligned}$$

Applying the product and chain rules, the derivative inside the integral, $\diamond = \left\{ \frac{\partial }{\partial \Delta _{ij}} \frac{[1-\hbox {exp}[-\hbox {exp}(\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i}))]]}{[1 - \hbox {exp}[-\hbox {exp}(\Delta _{ij} + b_{i})]]} \hbox {exp}(\Delta _{ij} + b_{i}) \right\} $, is

$$\begin{aligned} \diamond= & {} \{ 1 - e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} \} * \left\{ \frac{[1 - e^{-e^{\Delta _{ij} + b_{i}}}]e^{\Delta _{ij} + b_{i}} - e^{\Delta _{ij} + b_{i}}[-e^{-e^{\Delta _{ij} + b_{i}}}][-e^{\Delta _{ij} + b_{i}}]}{[1 - e^{-e^{\Delta _{ij} + b_{i}}}]^{2}} \right\} \\&+ \left\{ \frac{e^{\Delta _{ij} + b_{i}}}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} \right\} * \{ e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} [e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}] \gamma _{2} \} \\= & {} \frac{e^{\Delta _{ij} + b_{i}}}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} * [ \{ 1 - e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} \}* \left\{ 1 - \frac{[e^{-e^{\Delta _{ij} + b_{i}}}] [e^{\Delta _{ij} + b_{i}}]}{1 - e^{-e^{\Delta _{ij} + b_{i}}}} \right\} \\&+ \{ e^{-e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}} [e^{\gamma _{1} + \gamma _{2} (\Delta _{ij} + b_{i})}] \gamma _{2} \} ] . \end{aligned}$$
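As a sanity check on the closed form for $\diamond$, the sketch below compares its final factored expression with a central finite difference of the integrand. The function names (`integrand`, `diamond`, `central_difference`) and the parameter values used to exercise them are illustrative assumptions, not part of the original appendix.

```python
import math


def integrand(delta, b, gamma1, gamma2):
    """[1 - exp(-exp(g1 + g2*(D + b)))] * exp(D + b) / [1 - exp(-exp(D + b))]."""
    u = math.exp(delta + b)                         # exp(Delta + b)
    eta = math.exp(gamma1 + gamma2 * (delta + b))   # exp(gamma1 + gamma2*(Delta + b))
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-eta) * u / -math.expm1(-u)


def diamond(delta, b, gamma1, gamma2):
    """Closed-form derivative of the integrand with respect to Delta (factored form)."""
    u = math.exp(delta + b)
    eta = math.exp(gamma1 + gamma2 * (delta + b))
    one_minus = -math.expm1(-u)                     # 1 - exp(-exp(Delta + b))
    ratio = u / one_minus
    first = -math.expm1(-eta) * (1.0 - math.exp(-u) * u / one_minus)
    second = math.exp(-eta) * eta * gamma2
    return ratio * (first + second)


def central_difference(delta, b, gamma1, gamma2, h=1e-5):
    """Numerical derivative of the integrand in Delta, for comparison only."""
    upper = integrand(delta + h, b, gamma1, gamma2)
    lower = integrand(delta - h, b, gamma1, gamma2)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Arbitrary illustrative values (Delta, b, gamma1, gamma2).
    args = (0.3, -0.5, 0.2, 0.8)
    print(diamond(*args), central_difference(*args))
```

The two printed numbers should agree to several decimal places if the algebra above is transcribed correctly.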

Gauss–Hermite quadrature can be used to evaluate this one-dimensional integral.
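The calculation in this appendix can be sketched in Python as follows. This is a minimal illustration, not the authors' SAS implementation. It assumes that $\phi (b_{i}|\sigma )$ is the normal density with mean zero and standard deviation $\sigma$, uses 30 quadrature nodes, and starts the iteration at $X_{ij} \beta$. The function names and those three choices are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Nodes and weights for integrals of the form  int exp(-x^2) g(x) dx.
NODES, WEIGHTS = hermgauss(30)  # 30 nodes is an arbitrary illustrative choice


def _terms(delta, b, gamma1, gamma2):
    """Integrand of the convolution equation and its derivative in Delta."""
    u = np.exp(delta + b)                          # mu^b = exp(Delta + b)
    eta = np.exp(gamma1 + gamma2 * (delta + b))
    p_nonzero = -np.expm1(-eta)                    # 1 - exp(-eta)
    trunc = -np.expm1(-u)                          # 1 - exp(-mu^b)
    ratio = u / trunc
    value = p_nonzero * ratio
    deriv = ratio * (p_nonzero * (1.0 - np.exp(-u) * u / trunc)
                     + np.exp(-eta) * eta * gamma2)
    return value, deriv


def marginal_mean_and_slope(delta, gamma1, gamma2, sigma):
    """Gauss-Hermite approximation of the integral and of its Delta-derivative.

    Assumes b ~ N(0, sigma^2); the substitution b = sqrt(2) * sigma * x maps the
    normal density onto the exp(-x^2) weight, leaving a factor 1 / sqrt(pi).
    """
    b = np.sqrt(2.0) * sigma * NODES
    value, deriv = _terms(delta, b, gamma1, gamma2)
    scale = 1.0 / np.sqrt(np.pi)
    return scale * np.dot(WEIGHTS, value), scale * np.dot(WEIGHTS, deriv)


def solve_delta(x_beta, gamma1, gamma2, sigma, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of  integral(Delta) - exp(x_beta) = 0."""
    target = np.exp(x_beta)
    delta = float(x_beta)                          # assumed starting value
    for _ in range(max_iter):
        mean, slope = marginal_mean_and_slope(delta, gamma1, gamma2, sigma)
        step = (mean - target) / slope
        delta -= step
        if abs(step) < tol:
            return delta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Arbitrary illustrative inputs: X*beta = 0.5, gamma1 = 0.2, gamma2 = 0.8, sigma = 0.7.
    delta_hat = solve_delta(0.5, 0.2, 0.8, 0.7)
    print(delta_hat, marginal_mean_and_slope(delta_hat, 0.2, 0.8, 0.7)[0], np.exp(0.5))
```

At the returned value, the quadrature approximation of the integral should match $\hbox {exp}(X_{ij} \beta )$ up to the chosen tolerance. In a full model fit this solve would be repeated for every observation at each update of $(\beta , \gamma _{1}, \gamma _{2}, \sigma )$.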

Appendix 2: SAS Execution via PROC NLMIXED

Marginalized zero-altered Poisson (MZAP) and marginalized zero-altered negative binomial (MZANB) models

About this article

Tabb, L.P., Tchetgen, E.J.T., Wellenius, G.A. et al. Marginalized Zero-Altered Models for Longitudinal Count Data. Stat Biosci 8, 181–203 (2016). https://doi.org/10.1007/s12561-015-9136-6

Received: 03 March 2015
Accepted: 08 September 2015
Issue Date: October 2016
