Abstract
In recent decades, marginal structural models have gained popularity for proper adjustment of time-dependent confounders in longitudinal studies through time-dependent weighting. When the marginal model is a Cox model, using current standard statistical software packages was thought to be problematic because they were not developed to compute standard errors in the presence of time-dependent weights. We address this practical modelling issue by extending the standard calculations for Cox models with case weights to time-dependent weights and show that the coxph procedure in R can readily compute asymptotic robust standard errors. Through a simulation study, we show that the robust standard errors are rather conservative, though corresponding confidence intervals have good coverage. A second contribution of this paper is to introduce a Cox score bootstrap procedure to compute the standard errors. We show that this method is efficient and tends to outperform the non-parametric bootstrap in small samples.
Acknowledgments
We thank Tony Desmond, Gerarda Darlington, Babette Brumback and two anonymous reviewers for helpful comments. We also thank Erica Moodie for providing code to generate data and Thomas Gerds for his input on computing issues. Simulations were performed on the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca) through Compute/Calcul Canada. This work was supported by NSERC.
Appendices
Appendix 1
We now derive expression (10) for \(u_i({\hat{\beta }})\) in (9). First, we re-express (5) as follows:
where \({\tilde{G}}(t)\) is as defined in (11). Suppressing the brackets in functions, and taking a first-order Taylor series expansion of (5) around \({\tilde{S}}^{(0)}= S^{(0)}, {\tilde{S}}^{(1)}= S^{(1)}\), and \({\tilde{G}}=G\), we have
where
The remainder terms of the Taylor series expansion are negligible because \({\tilde{S}}^{(0)}, {\tilde{S}}^{(1)}\) and \({\tilde{G}}\) are consistent estimates of \(S^{(0)}, S^{(1)}\) and \(G\) respectively. Plugging in the above partial derivatives, and noting that \(\sum _{i=1}^n w_i(t_i)\delta _i (S^{(1)}/S^{(0)}) = \left( S^{(1)}/{S^{(0)}}\right) {\tilde{G}}\), the right-hand side of Eq. (17) can be reduced to
After substituting Eqs. (6) for \(S^{(0)}\) and \(S^{(1)}\) in the last two terms of Eq. (18), approximating the second term with \(\int _0^{\infty }(S^{(1)}/S^{(0)})d{\tilde{G}}(t)\), and interchanging the order of integration and summation, we get
Hence, (18) is asymptotically equivalent to \({\tilde{U}}({\hat{\beta }})\) in (9), where \(u_i({\hat{\beta }})\) is as given in (10), i.e.
Further, since
then we have
which by (3) equals zero.
In other words, \(\tilde{\mathbf{U}}({\hat{\beta }})\) is a consistent, though not necessarily unbiased, estimator of \(0\). It can easily be seen that if weights \(sw_i^*(t)\) were used, the resulting coefficient estimates would still be consistent, though slightly biased.
Appendix 2
In this appendix, we show how one can do parameter estimation using the Breslow approximation, and then provide a toy example that demonstrates that coxph can accommodate time-dependent weights when computing asymptotic standard errors. First, we re-write the partial log-likelihood, score vector and Fisher information matrix such that we can easily compute them from data.
There is a term in the (partial) likelihood function for every event. When multiple subjects have an event at the same time, i.e., event times are tied, the Breslow approximation does not require the exact event times to be unique: every tied event is evaluated against the same risk set. Hence the contribution to the likelihood is simply the ratio of each subject’s risk score to the sum of risk scores for all subjects at risk just before the event time (i.e., any subject for which \(Y(t_i) = 1\) for event time \(t_i\)). The log-likelihood is computed as follows (comparing to Eq. (3.1)):
where \(W_i = \delta _i w_i(t_i)\), and \(A_i = \left( B_i - \ln \left( \sum _j^n C_j(t_i) \right) \right) \) with
Define
It is easy to verify that \({\bar{X}}(t) = {\tilde{S}}^{(1)}({\hat{\beta }}^{\prime },t)/{\tilde{S}}^{(0)}({\hat{\beta }}^{\prime },t)\). Further, we can re-write the respective score vector in Eq. (5) and Fisher information matrix in Eq. (7) as follows:
In fact, we can simplify the information matrix even further as follows:
for \(k,l = 1, \ldots , p\) where \(X_{kj}(t_i)\) is subject \(j\)’s value of the \(k\)th covariate. Similarly, \({\bar{X}}_k(t_i)\) is the \(k\)th component of \({\bar{X}}(t_i)\). For variance estimation of the model coefficients, we re-write Eq. (9) as:
Let \({\tilde{U}}\) be an \(n \times 1\) vector whose \(i\)th element is the \(i\)th contribution to \({\tilde{U}}({\hat{\beta }})\), for \(i = 1, \ldots , n\). Then we have,
The final sandwich estimator for producing robust variance estimates is given by \( V = (J^{-1}V_{{\tilde{U}}}J^{-1}) = (J^{-1}{\tilde{U}}^{\prime }) ({\tilde{U}}J^{-1}) \). Using the equations detailed in this section, in the examples that follow we will need to compute \(W_i, B_i, C_j(t_i), \sum _{j=1}^n C_j(t_i)\) as well as \({\bar{X}}(t_i)\). We will use these quantities in estimating parameters from the data set presented in the next section.
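As a rough illustration of how these quantities fit together, the following Python sketch evaluates the weighted Breslow log-likelihood, score and information for a single covariate. It is a minimal sketch under stated assumptions, not the paper's own code (which is in R): it assumes the usual weighted Breslow form \(B_i = \beta x_i(t_i)\) and \(C_j(t) = Y_j(t) w_j(t) \exp (\beta x_j(t))\), and the data set `ROWS`, the function names and the single-covariate restriction are all hypothetical. The data are not those of Table 7.

```python
import math

# Hypothetical counting-process data, one row per subject-interval:
# (id, start, stop, event, x, w). These are NOT the data of Table 7.
ROWS = [
    (1, 0, 1, 1, 1, 1.0),
    (2, 0, 1, 0, 0, 0.8), (2, 1, 2, 1, 1, 1.2),
    (3, 0, 1, 0, 0, 1.0), (3, 1, 2, 0, 0, 1.1), (3, 2, 3, 1, 0, 0.9),
    (4, 0, 1, 0, 1, 1.0), (4, 1, 2, 0, 1, 0.7), (4, 2, 3, 0, 1, 1.3),
    (5, 0, 1, 0, 0, 1.0), (5, 1, 2, 0, 0, 1.0), (5, 2, 3, 0, 1, 1.5),
]


def risk_sums(rows, beta, t):
    """Sums of C_j(t), C_j(t)*x_j and C_j(t)*x_j^2 over rows at risk at t.

    A row is at risk at t when start < t <= stop (assumed convention).
    """
    s0 = s1 = s2 = 0.0
    for _sid, start, stop, _event, x, w in rows:
        if start < t <= stop:
            c = w * math.exp(beta * x)  # assumed form of C_j(t)
            s0 += c
            s1 += c * x
            s2 += c * x * x
    return s0, s1, s2


def breslow_pieces(rows, beta):
    """Return (LL, U, J) at a scalar beta, summing over event rows only."""
    ll = u = j = 0.0
    for _sid, _start, stop, event, x, w in rows:
        if not event:
            continue  # W_i = delta_i * w_i(t_i) is zero for non-events
        s0, s1, s2 = risk_sums(rows, beta, stop)
        xbar = s1 / s0                       # weighted mean covariate at t_i
        ll += w * (beta * x - math.log(s0))  # W_i * A_i
        u += w * (x - xbar)                  # score contribution
        j += w * (s2 / s0 - xbar ** 2)       # information contribution
    return ll, u, j


ll0, u0, j0 = breslow_pieces(ROWS, 0.0)
print(ll0, u0, j0)
```

Because the score is the derivative of the log-likelihood and the information is minus the derivative of the score, a finite-difference comparison is a cheap way to check an implementation like this one.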
1.1 Worked-out implementation of fitting an MSCM to data
The first six columns of Table 7 present a data set that contains eight subjects, each observed over 1–6 time intervals, for a total of 23 observations. There are two covariates: \(x_1\) is a binary baseline variable, while \(x_2\) is binary but time-dependent. The column ‘wt’ shows the time-dependent weights associated with each subject at each visit. For convenience, subjects are ordered by their respective failure times. The remaining five columns detail many of the preliminary calculations needed for parameter estimation, which are used implicitly in subsequent calculations.
Since there are four failures in the data, there are four terms in the log-likelihood. Let \(d_1 = 11r_1r_2+5r_1+11, d_3 = 8r_1r_2+2r_1+12, d_5 = 3r_1+12r_2+6\) and \(d_6 = 4r_1+8r_2+6\). The corresponding log-likelihood, score vector and Fisher information matrix are as follows:
Setting \(U=0\) and solving for \(\beta \) we find that \({\hat{\beta }}_1 = 0.5705749\) and \({\hat{\beta }}_2 = 2.1112007\). Before computing \(LL, U\) and \(J\) we perform preliminary calculations for the four observed failure times in Table 8. The values of these statistics at the initial and final parameter estimates are provided in Table 9. Table 10 contains the score residuals for each subject. Finally, we can compute the variance–covariance matrix for the parameters using
giving the standard errors of \({\hat{\beta }}_1\) and \({\hat{\beta }}_2\) as 0.082976 and 1.320177, respectively.
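The steps of this worked example (solve \(U=0\), form per-subject score residuals, apply the sandwich formula) can be sketched for a single covariate as follows. This is an illustrative sketch under stated assumptions rather than the paper's computation: the data `ROWS` are hypothetical (not Table 7), the function names are made up, and the score residuals use the standard counting-process form (observed minus expected contribution, multiplied by the row's weight and summed within subject), which is assumed here to correspond to \(u_i({\hat{\beta }})\) in (10).

```python
import math

# Hypothetical rows (id, start, stop, event, x, w); NOT the data of Table 7.
ROWS = [
    (1, 0, 1, 1, 1, 1.0),
    (2, 0, 1, 0, 0, 0.8), (2, 1, 2, 1, 1, 1.2),
    (3, 0, 1, 0, 0, 1.0), (3, 1, 2, 0, 0, 1.1), (3, 2, 3, 1, 0, 0.9),
    (4, 0, 1, 0, 1, 1.0), (4, 1, 2, 0, 1, 0.7), (4, 2, 3, 0, 1, 1.3),
    (5, 0, 1, 0, 0, 1.0), (5, 1, 2, 0, 0, 1.0), (5, 2, 3, 0, 1, 1.5),
]


def risk_sums(rows, beta, t):
    """Weighted sums S0, S1, S2 over rows at risk at t (start < t <= stop)."""
    s0 = s1 = s2 = 0.0
    for _sid, start, stop, _event, x, w in rows:
        if start < t <= stop:
            c = w * math.exp(beta * x)
            s0, s1, s2 = s0 + c, s1 + c * x, s2 + c * x * x
    return s0, s1, s2


def score_info(rows, beta):
    """Weighted Breslow score U and information J at a scalar beta."""
    u = j = 0.0
    for _sid, _start, stop, event, x, w in rows:
        if event:
            s0, s1, s2 = risk_sums(rows, beta, stop)
            u += w * (x - s1 / s0)
            j += w * (s2 / s0 - (s1 / s0) ** 2)
    return u, j


def newton_raphson(rows, beta=0.0, tol=1e-10, max_iter=50):
    """Solve U(beta) = 0 by Newton-Raphson steps beta += U / J."""
    for _ in range(max_iter):
        u, j = score_info(rows, beta)
        step = u / j
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


def subject_score_residuals(rows, beta):
    """Weighted score residuals, summed over each subject's rows."""
    events = [(stop, w) + risk_sums(rows, beta, stop)[:2]
              for _sid, _start, stop, event, _x, w in rows if event]
    resid = {}
    for sid, start, stop, event, x, w in rows:
        r = 0.0
        for t, wk, s0, s1 in events:
            if start < t <= stop:
                # expected part: weighted baseline hazard jump at t
                r -= wk / s0 * math.exp(beta * x) * (x - s1 / s0)
        if event:
            s0, s1, _ = risk_sums(rows, beta, stop)
            r += x - s1 / s0  # observed part at the row's own event time
        resid[sid] = resid.get(sid, 0.0) + w * r
    return resid


beta_hat = newton_raphson(ROWS)
_, info = score_info(ROWS, beta_hat)
resid = subject_score_residuals(ROWS, beta_hat)
naive_se = math.sqrt(1.0 / info)                                   # J^{-1}
robust_se = math.sqrt(sum((u / info) ** 2 for u in resid.values()))  # sandwich
print(beta_hat, naive_se, robust_se)
```

A useful sanity check on such an implementation is that the per-subject residuals sum to the score at any \(\beta \), and therefore to zero at \({\hat{\beta }}\).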
If the data were analyzed in R and stored in a coxph.object called fit, then the quantities evaluated in this section could be compared to the corresponding coxph output as follows:
The MLE calculated here matches that of coxph, as do the standard errors. Hence, this appendix demonstrates that the R code for computing the standard errors in coxph can accommodate time-dependent weights.
Ali, R.A., Ali, M.A. & Wei, Z. On computing standard errors for marginal structural Cox models. Lifetime Data Anal 20, 106–131 (2014). https://doi.org/10.1007/s10985-013-9255-7
DOI: https://doi.org/10.1007/s10985-013-9255-7