Abstract
Simple logistic regression can be adapted to deal with right-censoring by inverse probability of censoring weighting (IPCW). We compare two such IPCW approaches, one based on weighting the outcome, the other based on weighting the estimating equations. We study the large sample properties of the two approaches and show that which of the two weighting methods is more efficient depends on the censoring distribution. We show by theoretical computations that the methods can be surprisingly different in realistic settings. We further show how to use the two weighting approaches for logistic regression to estimate causal treatment effects, for both observational studies and randomized clinical trials (RCTs). Several estimators for observational studies are compared and we present an application to registry data. We also revisit interesting robustness properties of logistic regression in the context of RCTs, with a particular focus on the IPCW adjustment. We find that these robustness properties still hold when the censoring weights are correctly specified, but not necessarily otherwise.
References
Aalen O, Borgan Ø, Gjessing HK (2008) Survival and event history analysis: a process point of view. Springer, Berlin
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical Models Based on Counting Processes. Springer, New York
Azarang L, Scheike T, de Uña-Álvarez J (2017) Direct modeling of regression effects for transition probabilities in the progressive illness-death model. Stat Med 36(12):1964–1976
Bang H, Robins JM (2005) Doubly robust estimation in missing data and causal inference models. Biometrics 61(4):962–973
Bang H, Tsiatis AA (2000) Estimating medical costs with censored data. Biometrika 87(2):329–343
Bartlett JW (2018) Covariate adjustment and estimation of mean response in randomised trials. Pharm Stat 17(5):648–666
Colantuoni E, Scharfstein DO, Wang C, Hashem MD, Leroux A, Needham DM, Girard TD (2018) Statistical methods to compare functional outcomes in randomized controlled trials with high mortality. BMJ, 360
Cortese G, Holmboe SA, Scheike TH (2017) Regression models for the restricted residual mean life for right-censored and left-truncated data. Stat Med 36(11):1803–1822
DiRienzo A, Lagakos S (2001) Effects of model misspecification on tests of no randomized treatment effect arising from Cox’s proportional hazards model. J Royal Stat Soc: Series B (Stat Methodol) 63(4):745–757
EMA (2015). Guideline on adjustment for baseline covariates in clinical trials. https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-adjustment-baseline-covariates-clinical-trials_en.pdf
FDA (2021). Adjusting for covariates in randomized clinical trials for drugs and biological products guidance for industry. https://www.fda.gov/media/148910/download
Fine JP, Gray RJ (1999) A proportional hazards model for the subdistribution of a competing risk. J Amer Stat Assoc 94(446):496–509
Geskus RB (2016) Data analysis with competing risks and intermediate states. CRC Press, Boca Raton
Hernán M, Robins J (2020) Causal Inference: What If. Chapman & Hall/CRC, Boca Raton
Holst KK, Scheike T (2021) mets: Analysis of Multivariate Event Times. R package version 1(2):9
Holt A, Blanche P, Zareini B, Rajan D, El-Sheikh M, Schjerning A-M, Schou M, Torp-Pedersen C, McGettigan P, Gislason GH et al (2021) Effect of long-term beta-blocker treatment following myocardial infarction among stable, optimally treated patients without heart failure in the reperfusion era: a Danish, nationwide cohort study. Eur Heart J 42(9):907–914
Kim JP (2013) A Note on Using Regression Models to Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models. Biometrics 69(1):282–289
Lin DY, Wei L-J (1989) The robust inference for the Cox proportional hazards model. J Amer Stat Assoc 84(408):1074–1078
Loder E, Groves T, MacAuley D (2010) Registration of observational studies
Lok JJ, Yang S, Sharkey B, Hughes MD (2018) Estimation of the cumulative incidence function under multiple dependent and independent censoring mechanisms. Lifetime Data Anal 24(2):201–223
Lu X, Tsiatis AA (2008) Improving the efficiency of the log-rank test using auxiliary covariates. Biometrika 95(3):679–694
Luque-Fernandez MA, Schomaker M, Rachet B, Schnitzer ME (2018) Targeted maximum likelihood estimation for a binary treatment: A tutorial. Stat Med 37(16):2530–2546
Malani HM (1995) A modification of the redistribution to the right algorithm using disease markers. Biometrika 82(3):515–526
Martens MJ, Logan BR (2020) Group sequential tests for treatment effect on survival and cumulative incidence at a fixed time point. Lifetime Data Anal 26(3):603–623
Martinussen T, Vansteelandt S, Andersen PK (2020) Subtleties in the interpretation of hazard contrasts. Lifetime Data Anal 26(4):833–855
McCullagh P, Nelder JA (1989) Generalized Linear Models. Chapman and Hall, London, 2nd edition
Moore KL, van der Laan MJ (2009) Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med 28(1):39–64
Ozenne B, Sørensen AL, Scheike T, Torp-Pedersen C, Gerds TA (2017) riskRegression: predicting the risk of an event using Cox regression models. The R J 9(2):440–460
Ozenne BMH, Scheike TH, Stærk L, Gerds TA (2020) On the estimation of average treatment effects with right-censored time to event outcome and competing risks. Biom J 62(3):751–763
Pfeiffer RM, Gail MH (2017) Absolute Risk: Methods and Applications in Clinical Management and Public Health. CRC Press
Robins JM, Rotnitzky A (1992) In: Recovery of information and adjustment of dependent censoring using surrogate markers. AIDS Epidemiology-Methodological Issues. Birkhäuser, Boston, pp 24–33
Robinson LD, Jewell NP (1991) Some surprising results about covariate adjustment in logistic regression models. International Statistical Review/Revue Internationale de Statistique, pp. 227–240
Rosenblum M, Steingrimsson JA (2016) Matching the Efficiency Gains of the Logistic Regression Estimator While Avoiding its Interpretability Problems, in Randomized Trials. Technical report
Rossello X, Pocock SJ, Julian DG (2015) Long-term use of cardiovascular drugs: challenges for research and for patient care. J Amer Coll Cardiology 66(11):1273–1285
Rotnitzky A, Farall A, Bergesio A, Scharfstein D (2007) Analysis of failure time data under competing censoring mechanisms. J Royal Stat Soc: Series B (Stat Methodol) 69(3):307–327
Rufibach K (2019) Treatment effect quantification for time-to-event endpoints-Estimands, analysis strategies, and beyond. Pharm Stat 18(2):145–165
Scharfstein DO, Rotnitzky A, Robins JM (1999) Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Amer Stat Assoc 94(448):1096–1120 (with Rejoinder, 1135–1146)
Scheike T, Zhang M, Gerds T (2008) Predicting cumulative incidence probability by direct binomial regression. Biometrika 95:205–220
Schumacher M, Ohneberg K, Beyersmann J (2016) Competing risk bias was common in a prominent medical journal. Journal of Clinical Epidemiology 80:135–136
Stefanski LA, Boos DD (2002) The calculus of M-estimation. Amer Stat 56(1):29–38
Stensrud MJ, Hernán MA (2020) Why test for proportional hazards? JAMA 323(14):1401–1402
Struthers CA, Kalbfleisch JD (1986) Misspecified Proportional Hazard Models. Biometrika 73:363–369
Sutradhar R, Austin PC (2018) Relative rates not relative risks: addressing a widespread misinterpretation of hazard ratios. Ann Epidemiol 28(1):54–57
Tsiatis A (2006) Semiparametric theory and missing data. Springer Science & Business Media
Uno H, Cai T, Tian L, Wei L (2007) Evaluating prediction rules for t-year survivors with censored regression models. J Amer Stat Assoc 102(478):527–537
Van der Laan MJ, Rose S (2011) Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media
Van der Laan MJ, Rubin D (2006) Targeted maximum likelihood learning. Int J Biostat 2(1)
Vansteelandt S, Martinussen T, Tchetgen ET (2014) On adjustment for auxiliary covariates in additive hazard models for the analysis of randomized experiments. Biometrika 101(1):237–244
Wang B, Susukida R, Mojtabai R, Amin-Esmaeili M, Rosenblum M (2021) Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. J Amer Stat Assoc, pp. 1–12
Young JG, Stensrud MJ, Tchetgen Tchetgen EJ, Hernán MA (2020) A causal framework for classical statistical estimands in failure-time settings with competing events. Stat Med 39(8):1199–1236
Zhang M, Tsiatis AA, Davidian M (2008) Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64(3):707–715
Zhang X, Zhang M-J (2011) SAS macros for estimation of direct adjusted cumulative incidence curves under proportional subdistribution hazards models. Comput Methods Programs Biomed 101(1):87–93
Zheng Y, Cai T, Feng Z (2006) Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers. Biometrics 62(1):279–287
Acknowledgements
We are grateful to the editors as well as two referees for their constructive and useful comments.
Appendix A
1.1 Appendix A.1: Proof of Theorem 1
Overall, we follow similar lines as those of Bang and Tsiatis (2000). First, note that for all \(s\in [0,t]\),
with \(\Delta (s) = 1\!\! 1 \{ s \wedge T \le C \}\), \(Y(u)=1\!\! 1 \{ {\widetilde{T}}\ge u \}\) and \(M^c(u)=1\!\! 1 \{ {\widetilde{T}}\le u, \Delta =0 \} - \int _0^u Y(v)\lambda _c(v)dv \). The second equality in (15) has been pointed out by Robins and Rotnitzky (1992) for \(s=\infty \), and it can be shown for any \(s\in [0,t]\) as follows.
because
Since \(G_c(0)=1\), it follows
Second, recall a well-known martingale integral representation for the Kaplan-Meier estimator, see e.g. Andersen et al. (1993, Sec. IV.3), for all \(s\in [0,t]\),
where \(Y_\bullet (u)=\sum _{i=1}^n 1\!\! 1 \{ {\widetilde{T}}_i\ge u \}\). Third, note that
where \({\widehat{S}} (u-)\) is the Kaplan-Meier estimator for \(S(u-)=P(T>u-)\). With the notations
we first note that
From (15), we get
and from (16), we get
and from (17), we get
where
because of independent censoring and the uniform convergence of the Kaplan-Meier estimator \({\widehat{G}}_c(\cdot )\) in [0, t]. Consequently,
For \(n^{1/2} {\widehat{U}}_{ipcw-glm}(\varvec{\beta })\), the result follows similarly: the calculation is the same, except that \({\varvec{X}}_i D_i(t)\) is replaced with \({\varvec{X}}_i \{ D_i(t) - Q(t,{\varvec{X}}_i)\}\) in \((*)\) and \((**)\).
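The difference between the two estimating functions can be made concrete with a small numerical sketch. It assumes the weight \(W_i = \Delta_i(t)/{\widehat{G}}_c({\widetilde{T}}_i \wedge t)\) with a Kaplan-Meier estimate of the censoring distribution, a single event type, no ties, and simulated data; the helper names `censoring_km` and `solve_score` are hypothetical and this is not the authors' implementation. Under these assumptions, OIPCW solves \(\sum_i {\varvec{X}}_i\{D_i(t) W_i - Q(t,{\varvec{X}}_i)\}=0\) and IPCW-GLM solves \(\sum_i W_i {\varvec{X}}_i\{D_i(t) - Q(t,{\varvec{X}}_i)\}=0\).

```python
import numpy as np

def censoring_km(time, censored, at):
    """Kaplan-Meier estimate of G_c(s) = P(C > s) at the points `at` (assumes no ties)."""
    order = np.argsort(time)
    t_sorted = time[order]
    c_sorted = censored[order].astype(float)
    at_risk = len(time) - np.arange(len(time))      # number at risk at each sorted time
    surv = np.cumprod(1.0 - c_sorted / at_risk)     # jumps only at censoring times
    idx = np.searchsorted(t_sorted, at, side="right") - 1
    return np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)

def solve_score(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for sum_i X_i {y_i - w_i * expit(X_i' beta)} = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        q = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - w * q)
        hess = (X * (w * q * (1.0 - q))[:, None]).T @ X
        step = np.linalg.solve(hess, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (illustrative assumption, not the paper's setting)
rng = np.random.default_rng(1)
n, t = 2000, 1.0
A = rng.integers(0, 2, n)
L = rng.normal(size=n)
T = rng.exponential(1.0 / np.exp(-0.5 + 0.5 * A + 0.3 * L))
C = rng.exponential(2.0, size=n)
time = np.minimum(T, C)
censored = C < T

D = (time <= t) & ~censored          # D(t): event observed by time t
delta = D | (time > t)               # Delta(t): status at time t is known
G = censoring_km(time, censored, np.minimum(time, t))
W = np.zeros(n)
W[delta] = 1.0 / G[delta]            # IPCW weights, zero when censored before t

X = np.column_stack([np.ones(n), A, L])
y = W * D
beta_oipcw = solve_score(X, y, np.ones(n))   # weights applied to the outcome
beta_glm = solve_score(X, y, W)              # weights applied to the score terms
print(beta_oipcw, beta_glm)
```

Both approaches share the term \(W_i D_i(t)\); they differ only in whether the fitted probability is also multiplied by the weight, which is the replacement described in the paragraph above.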
1.2 Appendix A.2: variance estimator \({\widehat{\varvec{\Sigma }}}_m\)
From the above derivations, the same calculations without using
lead to \(n^{1/2} {\widehat{U}}_{oipcw}(\varvec{\beta }) = n^{-1/2} \sum _{i=1}^n \varvec{\epsilon }_i+ o_p(1)\) where
One can consistently estimate \(\varvec{\epsilon }_i\) by
where \({\widehat{M}}_i^c(s)=1\!\! 1 \{ {\widetilde{T}}_i \le s, \Delta _i =0 \} - \int _0^sY_i(v) d \widehat{\Lambda }_c(v)\), where \({\widehat{\Lambda }}_c(s)\) is the Nelson-Aalen estimator of the cumulative hazard of C at time s and
Consequently, one can consistently estimate \(\varvec{\Omega }_{oipcw}\) by \({\widehat{\varvec{\Omega }}}_{oipcw}= (1/n) \sum _{i=1}^n\widehat{\varvec{\epsilon }}_i\widehat{\varvec{\epsilon }}_i^T\). Similarly, to estimate \(\varvec{\Omega }_{ipcw-glm}\) we can use \({\widehat{\varvec{\Omega }}}_{ipcw-glm}= (1/n) \sum _{i=1}^n \widehat{\varvec{\omega }}_i \widehat{\varvec{\omega }}_i^T\) with
where
Consistent estimators of \(\varvec{{{\mathcal {I}}}}\) are \({\widehat{\varvec{{{\mathcal {I}}}}}}_m = n^{-1} \sum _{i=1}^n \left[ {\varvec{X}}_i^{\otimes 2} Q(t,{\varvec{X}}_i,\widehat{\varvec{\beta }}_m) \left\{ 1 - Q(t,{\varvec{X}}_i,\widehat{\varvec{\beta }}_m) \right\} \right] \), with \({\varvec{X}}_i^{\otimes 2}={\varvec{X}}_i{\varvec{X}}_i^T\), for \(m=oipcw\) or \(ipcw-glm\). Finally, \(\varvec{\Sigma }_m= \varvec{{{\mathcal {I}}}}^{-1}\varvec{\Omega }_m\varvec{{{\mathcal {I}}}}^{-1}\) can be estimated by \({\widehat{\varvec{\Sigma }}}_m= {\widehat{\varvec{{{\mathcal {I}}}}}}_m^{-1} \widehat{\varvec{\Omega }}_{m} {\widehat{\varvec{{{\mathcal {I}}}}}}_m^{-1}\).
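The final assembly step of this sandwich estimator can be sketched as follows. The sketch assumes that the estimated influence terms \(\widehat{\varvec{\epsilon }}_i\) (or \(\widehat{\varvec{\omega }}_i\)) are already available as the rows of an array, and it reads the squared covariate term as the outer product \({\varvec{X}}_i{\varvec{X}}_i^T\). The function name `sandwich_variance` and the toy data are hypothetical; the toy example uses the uncensored special case, where the influence term reduces to \({\varvec{X}}_i\{D_i - Q_i\}\).

```python
import numpy as np

def sandwich_variance(X, q_hat, eps_hat):
    """Return I^{-1} Omega I^{-1} from covariates, fitted probabilities, and influence terms."""
    n = X.shape[0]
    info = (X * (q_hat * (1.0 - q_hat))[:, None]).T @ X / n   # I-hat
    omega = eps_hat.T @ eps_hat / n                            # Omega-hat
    info_inv = np.linalg.inv(info)
    return info_inv @ omega @ info_inv

# Toy illustration without censoring (assumption): eps_i = X_i * (D_i - Q_i)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
q = 1.0 / (1.0 + np.exp(-(0.2 + 0.5 * x)))   # fitted probabilities, taken as known here
D = (rng.random(n) < q).astype(float)
eps = X * (D - q)[:, None]

sigma = sandwich_variance(X, q, eps)
std_err = np.sqrt(np.diag(sigma) / n)        # standard errors of the coefficients
print(std_err)
```

With censoring, only the rows of `eps_hat` change, since they then include the martingale correction terms described above; the assembly step stays the same.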
1.3 Appendix A.3: Proof of Proposition 4
A proof was provided by Bartlett (2018, Appendix A.2), in a slightly different context. We repeat the main arguments here for completeness. First, we note that a Taylor-expansion and the results from Sects. 2.5 and 2.7 imply that
with \(\varvec{\psi }_i=\varvec{{{\mathcal {I}}}}^{-1} \varvec{\Phi }\big \{{\varvec{X}}_i, D_i(t), {{\widetilde{T}}}_i, {\widetilde{\eta }}_i , t \big \}\). Here, \(\varvec{\Phi }\) denotes \(\varvec{\Phi }_m\) or \(\varvec{\Phi }_m^{Aug}\), for \(m=oipcw\) or \(ipcw-glm\), depending on which estimator \({\widehat{\varvec{\beta }}}\) we plugged in to define \({\widehat{F_1}}^g(t,a)\). Formulas for the different versions of \(\varvec{\Phi }\) can be found in Theorem 1 or in the proof of Proposition 2. Hence, it only remains to prove that \(\text{ Cov }\Big \{ {Q}(t,a,{\varvec{L}}) \, , \, {\varvec{B}}_{\varvec{\beta }}(a) \varvec{\psi }\Big \}= 0\). This can be done by expanding the covariance, using the conditional covariance formula, as
The second term is 0, because the first component is constant, conditional on \({\varvec{X}}\). We now explain why the first term is also 0. First, note that
and
Second, note that \(E\big [ {\varvec{X}}\big \{ D(t) - Q(t,{\varvec{X}})\big \} \, \big \vert \, {\varvec{X}}\big ]= {\varvec{X}}\big \{ F_1(t,{\varvec{X}}) - Q(t,{\varvec{X}})\big \} ={\varvec{0}}\) because the model is assumed to be well specified. Third, \(E\big [ \int _0^t \varvec{\varphi }\big ({\varvec{X}}, D(t),s\big ) \frac{dM^c(s)}{G_c(s)} \, \big \vert \, {\varvec{X}}\big ]=0\) follows by independent censoring and standard martingale theory (Aalen et al. 2008, Sec. 2.2).
1.4 Appendix B
1.5 Sketch of Proof for Theorem 2
A proof was provided by Rosenblum and Steingrimsson (2016) for the uncensored case. We here essentially repeat their main arguments, which also apply in our case, up to minor differences introduced by the IPCW weights. First, note that \({\widehat{\varvec{\beta }}}_{oipcw}\) is consistent for \(\varvec{\beta }\), where \(\varvec{\beta }\) is the solution to
with \({\varvec{X}}=(1,A,{\varvec{L}})^T\) and \(W(t,{\varvec{X}})=\Delta (t)/G_C(t \wedge {{\widetilde{T}}},{\varvec{X}})\), using a notation that emphasizes the (potential) dependence of the IPCW weight on \({\varvec{X}}\). Second, note that the first equation in (18), which corresponds to the first component of \({\varvec{X}}=(1,A,{\varvec{L}})^T\), is
where \(\pi (a)=P(A=a)\), for \(a=0,1\), and where the last equality follows because A is assumed independent of \({\varvec{L}}\) (randomization), hence the conditioning disappears in the expectations. Similarly, the second equation in (18), which corresponds to the second component of \({\varvec{X}}=(1,A,{\varvec{L}})^T\), is
Furthermore,
Plugging (19) and (20) into (21), we find
Further note that we also have the identities
Hence, Eqs. (23) and (20) give
The robustness result for the G-computation estimator, that is, \(E\big [ Q(t,(1,a,{\varvec{L}}),\varvec{\beta }) \big ] = F_1(t,a)\), for \(a=0,1\), therefore follows for OIPCW because \(E\big [ D(t) W(t,{\varvec{X}})| A=a \big ]=F_1(t,a)\), when the censoring adjustment is correct.
To show that \({\widehat{\beta }}_A\) converges in probability towards zero if and only if \(F_1(t,0) = F_1(t,1)\), we here repeat the argument of Rosenblum and Steingrimsson (2016). As we have just shown that
and since \(x\mapsto \text{ expit }(x)\) is monotonically increasing, then it follows that \(F_1(t,0) - F_1(t,1)=0\) if and only if \(\beta _A=0\). This completes the proof for the OIPCW approach.
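The mechanism behind this robustness result can be illustrated in the uncensored special case, where all weights equal 1. The score components for the intercept and for A force the average fitted probability within each arm to match the observed event proportion in that arm, even when the working model is misspecified; under randomization, the G-computation average over all subjects is then close to the same value. The sketch below uses simulated data and a hypothetical helper `fit_logistic`, and is only meant as an illustration of this argument.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson for the logistic score equations sum_i X_i {y_i - expit(X_i' beta)} = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        q = expit(X @ beta)
        step = np.linalg.solve((X * (q * (1.0 - q))[:, None]).T @ X, X.T @ (y - q))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(7)
n = 20000
A = rng.integers(0, 2, n)                    # randomized treatment, independent of L
L = rng.normal(size=n)
# True risk has a quadratic term and an interaction, so the working model is misspecified
p_true = expit(-1.5 + 0.8 * A + L**2 + 0.5 * A * L)
D = (rng.random(n) < p_true).astype(float)

X = np.column_stack([np.ones(n), A, L])      # working model: main effects only
beta = fit_logistic(X, D)
q = expit(X @ beta)

observed, fitted_in_arm, g_comp = {}, {}, {}
for a in (0, 1):
    in_arm = A == a
    X_a = X.copy()
    X_a[:, 1] = a                            # set treatment to a for every subject
    observed[a] = D[in_arm].mean()
    fitted_in_arm[a] = q[in_arm].mean()
    g_comp[a] = expit(X_a @ beta).mean()
    print(a, observed[a], fitted_in_arm[a], g_comp[a])
```

The within-arm equality holds exactly at the solution of the score equations, whereas the agreement of the G-computation average relies on randomization and holds only up to sampling error.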
The same arguments apply for IPCW-GLM too. Briefly, instead of the solution to (18), we consider the solution to
Instead of (25) and (26), this leads to
Note that in the expectations in (27) and (28) the conditioning does not vanish as in (26) and (25), because of the weight \(W(t,{\varvec{X}})\). If the censoring adjustment is correct, then \(E\left[ W(t,{\varvec{X}}) \, \big | \, A=a,{\varvec{X}}\right] =1\) for \(a=0,1\), which together with the law of iterated expectations implies that the right-hand sides of (27) and (28) are the same as those of (25) and (26). Hence (27) and (28) are equivalent to (26) and (25) and the rest of the proof is identical to that of OIPCW.
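The key property used here, that correctly specified weights have conditional mean 1, can be checked by simulation. The sketch below assumes exponential event and censoring times whose rates depend on A only, a single event type, and a known censoring survival function \(G_c(s,A)=\exp(-\lambda _A s)\); all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 200_000, 1.0
A = rng.integers(0, 2, n)
rate_T = np.where(A == 1, 0.6, 1.0)          # event-time hazard per arm
rate_C = np.where(A == 1, 0.3, 0.7)          # censoring hazard per arm

T = rng.exponential(1.0 / rate_T)
C = rng.exponential(1.0 / rate_C)
T_t = np.minimum(T, t)

delta = T_t <= C                             # Delta(t) = 1{T ^ t <= C}
W = delta / np.exp(-rate_C * T_t)            # Delta(t) / G_c(T ^ t, A), true G_c
D = T <= t                                   # D(t) under a single event type

mean_W, mean_DW, F_true = {}, {}, {}
for a in (0, 1):
    in_arm = A == a
    mean_W[a] = W[in_arm].mean()             # close to 1
    mean_DW[a] = (D * W)[in_arm].mean()      # close to F_1(t, a)
    F_true[a] = 1.0 - np.exp(-rate_T[in_arm][0] * t)
    print(a, mean_W[a], mean_DW[a], F_true[a])
```

Replacing `rate_C` in the weight by a value that ignores A would correspond to a misspecified censoring adjustment, which is the situation considered in Theorem 3.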
1.6 Sketch of Proof for Theorem 3
The proof exploits the three key equalities (29), (30) and (31) below. They are proven at the end of this section, after the proof of the theorem which exploits them. First, the stratified Kaplan-Meier estimator of \(P(C>s|A=a)\), for \(s\in [0,t]\), will converge to \({{\widetilde{G}}}_c(s,A)=\exp \left\{ -\int _0^s {\widetilde{\lambda }}_c(u,A) du\right\} \), where
where \({\tilde{\lambda }}_0(u)\) does not depend on \((A,{\varvec{L}})\). Second,
where \({{\widetilde{W}}}(t,A)=\Delta (t)/{{\widetilde{G}}}_c(t\wedge T,A)\), where the notation \({{\widetilde{W}}}(t,A)\) emphasizes both the misspecification of the censoring adjustment and the dependence on A. Third, for \(a=0,1\),
where \(r_c(T \wedge t,{\varvec{L}})\ge 0\) depends on \(({\varvec{L}},T)\) but not on A (see Sect. 7.2.3 for details).
Other useful results are “new versions” of (25) and (26) (for OIPCW) and (27) and (28) (for IPCW-GLM). Equations (25), (26), (27) and (28) still hold under the weaker assumptions of Theorem 2, up to the minor difference that \(W(t,{\varvec{X}})\) should now be replaced by \({{\widetilde{W}}}(t,A)\). Indeed, (25), (26), (27) and (28) were derived without making any assumptions on the censoring adjustment. They were only consequences of the estimating equations. Below we refer to these “new versions” when we cite (25), (26), (27) or (28).
Let us now consider the OIPCW case. Equations (30), (25) and (26) imply
which proves that \({\widehat{F_1}}^g(t,1)-\widehat{F_1}^g(t,0)\) converges to 0, for OIPCW. Using the same monotonicity argument as in the Proof of Theorem 2, this implies that \({\widehat{\beta }}_A\) also converges to 0.
Let us now consider the IPCW-GLM case. Equations (30), (27) and (28) imply
which, using (31), further leads to
Because \(r_c(T \wedge t,{\varvec{L}})> 0\) and \(x\mapsto \text{ expit }(x)\) is monotonically increasing, it follows that \(\beta _A=0\), which proves that \({\widehat{\beta }}_A\) converges to 0. This further implies that \(E\big [ Q(t,(1,1,{\varvec{L}})^T,\varvec{\beta }) \big ]=E\big [ Q(t,(1,0,{\varvec{L}})^T,\varvec{\beta }) \big ]\), which proves that \({\widehat{F_1}}^g(t,1)-\widehat{F_1}^g(t,0)\) converges to 0.
1.6.1 Proof of (29)
By construction, the stratified Kaplan-Meier estimator of \(P(C>s|A=a)\), for \(s\in [0,t]\), will converge to \(\widetilde{G}_c(s,A)=\exp \left\{ - \int _0^s {\widetilde{\lambda }}_c(u,A) du\right\} \), where
First, let’s look at the numerator,
where \(G_c(u,A,{\varvec{L}})=P(C>u \vert A,{\varvec{L}})\) and \(S(u,{\varvec{L}})= 1- F_1(u,A,{\varvec{L}}) - F_2(u,A,{\varvec{L}})\), which depends on \({\varvec{L}}\) but not on A, by the assumption of no conditional treatment effects. This further leads to
with
where the conditioning on A in the expectations vanishes because we assumed \(A\perp \!\!\! \perp L\) (randomization). Similarly, it follows \(P( C>u, T>u \, \vert \, A)= \tau (A,u) \gamma (u)\). Hence, \(\widetilde{\lambda }_c(u,A) = {\widetilde{\lambda }}_0(u) + A \lambda _1(u)\) with \({\widetilde{\lambda }}_0(u)= \lambda _0(u) + \xi (u)/\gamma (u)\).
1.6.2 Proof of (30)
Using result (29) and the above definition of \(\widetilde{G}_c(t,A)\), we now show (30). First, we note that result (29) implies that the ratio \(r_c(t,{\varvec{L}}) = G_c(t,A,{\varvec{L}})/{{\widetilde{G}}}_C(t,A) = \exp [ - \int _0^t \{\lambda _0(s) + \lambda _2(s,{\varvec{L}}) - {\widetilde{\lambda }}_0(s)\}ds ]\) depends on \({\varvec{L}}\) and t but not on A (hence the notation \(r_c(t,{\varvec{L}})\)). Therefore, we have
1.6.3 Proof of (31)
As in the Proof of (30), here again we use that (29) implies that \(r_c(t,{\varvec{L}}) = G_c(t,A,{\varvec{L}})/{{\widetilde{G}}}_C(t,A)\) does not depend on A, but on \({\varvec{L}}\) and t. Therefore,
where the last equality follows from the randomization assumption \(A \perp \!\!\! \perp {\varvec{L}}\) and the assumption of no conditional treatment effect, which further implies \(A \perp \!\!\! \perp T\).
Blanche, P.F., Holt, A. & Scheike, T. On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects. Lifetime Data Anal 29, 441–482 (2023). https://doi.org/10.1007/s10985-022-09564-6
DOI: https://doi.org/10.1007/s10985-022-09564-6