A causal proportional hazards estimator under homogeneous or heterogeneous selection in an IV setting


In this paper we present a framework for estimation in a structural Cox model when there may be unobserved confounding. The model is phrased in terms of a selection bias function and a baseline model that describes how covariates affect the survival time in a scenario without exposure, which ensures model congeniality. The method uses an instrumental variable. Interestingly, the formulated model turns out to resemble the so-called Cox–Aalen survival model for the observed data. We exploit this to enhance estimation of the unknown parameters and to derive large sample properties of the proposed estimator.



  1. Aalen OO (1989) A linear regression model for the analysis of life times. Stat Med 8:907–925
  2. Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York
  3. Angrist JD, Imbens GW (1994) Identification and estimation of local average treatment effects. Econometrica 62:467–475
  4. Chen L, Lin DY, Zeng D (2010) Attributable fraction functions for censored event times. Biometrika 97:713–726
  5. Clarke PS, Windmeijer F (2010) Identification of causal effects on binary outcomes using structural mean models. Biostatistics 11:756–770
  6. Clarke PS, Windmeijer F (2012) Instrumental variable estimators for binary outcomes. J Am Stat Assoc 107:1638–1652
  7. Cuzick J, Sasieni P, Myles J, Tyrer J (2007) Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. J R Stat Soc Ser B 69:565–588
  8. Hernan MA, Robins JM (2006) Instruments for causal inference. Epidemiology 17:360–372
  9. Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer, New York
  10. Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572
  11. Loeys T, Goetghebeur E (2003) A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics 59:100–105
  12. MacKenzie TA, Tosteson TD, Morden NE, Stukel TA, O’Malley AJ (2014) Using instrumental variables to estimate a Cox’s proportional hazards regression subject to additive confounding. Health Serv Outcomes Res Methodol 14:54–68
  13. MacKenzie TA, Løberg M, O’Malley AJ (2016) Patient centered hazard ratio estimation using principal stratification weights: application to the NORCCAP randomized trial of colorectal cancer screening. Obs Stud 2:29–50
  14. Martínez-Camblor P, Mackenzie T, Staiger DO, Goodney PP, O’Malley AJ (2017) Adjusting for bias introduced by instrumental variable estimation in the Cox proportional hazards model. Biostatistics 20:80–96
  15. Martinussen T, Scheike TH (2006) Dynamic regression models for survival data, vol 102. Springer, New York
  16. Martinussen T, Sørensen D, Vansteelandt S (2017a) Instrumental variables estimation under a structural Cox model. Biostatistics 20:65–79
  17. Martinussen T, Vansteelandt S, Tchetgen Tchetgen EJ, Zucker DM (2017b) Instrumental variables estimation of exposure effects on a time-to-event endpoint using structural cumulative survival models. Biometrics 73(4):1140–1149
  18. Nordestgaard BG, Palmer TM, Benn M, Zacho J, Tybjaerg-Hansen A, Davey Smith G, Timpson NJ (2012) The effect of elevated body mass index on ischemic heart disease risk: causal estimates from a Mendelian randomisation approach. PLoS Med 9(5):e1001212
  19. Richardson TS, Robins JM (2013) Single World Intervention Graphs (SWIGs): a unification of the counterfactual and graphical approaches to causality. Technical Report 128, Center for Statistics and the Social Sciences, University of Washington
  20. Robins J, Rotnitzky A (2004) Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika 91:763–783
  21. Tchetgen Tchetgen EJ, Vansteelandt S (2013) Alternative identification and inference for the effect of treatment on the treated with an instrumental variable. Harvard University Biostatistics Working Paper Series
  22. Tchetgen Tchetgen EJ, Walter S, Vansteelandt S, Martinussen T, Glymour M (2015) Instrumental variable estimation in a survival context. Epidemiology 26:402–410
  23. Tsiatis A (2006) Semiparametric theory and missing data. Springer, New York
  24. Vansteelandt S, Goetghebeur E (2003) Causal inference with generalized structural mean models. J R Stat Soc Ser B 65:817–835
  25. Vansteelandt S, Bowden J, Babanezhad M, Goetghebeur E (2011) On instrumental variables estimation of causal odds ratios. Stat Sci 26:403–422


We are grateful to Shoaib Afzal and Børge Nordestgaard for giving us access to the CGPS-data.

Corresponding author

Correspondence to Torben Martinussen.


Appendix: Large sample properties


Large sample properties of the estimator of A

Consistency of \({{\hat{A}}}(t, \phi _0)\) may be shown along the lines of Martinussen et al. (2017b). Here we focus on the asymptotic distribution of \({{\hat{A}}}(t, \phi _0)\). To this end, define the \(p \times n\) matrix H as

$$\begin{aligned} H (s, \phi , b)&= \{Y(s, \phi , b)^TW(s)Y(s, \phi , b)\}^{-1}Y(s, \phi , b)^TW(s) \end{aligned}$$

for \(p=k+2\) and k the dimension of C. Then we can write

$$\begin{aligned} {{\hat{A}}}(t, \phi _0)&= \int _0^t H\{s, \phi _0, {{\hat{B}}}_X(s-)\}dN(s). \end{aligned}$$
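The display above builds \({{\hat{A}}}\) by a weighted least squares step at each jump of N, with the design evaluated at \({{\hat{B}}}_X(s-)\), the value of the first component just before the jump. A minimal Python sketch of that recursion follows. The names `recursive_ls_estimator`, `design_fn` and `toy_design`, the time-constant weights and the toy data are illustrative assumptions; in particular `toy_design` is not the paper's design matrix \(Y(s, \phi , b)\).

```python
import numpy as np

def recursive_ls_estimator(event_times, dN, design_fn, weights, p):
    """Accumulate A_hat(t) = sum over jump times s <= t of H{s, B_hat_X(s-)} dN(s).

    event_times : sorted 1-D array of distinct jump times
    dN          : (m, n) array, dN[j, i] = 1 if subject i jumps at event_times[j]
    design_fn   : hypothetical callable (s, b) -> (n, p) design matrix
    weights     : (n,) diagonal of W (assumed constant in s for this sketch)
    p           : number of columns of the design matrix
    """
    A = np.zeros(p)
    path = np.zeros((len(event_times), p))
    for j, s in enumerate(event_times):
        b_prev = A[0]                      # B_hat_X(s-): first component before the jump
        Y = design_fn(s, b_prev)           # (n, p)
        YtW = Y.T * weights                # (p, n), equals Y^T W for diagonal W
        H = np.linalg.solve(YtW @ Y, YtW)  # (p, n), H = (Y^T W Y)^{-1} Y^T W
        A = A + H @ dN[j]                  # increment of A_hat at time s
        path[j] = A
    return path

# Toy data (assumed): four subjects, two observed events.
x = np.array([0.0, 1.0, 0.0, 1.0])

def toy_design(s, b):
    # Hypothetical design: first column depends on the running value b,
    # second column is an intercept.
    return np.column_stack([x * np.exp(-b), np.ones_like(x)])

event_times = np.array([1.0, 2.0])
dN = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
path = recursive_ls_estimator(event_times, dN, toy_design, np.ones(4), p=2)
```

Because the design at each jump uses only the estimate accumulated so far, the estimator can be computed in a single forward pass over the ordered event times.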

Notice that we can write the true nuisance parameter \(A_0\) as

$$\begin{aligned} A_0(t)&= \int _0^tH\{s, \phi _0, B^0_X(s)\}Y\{s,\phi _0, B^0_X(s)\}dA_0(s)\\&=\int _0^tH\{s, \phi _0, B^0_X(s)\} dN(s) - \int _0^tH \{s, \phi _0, B^0_X(s)\}dM(s) . \end{aligned}$$

Now \(n^{1/2}\{{{\hat{A}}} (t, \phi _0)-A_0(t)\}\) can be expressed as

$$\begin{aligned}&n^{1/2}\int _0^tH \{s, \phi _0, B^0_X(s)\}dM(s)\\&\quad +\, n^{1/2}\int _0^t[H \{s, \phi _0,{{\hat{B}}}_X(s-)\}-H \{s, \phi _0,B^0_X(s-)\}]dN(s). \end{aligned}$$

First we take a closer look at the last integral in this expression. By a Taylor approximation we have

$$\begin{aligned}&n^{1/2}\int _0^t[H \{s, \phi _0,{{\hat{B}}}_X(s-)\}-H \{s, \phi _0,B^0_X(s-)\}]dN(s)\\&\quad =n^{1/2}\int _0^tV(ds) \{{{\hat{A}}} (s-, \phi _0)-A_0(s-)\} +o_P(1) \end{aligned}$$

where V is the \(p\times p\) matrix

$$\begin{aligned} V(s)=\frac{\partial H \{s, \phi _0, B_X^0(s-)\}N(s)}{\partial A_0(s)^T}. \end{aligned}$$

Since H depends on \(A_0\) only through its first element, \(B_X^0\), the first column of V is non-zero and the remaining columns consist of zeros. Define \( Z(t, \phi _0) = n^{1/2}\{{{\hat{A}}} (t, \phi _0)-A_0(t)\}^T\); from the above calculations we see that it satisfies the following Volterra equation (Andersen et al. 1993, p. 91)

$$\begin{aligned} Z(t, \phi _0)= n^{1/2}\int _0^t M(ds)^T H \{s, \phi _0,B^0_X(s-)\}^T + \int _0^t Z(s-, \phi _0) V(ds)^T , \end{aligned}$$

and the solution is

$$\begin{aligned} n^{1/2} \int _0^t M(ds)^TH\{s, \phi _0, B^0_X(s-)\}^T{\mathcal {Q}}(s,t) \end{aligned}$$

where \({\mathcal {Q}}\) is the product integral

$$\begin{aligned} {\mathcal {Q}}(s,t) = \prod _{(s,t]}\{I + V(du)^T\} \end{aligned}$$

as defined in Andersen et al. (1993). Finally, we have that

$$\begin{aligned} n^{1/2}\{{{\hat{A}}}(t, \phi _0)-A_0(t)\}&=n^{1/2} \int _0^t \mathcal Q(s,t)^T H\{s, \phi _0,B^0_X(s-)\} dM(s). \end{aligned}$$

In this expression \([n^{-1}Y\{s, \phi _0, B^0_X(s-)\}^TW(s)Y\{s, \phi _0, B^0_X(s-)\}]^{-1}\) converges in probability to some \(p\times p\) matrix that we denote \(l_1(s)\). Also \({\mathcal {Q}}(s, t)^T\) converges in probability to some limit that is denoted \(l(s,t)\). Further, one can show the convergence in distribution of \(n^{-1/2}Y\{s, \phi _0, B^0_X(s-)\}^TW(s) d M(s)\) to a mean zero process. Then we have the i.i.d. representation of \({{\hat{A}}}(t, \phi _0)\)

$$\begin{aligned} n^{1/2}\{{{\hat{A}}} (t, \phi _0)-A_0(t)\}&= n^{-1/2}\sum _{i=1}^n \epsilon _i^A (t) + o_P(1) \end{aligned}$$

with


$$\begin{aligned} \epsilon _i^A(t)&= \int _0^t l (s,t) l_1(s)Y_i\{s, \phi _0,B^0_X(s-)\}^TW_i(s) d M_i(s) , \end{aligned}$$

where the elements of \(\epsilon _i^A\) are denoted \((\epsilon _i^{B_X}, \epsilon _i^{\varOmega _C}, \epsilon _i^{\varOmega _0})^T\). This representation ensures convergence of the finite-dimensional distributions. Convergence in distribution as a process can be shown similarly to what is done in Martinussen et al. (2017b).
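The step from the Volterra equation to its product-integral solution can be checked numerically on a grid. The sketch below assumes the standard product-integral form, taking \({\mathcal {Q}}(s,t)\) to be the ordered product of \(I + V(du)^T\) over \((s,t]\), and uses simulated increments `dW` and `dV` that are purely illustrative and are not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 3                                   # grid points and dimension (assumed)

# Increments of the driving term (row vectors) and of V.
dW = rng.normal(scale=0.1, size=(m, p))
dV = np.zeros((m, p, p))
dV[:, :, 0] = rng.normal(scale=0.05, size=(m, p))  # only the first column of V is non-zero

# Discrete Volterra equation: Z_k = W_k + sum_{j<=k} Z_{j-1} dV_j^T,
# i.e. Z_k = Z_{k-1} + dW_k + Z_{k-1} dV_k^T.
Z_recursive = np.zeros(p)
for k in range(m):
    Z_recursive = Z_recursive + dW[k] + Z_recursive @ dV[k].T

# Product-integral solution: Z_m = sum_j dW_j * prod_{i>j} (I + dV_i^T),
# with the product taken in increasing time order from left to right.
Z_closed = np.zeros(p)
Q = np.eye(p)                                  # empty product for the last grid point
for j in range(m - 1, -1, -1):
    Z_closed = Z_closed + dW[j] @ Q
    Q = (np.eye(p) + dV[j].T) @ Q              # extend the product one step to the left

max_abs_diff = np.max(np.abs(Z_recursive - Z_closed))
```

Both routes give the same value up to floating point error, which is the discrete analogue of the solution formula used in the text.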

Large sample properties of \({{\hat{\psi }}}\)

We first note that

$$\begin{aligned} n^{-1/2}U({{\hat{\psi }}})&=n^{-1/2}U({{\hat{\psi }}},{{\hat{\theta }}})\\&=n^{-1/2}U(\phi _0)+\{n^{-1}D_{\psi }U\}n^{1/2}({{\hat{\psi }}}-\psi _0)+\{n^{-1}D_{ \theta }U\}n^{1/2}({{\hat{\theta }}}- \theta _0)+o_p(1), \end{aligned}$$

and, since \({{\hat{\psi }}}\) solves the estimating equation so that the left-hand side is zero,

$$\begin{aligned} n^{1/2}({{\hat{\psi }}}-\psi _0)=-\{n^{-1}D_{\psi }U\}^{-1}[n^{-1/2}U(\phi _0)+\{n^{-1}D_{ \theta }U\}n^{1/2}({{\hat{\theta }}}- \theta _0)]+o_p(1) \end{aligned}$$

and since we have assumed \({{\hat{\theta }}}\) to be RAL we just need to find the asymptotic distribution of \(n^{-1/2}U(\phi _0)\). We can write this function as

$$\begin{aligned} n^{-1/2}U(\phi _0)&= n^{-1/2}\int _0^\tau X^T[dN(t) - Y\{t, \phi _0,B^0_X(t)\}dA_0(t)]\\&\quad - n^{-1/2}\int _0^\tau X^TY\{t, \phi _0, {{\hat{B}}}_X(t-)\}\{d{{\hat{A}}}(t, \phi _0)- d A_0(t)\}\end{aligned}\tag{14}$$
$$\begin{aligned}&\quad -\, n^{-1/2}\int _0^\tau X^T[Y \{t, \phi _0, {{\hat{B}}}_X(t-)\}-Y \{t, \phi _0,B^0_X(t)\}] d A_0(t) . \end{aligned}\tag{15}$$

The first term on the right hand side of the latter display is the martingale

$$\begin{aligned} n^{-1/2}\int _0^\tau X^TdM(t)&= n^{-1/2}\sum _{i=1}^n\int _0^\tau X_idM_i (t). \end{aligned}$$

Note that \(Y \{t, \phi _0, {{\hat{B}}}_X(t-)\}\) and \(Y \{t, \phi _0,B^0_X(t)\}\) share all entries except for the first column. If we let \(Y_{*1}\) denote the first column of Y then (15) can be written as

$$\begin{aligned} -n^{-1/2}\int _0^\tau X^T[Y_{*1} \{t, \phi _0, {{\hat{B}}}_X(t-)\}-Y_{*1} \{t, \phi _0,B^0_X(t)\}] dB^0_X(t) . \end{aligned}$$

Using a Taylor expansion this is asymptotically equivalent to

$$\begin{aligned}&-n^{-1/2}\int _0^\tau \frac{\partial X^T Y_{*1}\{t, \phi _0,B^0_X(t)\}}{\partial B_X^0(t)}\{{{\hat{B}}}_X(t-)-B^0_X(t)\} dB^0_X(t). \end{aligned}$$

Let \(d_{B_X}(t)\) denote the limit in probability of the derivative \(n^{-1}\frac{\partial X^T Y_{*1}\{t, \phi _0,B^0_X(t)\}}{\partial B_X^0(t)}\). By the i.i.d. representation of \(n^{1/2}\{{{\hat{B}}}_X(t-)-B_X^0(t)\}\) derived earlier in this Appendix, (15) is seen to be asymptotically equivalent to the following sum of zero-mean i.i.d. terms

$$\begin{aligned} -n^{-1/2}\sum _{i=1}^n\int _0^\tau d_{B_X}(t) \epsilon ^{B_X}_i(t-) dB^0_X(t). \end{aligned}$$

For the remaining integral, (14), we have convergence in distribution of \(n^{1/2}\{{{\hat{A}}}(t, \phi _0)-A_0(t)\}\), the process it integrates against, and the integral can, up to an \(o_p(1)\) term, be written as (Kosorok 2008, Lemma 4.2)

$$\begin{aligned} -n^{-1/2}\int _0^\tau X^TY\{t, \phi _0,B^0_X(t)\}\{d{{\hat{A}}}(t, \phi _0)- d A_0(t)\} \end{aligned}$$

since \(n^{-1}[X^TY\{t, \phi _0, {{\hat{B}}}_X(t-)\}-X^TY\{t, \phi _0,B^0_X(t)\}]\) converges in probability to 0. Denote the limit in probability of \(n^{-1}X^TY\{t, \phi _0, B_X^0(t)\}\) as \(l_{XY}(t)\). Thus it is asymptotically equivalent to

$$\begin{aligned} -n^{-1/2}\sum _{i=1}^n\int _0^\tau l_{XY}(t) d\epsilon _i^{A}(t) \end{aligned}$$

Finally, we have that \(n^{-1/2} U(\phi _0) = n^{-1/2}\sum _{i=1}^n\epsilon ^U_i+o_p(1)\), where

$$\begin{aligned} \epsilon ^U_i = \int _0^\tau X_idM_i (t) -\int _0^\tau d_{B_X}(t) \epsilon ^{B_X}_i(t-) dB^0_X(t)-\int _0^\tau l_{XY}(t) d\epsilon _i^{A}(t). \end{aligned}$$

Consequently,

$$\begin{aligned} n^{1/2}({{\hat{\psi }}}-\psi _0) =n^{-1/2}\sum _{i=1}^n\epsilon ^{\psi }_i+o_p(1) \end{aligned}$$

where

$$\begin{aligned} \epsilon ^{\psi }_i=-\mathcal{J}_{\psi }^{-1}\{\epsilon ^U_i+{{{\mathcal {J}}}}_{\theta }\epsilon ^{\theta }_i\}, \end{aligned}$$

with \({{{\mathcal {J}}}}_{\psi }\) denoting the limit in probability of \(n^{-1}D_{\psi }U\), and similarly for \({{{\mathcal {J}}}}_{\theta }\), and with \(\epsilon ^{\theta }_i\) the influence function of the RAL estimator \({{\hat{\theta }}}\). Based on the above derivations, the i.i.d. decomposition of \(n^{1/2} \{ {{\hat{A}}}(t, {{\hat{\phi }}})-A_0(t)\}\) can easily be obtained since

$$\begin{aligned} n^{1/2}\{{{\hat{A}}}(t,{{\hat{\phi }}})-A_0(t)\}&= n^{1/2}\{{{\hat{A}}}(t,\hat{\phi })-{{\hat{A}}}(t, \phi _0)\} + n^{1/2}\{{{\hat{A}}}(t,\phi _0)- A_0(t)\} \\&= n^{1/2} \{D_{\psi } {{\hat{A}}}(t, \phi _0)\}\{{{\hat{\psi }}}-\psi _0\} + n^{1/2} \{D_{\theta } {{\hat{A}}}(t, \phi _0)\}\{{{\hat{\theta }}}-\theta _0\}\\&\quad +\, n^{1/2}\{{{\hat{A}}}(t,\phi _0)- A_0(t)\} + o_p(1) \end{aligned}$$

where \(D_{\psi } {{\hat{A}}}(t, \phi _0)\) converges in probability to some limit as \(n\rightarrow \infty \), and similarly for \(D_{\theta } {{\hat{A}}}(t, \phi _0)\). Hence also,

$$\begin{aligned} n^{1/2}\{{{\hat{A}}}(t,{{\hat{\phi }}})-A_0(t)\}=n^{-1/2}\sum _{i=1}^n {{\tilde{\epsilon }}}_i^A (t) + o_p(1), \end{aligned}$$

where

$$\begin{aligned} {{\tilde{\epsilon }}}_i^A (t)= \epsilon _i^A (t)+ \mathcal{A}_{\psi }\epsilon ^{\psi }_i + {{{\mathcal {A}}}}_{\theta }\epsilon ^{\theta }_i, \end{aligned}$$

with \( {{{\mathcal {A}}}}_{\psi }\) denoting the limit in probability of \(D_{\psi } {{\hat{A}}}(t, \phi _0)\), and similarly for \( \mathcal{A}_{\theta }\).
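The i.i.d. representations above suggest a plug-in variance estimate based on the empirical second moment of the estimated influence functions. This is a standard approach and is not spelled out in this appendix, so the sketch below should be read as one possible implementation under that assumption. The function names and the inputs `eps_U`, `eps_theta`, `J_psi` and `J_theta` are hypothetical and are assumed to have been estimated beforehand.

```python
import numpy as np

def psi_influence(eps_U, eps_theta, J_psi, J_theta):
    """Rows are eps_psi_i = -J_psi^{-1} (eps_U_i + J_theta eps_theta_i).

    eps_U     : (n, q) estimated eps^U_i, one row per subject
    eps_theta : (n, r) estimated influence functions of theta_hat
    J_psi     : (q, q) estimate of the limit of n^{-1} D_psi U
    J_theta   : (q, r) estimate of the limit of n^{-1} D_theta U
    """
    combined = eps_U + eps_theta @ J_theta.T        # (n, q)
    return -np.linalg.solve(J_psi, combined.T).T    # (n, q)

def plug_in_variance(eps):
    """Estimate Var(psi_hat) by n^{-2} * sum_i eps_i eps_i^T."""
    n = eps.shape[0]
    return eps.T @ eps / n**2

# Tiny illustrative inputs (assumed values, q = r = 1, n = 2).
eps_U = np.array([[1.0], [-1.0]])
eps_theta = np.array([[0.0], [2.0]])
J_psi = np.array([[2.0]])
J_theta = np.array([[1.0]])

eps_psi = psi_influence(eps_U, eps_theta, J_psi, J_theta)
var_psi = plug_in_variance(eps_psi)
```

The same second-moment formula could be applied pointwise in t to estimates of \({{\tilde{\epsilon }}}_i^A (t)\) to obtain standard errors for \({{\hat{A}}}(t,{{\hat{\phi }}})\), again as an assumption rather than a procedure stated here.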


Cite this article

Sørensen, D.N., Martinussen, T. & Tchetgen Tchetgen, E. A causal proportional hazards estimator under homogeneous or heterogeneous selection in an IV setting. Lifetime Data Anal 25, 639–659 (2019). https://doi.org/10.1007/s10985-019-09476-y



  • Causal effect
  • Structural Cox model
  • Instrumental variable
  • Treatment effect on the treated
  • Selection bias function