Conditional screening for ultra-high dimensional covariates with survival outcomes


Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insight into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools that select predictive biomarkers. The vast number of biomarkers defies existing regularization-based variable selection methods. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given a priori known biological information. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses the sure screening property and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and the analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods for analyzing survival data.
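The fit-and-rank recipe described here—fit a Cox model with the conditioning covariates \({\mathbf {Z}}_{\mathcal {C}}\) plus one candidate biomarker at a time, then rank candidates by the magnitude of the candidate's estimated coefficient—can be sketched with a small numpy-only simulation. This is our own minimal illustration, assuming complete data and no tied event times; the Newton solver, function names, and simulation design are ours, not the paper's.

```python
import numpy as np

def cox_fit(X, time, event, n_iter=30):
    """Newton-Raphson for the Cox partial likelihood (Breslow form, no tied times)."""
    order = np.argsort(time)
    X, event = X[order], event[order].astype(float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta - (X @ beta).max())          # stabilized risk weights
        s0 = np.cumsum(w[::-1])[::-1]                    # risk-set totals: sum_{k >= i} w_k
        s1 = np.cumsum((w[:, None] * X)[::-1], 0)[::-1]
        xbar = s1 / s0[:, None]                          # risk-set weighted covariate mean
        grad = ((X - xbar) * event[:, None]).sum(0)      # score vector
        s2 = np.cumsum((w[:, None, None] * X[:, :, None] * X[:, None, :])[::-1], 0)[::-1]
        info = ((s2 / s0[:, None, None]
                 - xbar[:, :, None] * xbar[:, None, :]) * event[:, None, None]).sum(0)
        beta = beta + np.linalg.solve(info, grad)        # Newton step
    return beta

def conditional_screen(Z, time, event, cond):
    """Score each candidate j outside the conditioning set `cond` by |beta_j|
    from a Cox fit on [Z_cond, Z_j]."""
    return {j: abs(cox_fit(Z[:, cond + [j]], time, event)[-1])
            for j in range(Z.shape[1]) if j not in cond}

rng = np.random.default_rng(0)
n, p = 400, 20
Z = rng.standard_normal((n, p))
lin = 1.0 * Z[:, 0] + 1.0 * Z[:, 3]                      # true signals: 0 (known) and 3
T = rng.exponential(1.0, size=n) / np.exp(lin)           # exponential baseline hazard
C = rng.exponential(2.0, size=n)                         # independent censoring
time, event = np.minimum(T, C), T <= C
scores = conditional_screen(Z, time, event, cond=[0])
print(max(scores, key=scores.get))                       # expect candidate 3 to rank first
```

With biomarker 0 taken as the a priori known signal, the screening statistic for the remaining true signal dominates the noise coefficients, which concentrate near zero.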




  1. Barut E, Fan J, Verhasselt A (2016) Conditional sure independence screening. J Am Stat Assoc 111(515):1266–1277

  2. Binder H, Schumacher M (2009) Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinform 10:18

  3. Chow ML, Moler EJ, Mian IS (2001) Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. Physiol Genomics 5:99–111

  4. Deb K, Reddy AR (2003) Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems 72:111–129

  5. Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J R Stat Soc B 70:849–911

  6. Fan J, Feng Y, Wu Y (2010) High-dimensional variable selection for Cox's proportional hazards model. In: Borrowing strength: theory powering applications—a Festschrift for Lawrence D. Brown. IMS collections, vol 6. Institute of Mathematical Statistics, Beachwood, pp 70–86

  7. Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21(13):3001–3008

  8. Hong H, Wang L, He X (2016) A data-driven approach to conditional screening of high dimensional variables. Stat 5(1):200–212

  9. Jiang Y, He Y, Zhang H (2016) Variable selection with prior information for generalized linear models via the prior Lasso method. J Am Stat Assoc 111(513):355–376

  10. Li H, Luan Y (2005) Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics 21(10):2403–2409

  11. Lin DY, Wei LJ (1989) The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84(408):1074–1078

  12. Liu XY, Liang Y, Xu ZB, Zhang H, Leung KS (2013) Adaptive \(l_{1/2}\) shooting regularization method for survival analysis using gene expression data. Sci World J 2013:475702

  13. Mikovits J, Ruscetti F, Zhu W, Bagni R, Dorjsuren D, Shoemaker R (2001) Potential cellular signatures of viral infections in human hematopoietic cells. Dis Markers 17(3):173–178

  14. Rosenwald A, Wright G, Chan W, Connors J, Campo E et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346(25):1937–1947

  15. Schifano ED, Strawderman RL, Wells MT (2010) MM algorithms for minimizing nonsmoothly penalized objective functions. Electron J Stat 4:1258–1299

  16. Song R, Lu W, Ma S, Jeng XJ (2014) Censored rank independence screening for high-dimensional survival data. Biometrika 101(4):799–814

  17. Stewart AK, Schuh AC (2000) White cells 2: impact of understanding the molecular basis of haematological malignant disorders on clinical practice. Lancet 355(9213):1447–1453

  18. Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30(10):1105–1117

  19. van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York

  20. Wang Z, Xu W, San Lucas F, Liu Y (2013) Incorporating prior knowledge into gene network study. Bioinformatics 29:2633–2640

  21. Zhao SD, Li Y (2012) Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal 105(1):397–411



This research was partially supported by a grant from the NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang) and a grant from the National Natural Science Foundation of China (11528102, Li).

Author information



Corresponding author

Correspondence to Jian Kang.



The basic properties of the conditional linear expectation are listed in the following proposition.

Proposition 1

\({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0)={\varvec{0}}_{q+1}\) if and only if \(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), for all \(j\notin {\mathcal {C}}\).

The proof is straightforward based on Definitions 2 and 3.

Proposition 2

Let \(\varvec{\zeta }\), \(\varvec{\zeta }_1\), \(\varvec{\zeta }_2\) and \(\varvec{\xi }\) be any four random variables in the probability space \((\varOmega ,{\mathcal {F}}, \mathrm {P})\). The following properties hold for the conditional linear expectation \(\mathrm {E}^*[\bullet \mid \varvec{\xi }]\) given \(\varvec{\xi }\):

  1. Closed form: \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) =\mathrm {E}[\varvec{\zeta }] + \mathrm {Cov}(\varvec{\zeta },\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1} \{\varvec{\xi }-\mathrm {E}(\varvec{\xi })\}\).

  2. Stability: \(\mathrm {E}^*[\varvec{\xi }\mid \varvec{\xi }] = \varvec{\xi }\).

  3. Linearity: \(\mathrm {E}^*[{\mathbf {A}}_1\varvec{\zeta }_1+{\mathbf {A}}_2\varvec{\zeta }_2 \mid \varvec{\xi }] = {\mathbf {A}}_1\mathrm {E}^*[\varvec{\zeta }_1 \mid \varvec{\xi }]+{\mathbf {A}}_2\mathrm {E}^*[\varvec{\zeta }_2 \mid \varvec{\xi }]\), where \({\mathbf {A}}_1\) and \({\mathbf {A}}_2\) are two matrices compatible with the equation.

  4. Law of total expectation: \(\mathrm {E}^*[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\varvec{\zeta }]\).
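Since \(\mathrm {E}^*[\bullet \mid \varvec{\xi }]\) is simply the best linear predictor, these properties are easy to check numerically. The sketch below is our own illustration (the function `cle` and the simulated pair are not from the paper); it replaces population moments with sample moments and uses a deliberately non-normal pair, which also previews Remark 1 below.

```python
import numpy as np

def cle(zeta, xi):
    """Conditional linear expectation E*[zeta | xi] via the closed form
    E[zeta] + Cov(zeta, xi) Var(xi)^{-1} (xi - E[xi]), with sample moments."""
    xc = xi - xi.mean()
    cov = np.mean((zeta - zeta.mean()) * xc)
    return zeta.mean() + cov / np.mean(xc**2) * xc

rng = np.random.default_rng(0)
xi = rng.standard_normal(100_000)
zeta = xi**2 + 0.5 * rng.standard_normal(100_000)  # depends on xi only through xi^2

pred = cle(zeta, xi)
print(np.allclose(cle(xi, xi), xi))                # stability: E*[xi | xi] = xi
print(abs(pred.mean() - zeta.mean()) < 1e-8)       # law of total expectation
```

Here \(\mathrm {Cov}(\zeta ,\xi )\approx \mathrm {E}[\xi ^3] = 0\), so \(\mathrm {E}^*(\zeta \mid \xi )\) is essentially the constant \(\mathrm {E}[\zeta ]\) even though \(\zeta \) and \(\xi \) are strongly dependent.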

Remark 1

In general, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) \ne \mathrm {E}(\varvec{\zeta }\mid \varvec{\xi })\); the two coincide when \(\varvec{\zeta }\) and \(\varvec{\xi }\) are jointly normally distributed. Also, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) = \mathrm {E}(\varvec{\zeta })\) does not imply that \(\varvec{\zeta }\) and \(\varvec{\xi }\) are independent.

Remark 2

By Proposition 2, we can easily verify the following properties.

Proposition 3

The conditional linear covariance defined in Definition 5 has the following properties:

  1. Linear independence and linear zero correlation:

     $$\begin{aligned} \mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2 \mid \varvec{\xi }) = 0\qquad \Leftrightarrow \qquad \mathrm {E}^*(\varvec{\zeta }_1 \varvec{\zeta }_2 \mid \varvec{\xi }) = \mathrm {E}^*(\varvec{\zeta }_1 \mid \varvec{\xi }) \mathrm {E}^*(\varvec{\zeta }_2 \mid \varvec{\xi }). \end{aligned}$$

  2. Expectation of the conditional linear covariance:

     $$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2\mid \varvec{\xi })] = \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\zeta }_2) - \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\xi }) \mathrm {Var}(\varvec{\xi })^{-1} \mathrm {Cov}(\varvec{\xi },\varvec{\zeta }_2). \end{aligned}$$

  3. Sign: for any increasing function \(h(\cdot ): {\mathbb {R}}\rightarrow {\mathbb {R}}\) and any random variable \(\eta : \varOmega \rightarrow {\mathbb {R}}\),

     $$\begin{aligned} \mathrm {Cov}^*(h(\eta ),\eta \mid \varvec{\xi }) \ge 0. \end{aligned}$$
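Property 2 is the familiar partial-covariance identity, and it becomes an exact algebraic identity when population moments are replaced by sample moments (denominator \(n\) throughout). The numpy check below is our own illustration, with a two-dimensional \(\varvec{\xi }\) to exercise the matrix form of the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
xi = rng.standard_normal((n, 2))                      # conditioning variable, q = 2
z1 = xi @ np.array([1.0, -0.5]) + rng.standard_normal(n)
z2 = np.sin(xi[:, 0]) + 0.3 * z1 + rng.standard_normal(n)

def resid(z, xi):
    """z minus its conditional linear expectation given xi (sample moments)."""
    xc = xi - xi.mean(0)
    cov_zxi = (z - z.mean()) @ xc / len(z)            # Cov(z, xi), shape (2,)
    var_xi = xc.T @ xc / len(z)                       # Var(xi), shape (2, 2)
    return z - z.mean() - xc @ np.linalg.solve(var_xi, cov_zxi)

# E[Cov*(z1, z2 | xi)] computed two ways: as the mean product of residuals,
# and via the right-hand side of Property 2
lhs = np.mean(resid(z1, xi) * resid(z2, xi))
xc = xi - xi.mean(0)
c1, c2 = (z1 - z1.mean()) @ xc / n, (z2 - z2.mean()) @ xc / n
rhs = np.mean((z1 - z1.mean()) * (z2 - z2.mean())) - c1 @ np.linalg.solve(xc.T @ xc / n, c2)
print(abs(lhs - rhs) < 1e-8)
```

Expanding the product of residuals shows the two cross terms and the quadratic term collapse to a single \(\mathrm {Cov}(\varvec{\zeta }_1,\varvec{\xi }) \mathrm {Var}(\varvec{\xi })^{-1} \mathrm {Cov}(\varvec{\xi },\varvec{\zeta }_2)\), so the two sides agree to floating-point precision.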

Combining Propositions 1–3 with Definition 6, we have the following property.

Proposition 4

\(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\) if and only if \(v_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\).

Lemma 1

The solution of \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}}},\beta ) = {\varvec{0}}_{q+1}\) and the solution of \({\mathbf {v}}_{{\mathcal {C}}}(\varvec{\beta }_{{\mathcal {C}}}) = {\varvec{0}}_q\) are both unique, for any \(j \notin {\mathcal {C}}\).

Proof of Theorem 1


First we make the connection between \(\beta _{j}\) and the expected conditional linear covariance between \(Z_j\) and \(\mathrm {P}[\delta =1\mid {\mathbf {Z}}]\) given \({\mathbf {Z}}_{\mathcal {C}}\), that is

$$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}]\mid {\mathbf {Z}}_{{\mathcal {C}}})], \end{aligned}$$

then by Condition 2, we relate it to \(\alpha _{j}\). For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\), it is straightforward to see that

$$\begin{aligned} s^{(m)}_{k}(t) = \mathrm {E}[Z_{k}^{m}\lambda _0(t) \exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}}) S_C(t)], \end{aligned}$$

and
$$\begin{aligned} r^{(m)}_{j,k}(t,\varvec{\beta }_{{\mathcal {C}}},\beta ) =\mathrm {E}[Z^m_{k} \exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}} + Z_{j}\beta ) S_T(t\mid {\mathbf {Z}}) S_C(t)], \end{aligned}$$

for \(m = 0,1\). Then

$$\begin{aligned}&{v_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ) }\nonumber \\&\quad = \int _0^\tau \mathrm {E}\left[ W_{j,k}(t,\varvec{\beta }_{{\mathcal {C}}},\beta )\exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}}) S_C(t)\lambda _0(t)\right] \mathrm {d}t, \end{aligned}$$

where
$$\begin{aligned} W_{j,k}(t,\varvec{\beta }_{{\mathcal {C}}},\beta ) = Z_k- \frac{\mathrm {E}[Z_k \exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}} + Z_{j}\beta ) S_T(t\mid {\mathbf {Z}}) S_C(t)]}{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}} + Z_{j}\beta ) S_T(t\mid {\mathbf {Z}}) S_C(t)]}. \end{aligned}$$

By Proposition 2,

$$\begin{aligned}&\mathrm {E}\left[ W_{j,k}(t,\varvec{\beta }_{{\mathcal {C}}},\beta )\exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}}) S_C(t)\right] \\&\quad =\mathrm {E}\left\{ \mathrm {E}^*\left[ W_{j,k}(t,\varvec{\beta }_{{\mathcal {C}}},\beta )\exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}}) S_C(t) \right] \right\} . \end{aligned}$$

By Definition 6,

$$\begin{aligned} v_j(\varvec{\beta }_{{\mathcal {C}}},\beta )= & {} v_{j,j}(\varvec{\beta }_{\mathcal {C}},\beta ) -\sum _{k\in {\mathcal {C}}} a_k v_{j,k}(\varvec{\beta }_{\mathcal {C}},\beta ) \\= & {} \mathrm {E}\left[ \mathrm {Cov}^{*} (Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] - g_j(\varvec{\beta }_{{\mathcal {C}}},\beta ), \end{aligned}$$

where
$$\begin{aligned}&\mathrm {E}\left[ \mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \\&\quad = \int _0^\tau \mathrm {E}\left[ (Z_j - \mathrm {E}^*[Z_j \mid {\mathbf {Z}}_{{\mathcal {C}}}])\exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}}) S_C(t)\lambda _0(t)\right] \mathrm {d}t, \end{aligned}$$

and
$$\begin{aligned}&g_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\\= & {} \int _0^\tau \frac{\mathrm {E}[(Z_j - \mathrm {E}^*[Z_j \mid {\mathbf {Z}}_{{\mathcal {C}}}]) \exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+Z_j\beta ) S_T(t\mid {\mathbf {Z}})S_C(t)]}{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+Z_j\beta ) S_T(t\mid {\mathbf {Z}})S_C(t)]}\\&\times \mathrm {E}\left[ \exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}})\lambda _0(t) S_C(t)\right] \mathrm {d}t. \end{aligned}$$

By Definition 2, \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = {\varvec{0}}_{q+1}\), and hence

$$\begin{aligned} g_j(\varvec{\beta }_{{\mathcal {C}},j}, \beta _j) = \mathrm {E}\left[ \mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] . \end{aligned}$$

When \(\alpha _{j} = 0\), \(\mathrm {E}\left[ \mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] = 0\) by Condition 2.3, and thus \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = 0\). Also, by Propositions 1 and 2, \(g_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), so that \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0) = {\varvec{0}}_{q+1}\). By the uniqueness in Lemma 1, \(\beta _j = 0\).

When \(\alpha _j \ne 0\), by Condition 2, we have

$$\begin{aligned} |g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j)| = | \mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] | > c_1 n^{-\kappa }. \end{aligned}$$

This implies that \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j)\) and \( \mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) are both nonzero and, being equal, have the same sign. Next we show that for any \(\varvec{\beta }_{{\mathcal {C}}}\), \(g_j(\varvec{\beta }_{{\mathcal {C}}},0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless both are zero; this fact implies that \(\beta _j\ne 0\). Specifically, note that \(\mathrm {P}(\delta = 1\mid {\mathbf {Z}})\) is the probability of the event occurring and \(S_T(t\mid {\mathbf {Z}})S_C(t) = \mathrm {P}(X > t \mid {\mathbf {Z}})\) is the probability of being at risk at time t. Based on Model (1), for any t,

$$\begin{aligned} \frac{\partial \mathrm {P}(X > t \mid {\mathbf {Z}})}{\partial Z_j} \times \frac{\partial \mathrm {P}( \delta = 1 \mid {\mathbf {Z}})}{\partial Z_j} \le 0. \end{aligned}$$

By Proposition 3, \(\mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\) and \(\mathrm {Cov}^*[Z_j, S_T(t\mid {\mathbf {Z}})S_C(t)\mid {\mathbf {Z}}_{\mathcal {C}}]\) have the opposite signs unless they are zero. This further implies that for any \(\varvec{\beta }_{{\mathcal {C}}}\),

$$\begin{aligned} g_{j}(\varvec{\beta }_{{\mathcal {C}}},0)= & {} \int _{0}^{\tau }\frac{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}} \varvec{\beta }_{{\mathcal {C}}})\mathrm {Cov}^*[Z_j, S_T(t\mid {\mathbf {Z}})S_C(t)\mid {\mathbf {Z}}_{\mathcal {C}}]]}{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}) S_T(t\mid {\mathbf {Z}})S_C(t)]}\\&\times \mathrm {E}\left[ \exp ({\mathbf {Z}}^{\mathrm {T}}\varvec{\alpha }) S_T(t\mid {\mathbf {Z}})\lambda _0(t) S_C(t)\right] \mathrm {d}t, \end{aligned}$$

and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}]\mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless both are zero. Therefore, \(\beta _j \ne 0\). \(\square \)

Proof of Theorem 2


For any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), we have \(\beta _j \ne 0\) by Theorem 1. By the mean value theorem, for some \({{\widetilde{\beta }}}_j \in (0, \beta _j)\),

$$\begin{aligned} |v_{j}(\varvec{\beta }_{{\mathcal {C}},j},0)| = |v_{j}(\varvec{\beta }_{{\mathcal {C}},j},\beta _{j}) - v_{j}(\varvec{\beta }_{{\mathcal {C}},j},0)| = \left| \frac{\partial v_{j}}{\partial \beta }(\varvec{\beta }_{{\mathcal {C}},j},{{\widetilde{\beta }}}_j)\right| |\beta _j|. \end{aligned}$$

Next we show that \(\left| \frac{\partial v_{j}}{\partial \beta }(\varvec{\beta }_{{\mathcal {C}},j},{{\widetilde{\beta }}}_j)\right| \) is bounded. For any given \(\varvec{\beta }_{{\mathcal {C}}}\), consider \(g_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) as a function of \(\beta \). Then

$$\begin{aligned} \frac{\partial g_j}{\partial \beta }(\varvec{\beta }_{\mathcal {C}},\beta ) =\mathrm {E}\left[ \int _0^{\tau }H_j(t,\varvec{\beta }_{{\mathcal {C}}},\beta )S_C(t)\mathrm {d}F_T(t\mid {\mathbf {Z}})\right] , \end{aligned}$$

where
$$\begin{aligned}&H_{j}(t, \varvec{\beta }_{\mathcal {C}},\beta ) = \frac{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}} \varvec{\beta }_{{\mathcal {C}}})\mathrm {Cov}^*[Z^2_j\exp (Z_j\beta ), S_T(t\mid {\mathbf {Z}})\mid {\mathbf {Z}}_{\mathcal {C}}]]}{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+Z_j\beta ) S_T(t\mid {\mathbf {Z}})]} \\&\quad -\frac{\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}} \varvec{\beta }_{{\mathcal {C}}})\mathrm {Cov}^*[Z_j\exp (Z_j\beta ), S_T(t\mid {\mathbf {Z}})\mid {\mathbf {Z}}_{\mathcal {C}}]]\mathrm {E}[Z_j\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+Z_j\beta ) S_T(t\mid {\mathbf {Z}})]}{[\mathrm {E}[\exp ({\mathbf {Z}}_{{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+Z_j\beta ) S_T(t\mid {\mathbf {Z}})]]^2}. \end{aligned}$$

By Condition 2.1, \(\mathrm {P}(|Z|<K_0) = 1\), then \(\sup _{\varvec{\beta }_{\mathcal {C}},\beta }|H_j(t,\varvec{\beta }_{{\mathcal {C}}},\beta )|\le 2 K_0^2\). Thus,

$$\begin{aligned} \left| \frac{\partial v_{j}}{\partial \beta }(\varvec{\beta }_{{\mathcal {C}},j},{{\widetilde{\beta }}}_j)\right| \le \sup _{\varvec{\beta }_{\mathcal {C}},\beta }\left| \frac{\partial g_j}{\partial \beta }(\varvec{\beta }_{\mathcal {C}},\beta )\right| \le 2K_0^2 \,\mathrm {E}[\mathrm {E}\{S_C(T)\mid {\mathbf {Z}}\}] \le 2K_0^2. \end{aligned}$$

By the proof of Theorem 1, \(g_j(\varvec{\beta }_{{\mathcal {C}},j}, 0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {E}\{F_{T}(C \mid {\mathbf {Z}}) \mid {\mathbf {Z}}\} \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs, and by Condition 2,

$$\begin{aligned} |v_{j}(\varvec{\beta }_{{\mathcal {C}},j},0)| = | \mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}]\mid {\mathbf {Z}}_{{\mathcal {C}}})\right] | + |g_{j}(\varvec{\beta }_{{\mathcal {C}},j},0)| > c_1 n^{-\kappa }. \end{aligned}$$

Taking \(c_2 = 0.5K_0^{-2} c_1\), we have \(|\beta _j| \ge 0.5K_0^{-2} |v_j(\varvec{\beta }_{{\mathcal {C}},j},0)| > c_2 n^{-\kappa } \). This completes the proof. \(\square \)

Proof of Theorem 3


For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\), by Lin and Wei (1989), we have

$$\begin{aligned} {\overline{{\mathbf {V}}}}_j(\varvec{\beta }_{{\mathcal {C}}},\beta ) = \mathrm {E}_n\{{\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\} + o_p(1), \end{aligned}$$

where \(\mathrm {E}_n[\cdot ]\) denotes the empirical measure, defined as \(\mathrm {E}_n[\varvec{\xi }_i] = n^{-1} \sum _{i=1}^n \varvec{\xi }_i\) for any random variables \(\varvec{\xi }_1,\ldots , \varvec{\xi }_n\), and the \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) are independent over i. Write \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ) = [W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ),k\in {\mathcal {C}}\cup \{j\}]^{\mathrm {T}}\) with

$$\begin{aligned} W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )= & {} \int _0^\tau \left\{ Z_{i,k} - \frac{r^{(1)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta , t)}{r^{(0)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ,t)}\right\} \mathrm {d}N_i(t) \\&- \int _0^\tau \frac{Y_i(t)\exp ({\mathbf {Z}}_{i,{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+ Z_{i,j}\beta )}{r^{(0)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ,t)}\left\{ Z_{i,k} - \frac{r^{(1)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta , t)}{r^{(0)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ,t)}\right\} \mathrm {d}\mathrm {E}[N_i(t)]. \end{aligned}$$

Note that, for any given \(i, j, k\), \(|W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )|\) is uniformly bounded with probability one. Specifically, by Conditions 1.2, 2.1 and 3, with probability one, for all \(t\in [0,\tau ]\) and \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}}\in {\mathcal {B}}_j\),

$$\begin{aligned} \left| Z_{i,k} - \frac{r^{(1)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta , t)}{r^{(0)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ,t)}\right|\le & {} |Z_{i,k}| + K_0,\\ \left| \frac{Y_i(t)\exp ({\mathbf {Z}}_{i,{\mathcal {C}}}^{\mathrm {T}}\varvec{\beta }_{{\mathcal {C}}}+ Z_{i,j}\beta )}{r^{(0)}_{j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ,t)}\right|\le & {} \exp \{K_0(K_1+\delta )-\log (L)\}, \end{aligned}$$

and
$$\begin{aligned} \left| \int _0^{\tau } \mathrm {d}\mathrm {E}[N_i(t)]\right| \le \Lambda _0(\tau )\exp (K_0K_1). \end{aligned}$$

Thus, with probability one,

$$\begin{aligned} |W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )| \le K_2, \end{aligned}$$

where \(K_2 = 2K_0(1+\Lambda _0(\tau )\exp (2K_0K_1+K_0\delta -\log L))\). By the fact that \(\mathrm {E}[W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )] = 0\),

$$\begin{aligned} \mathrm {Var}[W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )] = \mathrm {E}[|W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )|^2] < K_2^2. \end{aligned}$$

By Lemma 2.2.9 (Bernstein's inequality) of van der Vaart and Wellner (1996), for any \(t>0\) and for all j, k, \(\varvec{\beta }_{{{\mathcal {C}}}}\) and \(\beta \), we have

$$\begin{aligned} \mathrm {P}\left( |\mathrm {E}_n(W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ))|>\frac{t}{n}\right) \le 2 \exp \left( -\frac{1}{2}\frac{t^2}{n K^2_2+K_2 t/3}\right) . \end{aligned}$$

Note that the above inequality holds for every \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\). By the Bonferroni inequality,

$$\begin{aligned} \mathrm {P}\left( \Vert \mathrm {E}_n({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ))\Vert _2>\frac{t}{(q+1)n}\right) \le 2 (q+1)\exp \left( -\frac{1}{2}\frac{t^2}{n K^2_2+K_2 t/3}\right) . \end{aligned}$$

In addition,
$$\begin{aligned} \Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )-\mathrm {E}_n({\mathbf {W}}_{i,j} (\varvec{\beta }_{{\mathcal {C}}},\beta ))\Vert _2 = o_p(1). \end{aligned}$$

Then for any \(\epsilon _1>0\) and \(\epsilon _2>0\), there exists \(N_1\) such that for any \(n>N_1\),

$$\begin{aligned} \mathrm {P}(\Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta ) -\mathrm {E}_n({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ))\Vert _2 > M\epsilon _1/2 ) < \epsilon _2, \end{aligned}$$

where M is the constant in Condition 4. By the triangle inequality and the Bonferroni inequality, we have

$$\begin{aligned}&\mathrm {P}\left( \Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\Vert _2> \frac{t}{(q+1)n} \right) \\\le & {} \mathrm {P}\left( \Vert \mathrm {E}_n({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ))\Vert _2> \frac{t}{(q+1)n} -M\epsilon _1/2\right) \\&+\, \mathrm {P}(\Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta ) -\mathrm {E}_n({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ))\Vert _2 > M\epsilon _1/2 ). \end{aligned}$$

As \(n\rightarrow \infty \), taking \(t = c_2M(q+1)n^{1-\kappa }/2>0\) on both sides of the inequality, where \(c_2\) is the constant in Theorem 2, we have

$$\begin{aligned}&\mathrm {P}\left( \Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\Vert _2 > \frac{Mc_2}{2}(n^{-\kappa } -\epsilon _1)\right) \\&\quad \le 2 (q+1)\exp \left( -\frac{M^2c_2^2}{8(q+1)^2}\frac{n^{1-2\kappa }}{K^2_2+K_2 n^{-\kappa }/3}\right) +\epsilon _2. \end{aligned}$$

Take \(N =\max \{ \lceil (K_2/3)^{1/\kappa }\rceil ,N_1\}\), then for any \(n>N\), \(n^{-\kappa }<3/K_2\), and

$$\begin{aligned} \mathrm {P}\left( \Vert {\overline{{\mathbf {V}}}}_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\Vert _2 > \frac{Mc_2}{2}(n^{-\kappa } -\epsilon _1)\right) \le 2 (q+1)\exp \left( -\frac{M^2c_2^2}{8(q+1)^2} \frac{n^{1-2\kappa }}{K^2_2+1}\right) +\epsilon _2. \end{aligned}$$

Note that the above inequality holds for all \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}} \in {\mathcal {B}}_j\), particularly for \((\varvec{\beta }_{{\mathcal {C}},j}^{\mathrm {T}},\beta _j)^{\mathrm {T}}\), \(j \notin {\mathcal {C}}\). Also, we have \({\overline{{\mathbf {V}}}}_{j}({{\widehat{\varvec{\beta }}}}_{{\mathcal {C}},j},{{\widehat{\beta }}}_j) = {\varvec{0}}_{q+1}\). By Condition 4, we have

$$\begin{aligned}&\mathrm {P}\left( |{{\widehat{\beta }}}_j-\beta _j|>\frac{c_2}{2}(n^{-\kappa } -\epsilon _1)\right) \\&\quad \le \mathrm {P}\left( \Vert ({{\widehat{\varvec{\beta }}}}_{{\mathcal {C}},j}^{\mathrm {T}},{{\widehat{\beta }}}_j)^{\mathrm {T}} -(\varvec{\beta }_{{\mathcal {C}},j}^{\mathrm {T}},\beta _j)^{\mathrm {T}} \Vert _2 >\frac{c_2}{2}(n^{-\kappa } -\epsilon _1)\right) \\&\quad \le 2 (q+1)\exp \left( -\frac{M^2c_2^2}{8(q+1)^2} \frac{n^{1-2\kappa }}{K^2_2+1}\right) +\epsilon _2. \end{aligned}$$

Taking \(c_3 = \frac{M^2c_2^2}{8(q+1)^2(K^2_2+1)}\) and applying the Bonferroni inequality completes the proof of part 1.

For part 2, by Theorem 2,

$$\begin{aligned} \min _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|\beta _j| > c_2 n^{-\kappa }. \end{aligned}$$

Note that, for any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), we have the event inclusion

$$\begin{aligned}&\left\{ |{{\widehat{\beta }}}_j - \beta _j| \le c_2 n^{-\kappa }/2 - \epsilon _1\right\} \\&\quad \subseteq \left\{ |{{\widehat{\beta }}}_j| \ge |\beta _j| - c_2 n^{-\kappa }/2+\epsilon _1\right\} \\&\quad \subseteq \left\{ |{{\widehat{\beta }}}_j| \ge c_2 n^{-\kappa }/2+\epsilon _1\right\} . \end{aligned}$$

Take \(\gamma _n = c_4 n^{-\kappa }\) with \(c_4 = c_2/4\),

$$\begin{aligned}&\left\{ \max _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j - \beta _j| \le c_2 n^{-\kappa }/2 - \epsilon _1\right\} \\&\quad \subseteq \left\{ \min _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j| \ge c_2 n^{-\kappa }/2+\epsilon _1\right\} \\&\quad \subseteq \left\{ \min _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j| \ge \gamma _n +\epsilon _1\right\} . \end{aligned}$$

Therefore,
$$\begin{aligned}&\mathrm {P}\left[ {\mathcal {M}}_{-{\mathcal {C}}} \subseteq {{\widehat{{\mathcal {M}}}}}_{-{\mathcal {C}}}\right] \\&\quad = \mathrm {P}\left[ \min _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j|> \gamma _n \right] \\&\quad \ge \mathrm {P}\left[ \min _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j| > \gamma _n +\epsilon _1\right] \\&\quad \ge 1 - \mathrm {P}\left[ \max _{j\in {\mathcal {M}}_{-{\mathcal {C}}}}|{{\widehat{\beta }}}_j - \beta _j| > c_2 n^{-\kappa }/2 - \epsilon _1\right] \\&\quad \ge 1 -2w(q+1)\exp (-c_3 n^{1-2\kappa }) - \epsilon _2. \end{aligned}$$

Letting \(n\rightarrow \infty \), we have, for any \(\epsilon _2>0\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathrm {P}\left[ {\mathcal {M}}_{-{\mathcal {C}}} \subseteq {{\widehat{{\mathcal {M}}}}}_{-{\mathcal {C}}}\right] \ge 1 - \epsilon _2. \end{aligned}$$

Note that the left-hand side of the above inequality no longer depends on n. Taking \(\epsilon _2\rightarrow 0\) completes the proof. \(\square \)


Cite this article

Hong, H.G., Kang, J. & Li, Y. Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Anal 24, 45–71 (2018).



  • Conditional screening
  • Cox model
  • Diffuse large B-cell lymphoma
  • High-dimensional variable screening