Abstract
Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools for selecting predictive biomarkers. The sheer number of biomarkers defies existing regularization-based variable selection methods. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given a priori known biological information. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods for analyzing survival data.
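As a concrete, hedged illustration of the procedure described above, the sketch below scores each candidate biomarker by the magnitude of its estimated Cox regression coefficient after adjusting for the a priori conditioning set, and keeps the top-ranked candidates. The function name conditional_screen, the use of the lifelines package, and the small ridge penalizer are illustrative assumptions, not the authors' implementation.

```python
import pandas as pd
from lifelines import CoxPHFitter  # Cox proportional hazards model

def conditional_screen(df: pd.DataFrame, duration_col: str, event_col: str,
                       conditioning_set: list, top_d: int) -> list:
    """Rank candidate covariates by |Cox coefficient| after adjusting for
    an a priori conditioning set; return the top_d candidates."""
    known = set(conditioning_set) | {duration_col, event_col}
    candidates = [c for c in df.columns if c not in known]
    scores = {}
    for j in candidates:
        cols = [duration_col, event_col, *conditioning_set, j]
        # A light ridge penalty guards against separation in small samples
        # (a numerical convenience, not part of the screening theory).
        cph = CoxPHFitter(penalizer=0.01)
        cph.fit(df[cols], duration_col=duration_col, event_col=event_col)
        scores[j] = abs(cph.params_[j])  # conditional marginal utility of j
    return sorted(scores, key=scores.get, reverse=True)[:top_d]
```

The retained set would then typically be passed to a downstream regularized fit; the theory in the appendix concerns the population analogue of this ranking.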
References
Barut E, Fan J, Verhasselt A (2016) Conditional sure independence screening. J Am Stat Assoc 116:544–557
Binder H, Schumacher M (2009) Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinform 10:18
Chow ML, Moler EJ, Mian IS (2001) Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. Physiol Genomics 5:99–111
Deb K, Reddy AR (2003) Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems 72:111–129
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J R Stat Soc B 70:849–911
Fan J, Feng Y, Wu Y (2010) High-dimensional variable selection for Cox’s proportional hazards model. In: IMS collections borrowing strength: theory powering applications—A Festschrift for Lawrence D. Brown, vol 6, pp 70–86. Institute of Mathematical Statistics, Beachwood
Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21(13):3001–3008
Hong H, Wang L, He X (2016) A data-driven approach to conditional screening of high dimensional variables. Stat 5(1):200–212
Jiang Y, He Y, Zhang H (2016) Variable selection with prior information for generalized linear models via the prior Lasso method. J Am Stat Assoc 111(513):355–376
Li H, Luan Y (2005) Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics 21(10):2403–2409
Lin DY, Wei LJ (1989) The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84(408):1074–1078
Liu XY, Liang Y, Xu ZB, Zhang H, Leung KS (2013) Adaptive \(l_{1/2}\) shooting regularization method for survival analysis using gene expression data. Sci World J 2013:475702
Mikovits J, Ruscetti F, Zhu W, Bagni R, Dorjsuren D, Shoemaker R (2001) Potential cellular signatures of viral infections in human hematopoietic cells. Dis Markers 17(3):173–178
Rosenwald A, Wright G, Chan W, Connors J, Campo E et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346(25):1937–1947
Schifano ED, Strawderman RL, Wells MT (2010) MM algorithms for minimizing nonsmoothly penalized objective functions. Electron J Stat 4:1258–1299
Song R, Lu W, Ma S, Jeng XJ (2014) Censored rank independence screening for high-dimensional survival data. Biometrika 101(4):799–814
Stewart AK, Schuh AC (2000) White cells 2: impact of understanding the molecular basis of haematological malignant disorders on clinical practice. Lancet 355(9213):1447–1453
Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ (2011) On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30(10):1105–1117
Van Der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Wang Z, Xu W, San Lucas F, Liu Y (2013) Incorporating prior knowledge into gene network study. Bioinformatics 29:2633–2640
Zhao SD, Li Y (2012) Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal 105(1):397–411
Acknowledgements
This research was partially supported by a grant from the NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang), and a grant from the Natural Science Foundation of China (11528102, Li).
Appendix
The basic properties of the conditional linear expectation are listed in the propositions below.
Proposition 1
\({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0)={\varvec{0}}_{q+1}\) if and only if \(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), for all \(j\notin {\mathcal {C}}\).
The proof is straightforward based on Definitions 2 and 3.
Proposition 2
Let \(\varvec{\zeta }\), \(\varvec{\zeta }_1\), \(\varvec{\zeta }_2\) and \(\varvec{\xi }\) be any four random vectors in the probability space \((\varOmega ,{\mathcal {F}}, \mathrm {P})\). The following properties hold for the conditional linear expectation \(\mathrm {E}^*[\bullet \mid \varvec{\xi }]\) given \(\varvec{\xi }\):
1. Closed form: \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) =\mathrm {E}[\varvec{\zeta }] + \mathrm {Cov}(\varvec{\zeta },\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1} \{\varvec{\xi }-\mathrm {E}(\varvec{\xi })\}\).
2. Stability: \(\mathrm {E}^*[\varvec{\xi }\mid \varvec{\xi }] = \varvec{\xi }\).
3. Linearity: \(\mathrm {E}^*[{\mathbf {A}}_1\varvec{\zeta }_1+{\mathbf {A}}_2\varvec{\zeta }_2 \mid \varvec{\xi }] = {\mathbf {A}}_1\mathrm {E}^*[\varvec{\zeta }_1 \mid \varvec{\xi }]+{\mathbf {A}}_2\mathrm {E}^*[\varvec{\zeta }_2 \mid \varvec{\xi }]\), where \({\mathbf {A}}_1\) and \({\mathbf {A}}_2\) are constant matrices of compatible dimensions.
4. Law of total expectation: \(\mathrm {E}^*[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\varvec{\zeta }]\).
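For instance, Properties 2 and 4 follow directly from the closed form in Property 1, using only \(\mathrm {Cov}(\varvec{\xi },\varvec{\xi }) = \mathrm {Var}(\varvec{\xi })\) and \(\mathrm {E}[\varvec{\xi }-\mathrm {E}(\varvec{\xi })] = {\varvec{0}}\):
$$\begin{aligned} \mathrm {E}^*(\varvec{\xi }\mid \varvec{\xi })&= \mathrm {E}[\varvec{\xi }] + \mathrm {Var}(\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1} \{\varvec{\xi }-\mathrm {E}(\varvec{\xi })\} = \varvec{\xi }, \\ \mathrm {E}[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })]&= \mathrm {E}[\varvec{\zeta }] + \mathrm {Cov}(\varvec{\zeta },\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1}\, \mathrm {E}[\varvec{\xi }-\mathrm {E}(\varvec{\xi })] = \mathrm {E}[\varvec{\zeta }]. \end{aligned}$$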
Remark 1
In general, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) \ne \mathrm {E}(\varvec{\zeta }\mid \varvec{\xi })\). Also, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) = \mathrm {E}(\varvec{\zeta })\) does not imply that \(\varvec{\zeta }\) and \(\varvec{\xi }\) are independent, unless \(\varvec{\zeta }\) and \(\varvec{\xi }\) are jointly normally distributed.
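For example, if \((\zeta ,\xi )\) is bivariate normal with means \(\mu _\zeta \) and \(\mu _\xi \), variances \(\sigma _\zeta ^2\) and \(\sigma _\xi ^2\), and correlation \(\rho \), the usual conditional mean
$$\begin{aligned} \mathrm {E}(\zeta \mid \xi ) = \mu _\zeta + \rho \frac{\sigma _\zeta }{\sigma _\xi }(\xi -\mu _\xi ) \end{aligned}$$
coincides with the closed form of \(\mathrm {E}^*(\zeta \mid \xi )\) in Proposition 2, and \(\mathrm {E}^*(\zeta \mid \xi ) = \mathrm {E}(\zeta )\) forces \(\rho = 0\), which in the normal case is equivalent to independence.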
Remark 2
By Proposition 2, we can easily verify the following properties.
Proposition 3
The conditional linear covariance defined in Definition 5 has the following properties:
1. Linear independence and linear zero correlation:
$$\begin{aligned} \mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2 \mid \varvec{\xi }) = 0\qquad \Leftrightarrow \qquad \mathrm {E}^*(\varvec{\zeta }_1 \varvec{\zeta }_2 \mid \varvec{\xi }) = \mathrm {E}^*(\varvec{\zeta }_1 \mid \varvec{\xi }) \mathrm {E}^*(\varvec{\zeta }_2 \mid \varvec{\xi }). \end{aligned}$$
2. Expectation of conditional linear covariance:
$$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2\mid \varvec{\xi })] = \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\zeta }_2) - \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\xi }) \mathrm {Var}(\varvec{\xi })^{-1} \mathrm {Cov}(\varvec{\xi },\varvec{\zeta }_2). \end{aligned}$$
3. Sign: for any increasing function \(h(\cdot ): {\mathbb {R}}\rightarrow {\mathbb {R}}\) and any random variable \(\eta : \varOmega \rightarrow {\mathbb {R}}\),
$$\begin{aligned} \mathrm {Cov}^*(h(\eta ),\eta \mid \varvec{\xi }) \ge 0. \end{aligned}$$
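In particular, Property 2 can be verified directly under the assumption (consistent with Property 1 of this proposition, though Definition 5 is not reproduced here) that, in the scalar case, \(\mathrm {Cov}^*(\zeta _1,\zeta _2\mid \varvec{\xi }) = \mathrm {E}^*(\zeta _1\zeta _2\mid \varvec{\xi }) - \mathrm {E}^*(\zeta _1\mid \varvec{\xi })\mathrm {E}^*(\zeta _2\mid \varvec{\xi })\). Writing \({\mathbf {a}}_i^{\mathrm {T}} = \mathrm {Cov}(\zeta _i,\varvec{\xi })\mathrm {Var}(\varvec{\xi })^{-1}\) and using Proposition 2,
$$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(\zeta _1,\zeta _2\mid \varvec{\xi })]&= \mathrm {E}(\zeta _1\zeta _2) - \mathrm {E}(\zeta _1)\mathrm {E}(\zeta _2) - {\mathbf {a}}_1^{\mathrm {T}}\mathrm {Var}(\varvec{\xi }){\mathbf {a}}_2 \\&= \mathrm {Cov}(\zeta _1,\zeta _2) - \mathrm {Cov}(\zeta _1,\varvec{\xi })\mathrm {Var}(\varvec{\xi })^{-1}\mathrm {Cov}(\varvec{\xi },\zeta _2), \end{aligned}$$
where the first equality uses the law of total expectation, \(\mathrm {E}[\mathrm {E}^*(\zeta _1\zeta _2\mid \varvec{\xi })] = \mathrm {E}(\zeta _1\zeta _2)\), together with \(\mathrm {E}[\mathrm {E}^*(\zeta _1\mid \varvec{\xi })\mathrm {E}^*(\zeta _2\mid \varvec{\xi })] = \mathrm {E}(\zeta _1)\mathrm {E}(\zeta _2) + {\mathbf {a}}_1^{\mathrm {T}}\mathrm {Var}(\varvec{\xi }){\mathbf {a}}_2\).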
Combining Propositions 1–3 with Definition 6, we have the following property.
Proposition 4
\(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\) if and only if \(v_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\).
Lemma 1
The solution of \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}}},\beta ) = {\varvec{0}}_{q+1}\) and the solution of \({\mathbf {v}}_{{\mathcal {C}}}(\varvec{\beta }_{{\mathcal {C}}}) = {\varvec{0}}_q\) are both unique, for any \(j \notin {\mathcal {C}}\).
1.1 Proof of Theorem 1
Proof
First we connect \(\beta _{j}\) to the expected conditional linear covariance between \(Z_j\) and \(\mathrm {P}[\delta =1\mid {\mathbf {Z}}]\) given \({\mathbf {Z}}_{\mathcal {C}}\), that is
then by Condition 2, we relate it to \(\alpha _{j}\). For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\), it is straightforward to see that
and
for \(m = 0,1\). Then
where
By Proposition 2,
By Definition 6,
where
and
By Definition 2, \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = {\varvec{0}}_{q+1}\),
When \(\alpha _{j} = 0\), we have \(\mathrm {E}\left[ \mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] = 0\) by Condition 2.3, and thus \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = 0\). Also, by Propositions 1 and 2, \(g_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), so that \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0) = {\varvec{0}}_{q+1}\). By the uniqueness in Lemma 1, \(\beta _j = 0\).
When \(\alpha _j \ne 0\), by Condition 2, we have
This implies that \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j)\) and \( \mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) are both nonzero and share the same sign, since they are equal. Next we show that for any \(\varvec{\beta }_{{\mathcal {C}}}\), \(g_j(\varvec{\beta }_{{\mathcal {C}}},0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless both are zero; this fact implies that \(\beta _j\ne 0\). Specifically, note that \(\mathrm {P}(\delta = 1\mid {\mathbf {Z}})\) is the probability that the event occurs, and \(S_T(t\mid {\mathbf {Z}})S_C(t) = \mathrm {P}(X > t \mid {\mathbf {Z}})\) is the probability of being at risk at time t. Based on Model (1), for any t,
By Proposition 3, \(\mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\) and \(\mathrm {Cov}^*[Z_j, S_T(t\mid {\mathbf {Z}})S_C(t)\mid {\mathbf {Z}}_{\mathcal {C}}]\) have opposite signs unless they are zero. This further implies that for any \(\varvec{\beta }_{{\mathcal {C}}}\),
and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}]\mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless they are equal to zero. Therefore, \(\beta _j \ne 0\). \(\square \)
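For intuition behind the sign argument: if, as an assumption consistent with the notation above, Model (1) is of Cox type with conditional hazard \(\lambda (t\mid {\mathbf {Z}}) = \lambda _0(t)\exp (\varvec{\alpha }^{\mathrm {T}}{\mathbf {Z}})\) and the censoring time is independent of \({\mathbf {Z}}\) (so that \(S_C(t)\) carries no covariate argument), then
$$\begin{aligned} \mathrm {P}(\delta = 1\mid {\mathbf {Z}}) = \int _0^\infty \lambda _0(t)\exp (\varvec{\alpha }^{\mathrm {T}}{\mathbf {Z}})\, S_T(t\mid {\mathbf {Z}})\,S_C(t)\,\mathrm {d}t, \end{aligned}$$
so increasing \(\alpha _jZ_j\) raises the event probability while lowering the at-risk probability \(S_T(t\mid {\mathbf {Z}})S_C(t)\) at every t; this monotonicity in opposite directions is what drives the opposite signs used in the proof.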
1.2 Proof of Theorem 2
Proof
For any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), we have \(\beta _j \ne 0\) by Theorem 1. By the mean value theorem, for some \({{\widetilde{\beta }}}_j \in (0, \beta _j)\),
Next we show that \(\left| \frac{\partial v_{j}}{\partial \beta }(\varvec{\beta }_{{\mathcal {C}},j},{{\widetilde{\beta }}}_j)\right| \) is bounded. For any given \(\varvec{\beta }_{{\mathcal {C}}}\), consider \(g_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) as a function of \(\beta \). Then
where
By Condition 2.1, \(\mathrm {P}(|Z|<K_0) = 1\), so \(\sup _{\varvec{\beta }_{\mathcal {C}},\beta }|H_j(t,\varvec{\beta }_{{\mathcal {C}}},\beta )|\le 2 K_0^2\). Thus,
By the proof of Theorem 1, \(g(\varvec{\beta }_{{\mathcal {C}},j}, 0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {E}\{F_{T}(C \mid {\mathbf {Z}}) \mid {\mathbf {Z}}\} \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs, and by Condition 2,
Taking \(c_2 = 0.5K_0^{-2} c_1\), we obtain \(\beta _j> 0.5K_0^{-2} |v_j(\varvec{\beta }_{{\mathcal {C}},j},0)| > c_2 n^{-\kappa } \). This completes the proof. \(\square \)
1.3 Proof of Theorem 3
Proof
For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\), by Lin and Wei (1989), we have
where \(\mathrm {E}_n[\cdot ]\) denotes the empirical measure, defined as \(\mathrm {E}_n[\varvec{\xi }_i] = n^{-1} \sum _{i=1}^n \varvec{\xi }_i\) for any random variables \(\varvec{\xi }_1,\ldots , \varvec{\xi }_n\); the \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) are independent over i, and we write \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ) = [W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ),k\in {\mathcal {C}}\cup \{j\}]^{\mathrm {T}}\) with
Note that for any given i, j, k, \(|W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )|\) is uniformly bounded with probability one. Specifically, by Conditions 1.2, 2.1 and 3, with probability one, for all \(t\in [0,\tau ]\) and \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}}\in {\mathcal {B}}_j\),
and
Thus, with probability one,
where \(K_2 = 2K_0(1+\Lambda _0(\tau )\exp (2K_0K_1+K_0\delta -\log L))\). By the fact that \(\mathrm {E}[W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )] = 0\),
By Lemma 2.2.9 (Bernstein's inequality) of Van Der Vaart and Wellner (1996), for any \(t>0\) and all j, k, \(\varvec{\beta }_{{{\mathcal {C}}}}\) and \(\beta \), we have
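(For reference, Lemma 2.2.9 there states that for independent mean-zero random variables \(Y_1,\ldots ,Y_n\) with \(|Y_i|\le M\) and \(v \ge \mathrm {Var}(Y_1+\cdots +Y_n)\),
$$\begin{aligned} \mathrm {P}\left( |Y_1+\cdots +Y_n|>x\right) \le 2\exp \left\{ -\frac{x^2}{2(v+Mx/3)}\right\} ; \end{aligned}$$
in the present application the uniform bound \(K_2\) obtained above plays the role of M.)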
Note that the above inequality holds for every \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\). By the Bonferroni inequality,
Since,
Then for any \(\epsilon _1>0\) and \(\epsilon _2>0\), there exists \(N_1\) such that for any \(n>N_1\),
where M is the same constant as in Condition 4. By the triangle inequality and the Bonferroni inequality, we have
Letting \(n\rightarrow \infty \) and taking \(t = c_2M(q+1)n^{1-\kappa }/2>0\) on both sides of the inequality, where \(c_2\) is the same constant as in Theorem 2, we have
Take \(N =\max \{ \lceil (K_2/3)^{1/\kappa }\rceil ,N_1\}\); then for any \(n>N\), \(n^{-\kappa }<3/K_2\), and
Note that the above inequality holds for all \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}} \in {\mathcal {B}}_j\), in particular for \((\varvec{\beta }_{{\mathcal {C}},j}^{\mathrm {T}},\beta _j)^{\mathrm {T}}\) with \(j \notin {\mathcal {C}}\). Also, we have \({\overline{{\mathbf {V}}}}_{j}({{\widehat{\varvec{\beta }}}}_{{\mathcal {C}},j},{{\widehat{\beta }}}_j) = {\varvec{0}}_{q+1}\). By Condition 4, we have
Taking \(c_3 = \frac{M^2c_2^2}{8(q+1)^2(K^2_2+1)}\) and applying the Bonferroni inequality completes the proof of part 1.
For part 2, by Theorem 2,
Note that, for any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), the event
Take \(\gamma _n = c_4 n^{-\kappa }\) with \(c_4 = c_2/4\),
Thus,
Letting \(n\rightarrow \infty \), we have, for any \(\epsilon _2>0\),
Note that the left-hand side of the above inequality no longer depends on n. Taking \(\epsilon _2\rightarrow 0\) completes the proof. \(\square \)