Abstract
Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insights into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created a high demand for efficient screening tools for selecting predictive biomarkers. The sheer number of biomarkers defies existing regularization-based variable selection methods. The recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given a priori known biological information. This is based on the premise that some biomarkers are known to be associated with disease outcomes a priori. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods for analyzing survival data.
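As a concrete, hedged illustration of the procedure described above, the sketch below scores each candidate biomarker by the magnitude of its estimated Cox regression coefficient after adjusting for the a priori conditioning set, and keeps the top-ranked candidates. The function name conditional_screen, the use of the lifelines package, and the small ridge penalizer are illustrative assumptions, not the authors' implementation.

```python
import pandas as pd
from lifelines import CoxPHFitter  # Cox proportional hazards model

def conditional_screen(df: pd.DataFrame, duration_col: str, event_col: str,
                       conditioning_set: list, top_d: int) -> list:
    """Rank candidate covariates by |Cox coefficient| after adjusting for
    an a priori conditioning set; return the top_d candidates."""
    known = set(conditioning_set) | {duration_col, event_col}
    candidates = [c for c in df.columns if c not in known]
    scores = {}
    for j in candidates:
        cols = [duration_col, event_col, *conditioning_set, j]
        # A light ridge penalty guards against separation in small samples
        # (a numerical convenience, not part of the screening theory).
        cph = CoxPHFitter(penalizer=0.01)
        cph.fit(df[cols], duration_col=duration_col, event_col=event_col)
        scores[j] = abs(cph.params_[j])  # conditional marginal utility of j
    return sorted(scores, key=scores.get, reverse=True)[:top_d]
```

The retained set would then typically be passed to a downstream regularized fit; the theory in the appendix concerns the population analogue of this ranking.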
References
Barut E, Fan J, Verhasselt A (2016) Conditional sure independence screening. J Am Stat Assoc 116:544–557
Binder H, Schumacher M (2009) Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinform 10:18
Chow ML, Moler EJ, Mian IS (2001) Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. Physiol Genomics 5:99–111
Deb K, Reddy AR (2003) Reliable classification of two-class cancer data using evolutionary algorithms. BioSystems 72:111–129
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J R Stat Soc B 70:849–911
Fan J, Feng Y, Wu Y (2010) High-dimensional variable selection for Cox’s proportional hazards model. In: IMS collections borrowing strength: theory powering applications—A Festschrift for Lawrence D. Brown, vol 6, pp 70–86. Institute of Mathematical Statistics, Beachwood
Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21(13):3001–3008
Hong H, Wang L, He X (2016) A data-driven approach to conditional screening of high dimensional variables. Stat 5(1):200–212
Jiang Y, He Y, Zhang H (2016) Variable selection with prior information for generalized linear models via the prior Lasso method. J Am Stat Assoc 111(513):355–376
Li H, Luan Y (2005) Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data. Bioinformatics 21(10):2403–2409
Lin DY, Wei LJ (1989) The robust inference for the Cox proportional hazards model. J Am Stat Assoc 84(408):1074–1078
Liu XY, Liang Y, Xu ZB, Zhang H, Leung KS (2013) Adaptive \(l_{1/2}\) shooting regularization method for survival analysis using gene expression data. Sci World J 2013:475702
Mikovits J, Ruscetti F, Zhu W, Bagni R, Dorjsuren D, Shoemaker R (2001) Potential cellular signatures of viral infections in human hematopoietic cells. Dis Markers 17(3):173–178
Rosenwald A, Wright G, Chan W, Connors J, Campo E et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346(25):1937–1947
Schifano ED, Strawderman RL, Wells MT (2010) MM algorithms for minimizing nonsmoothly penalized objective functions. Electron J Stat 4:1258–1299
Song R, Lu W, Ma S, Jeng XJ (2014) Censored rank independence screening for high-dimensional survival data. Biometrika 101(4):799–814
Stewart AK, Schuh AC (2000) White cells 2: impact of understanding the molecular basis of haematological malignant disorders on clinical practice. Lancet 355(9213):1447–1453
Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ (2011) On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30(10):1105–1117
Van Der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Wang Z, Xu W, San Lucas F, Liu Y (2013) Incorporating prior knowledge into gene network study. Bioinformatics 29:2633–2640
Zhao SD, Li Y (2012) Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J Multivar Anal 105(1):397–411
Acknowledgements
This research was partially supported by a grant from the NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang), and a grant from the Natural Science Foundation of China (11528102, Li).
Appendix
The basic properties of the conditional linear expectation are listed in the propositions below.
Proposition 1
\({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0)={\varvec{0}}_{q+1}\) if and only if \(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), for all \(j\notin {\mathcal {C}}\).
The proof is straightforward based on Definitions 2 and 3.
Proposition 2
Let \(\varvec{\zeta }\), \(\varvec{\zeta }_1\), \(\varvec{\zeta }_2\) and \(\varvec{\xi }\) be any four random vectors in the probability space \((\varOmega ,{\mathcal {F}}, \mathrm {P})\). The following properties hold for the conditional linear expectation \(\mathrm {E}^*[\bullet \mid \varvec{\xi }]\) given \(\varvec{\xi }\):
1. Closed form: \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) =\mathrm {E}[\varvec{\zeta }] + \mathrm {Cov}(\varvec{\zeta },\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1} \{\varvec{\xi }-\mathrm {E}(\varvec{\xi })\}\).
2. Stability: \(\mathrm {E}^*[\varvec{\xi }\mid \varvec{\xi }] = \varvec{\xi }\).
3. Linearity: \(\mathrm {E}^*[{\mathbf {A}}_1\varvec{\zeta }_1+{\mathbf {A}}_2\varvec{\zeta }_2 \mid \varvec{\xi }] = {\mathbf {A}}_1\mathrm {E}^*[\varvec{\zeta }_1 \mid \varvec{\xi }]+{\mathbf {A}}_2\mathrm {E}^*[\varvec{\zeta }_2 \mid \varvec{\xi }]\), where \({\mathbf {A}}_1\) and \({\mathbf {A}}_2\) are constant matrices of compatible dimensions.
4. Law of total expectation: \(\mathrm {E}^*[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })] = \mathrm {E}[\varvec{\zeta }]\).
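For instance, Properties 2 and 4 follow directly from the closed form in Property 1, using only \(\mathrm {Cov}(\varvec{\xi },\varvec{\xi }) = \mathrm {Var}(\varvec{\xi })\) and \(\mathrm {E}[\varvec{\xi }-\mathrm {E}(\varvec{\xi })] = {\varvec{0}}\):
$$\begin{aligned} \mathrm {E}^*(\varvec{\xi }\mid \varvec{\xi })&= \mathrm {E}[\varvec{\xi }] + \mathrm {Var}(\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1} \{\varvec{\xi }-\mathrm {E}(\varvec{\xi })\} = \varvec{\xi }, \\ \mathrm {E}[\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi })]&= \mathrm {E}[\varvec{\zeta }] + \mathrm {Cov}(\varvec{\zeta },\varvec{\xi })\mathrm {Var}[\varvec{\xi }]^{-1}\, \mathrm {E}[\varvec{\xi }-\mathrm {E}(\varvec{\xi })] = \mathrm {E}[\varvec{\zeta }]. \end{aligned}$$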
Remark 1
In general, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) \ne \mathrm {E}(\varvec{\zeta }\mid \varvec{\xi })\). Also, \(\mathrm {E}^*(\varvec{\zeta }\mid \varvec{\xi }) = \mathrm {E}(\varvec{\zeta })\) does not imply that \(\varvec{\zeta }\) and \(\varvec{\xi }\) are independent, unless \(\varvec{\zeta }\) and \(\varvec{\xi }\) are jointly normally distributed.
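For example, if \((\zeta ,\xi )\) is bivariate normal with means \(\mu _\zeta \) and \(\mu _\xi \), variances \(\sigma _\zeta ^2\) and \(\sigma _\xi ^2\), and correlation \(\rho \), the usual conditional mean
$$\begin{aligned} \mathrm {E}(\zeta \mid \xi ) = \mu _\zeta + \rho \frac{\sigma _\zeta }{\sigma _\xi }(\xi -\mu _\xi ) \end{aligned}$$
coincides with the closed form of \(\mathrm {E}^*(\zeta \mid \xi )\) in Proposition 2, and \(\mathrm {E}^*(\zeta \mid \xi ) = \mathrm {E}(\zeta )\) forces \(\rho = 0\), which in the normal case is equivalent to independence.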
Remark 2
By Proposition 2, we can easily verify the following properties.
Proposition 3
The conditional linear covariance defined in Definition 5 has the following properties:
1. Linear independence and linear zero correlation:
$$\begin{aligned} \mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2 \mid \varvec{\xi }) = 0\qquad \Leftrightarrow \qquad \mathrm {E}^*(\varvec{\zeta }_1 \varvec{\zeta }_2 \mid \varvec{\xi }) = \mathrm {E}^*(\varvec{\zeta }_1 \mid \varvec{\xi }) \mathrm {E}^*(\varvec{\zeta }_2 \mid \varvec{\xi }). \end{aligned}$$
2. Expectation of conditional linear covariance:
$$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(\varvec{\zeta }_1,\varvec{\zeta }_2\mid \varvec{\xi })] = \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\zeta }_2) - \mathrm {Cov}(\varvec{\zeta }_1,\varvec{\xi }) \mathrm {Var}(\varvec{\xi })^{-1} \mathrm {Cov}(\varvec{\xi },\varvec{\zeta }_2). \end{aligned}$$
3. Sign: for any increasing function \(h(\cdot ): {\mathbb {R}}\rightarrow {\mathbb {R}}\) and any random variable \(\eta : \varOmega \rightarrow {\mathbb {R}}\),
$$\begin{aligned} \mathrm {Cov}^*(h(\eta ),\eta \mid \varvec{\xi }) \ge 0. \end{aligned}$$
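In particular, Property 2 can be verified directly under the assumption (consistent with Property 1 of this proposition, though Definition 5 is not reproduced here) that, in the scalar case, \(\mathrm {Cov}^*(\zeta _1,\zeta _2\mid \varvec{\xi }) = \mathrm {E}^*(\zeta _1\zeta _2\mid \varvec{\xi }) - \mathrm {E}^*(\zeta _1\mid \varvec{\xi })\mathrm {E}^*(\zeta _2\mid \varvec{\xi })\). Writing \({\mathbf {a}}_i^{\mathrm {T}} = \mathrm {Cov}(\zeta _i,\varvec{\xi })\mathrm {Var}(\varvec{\xi })^{-1}\) and using Proposition 2,
$$\begin{aligned} \mathrm {E}[\mathrm {Cov}^*(\zeta _1,\zeta _2\mid \varvec{\xi })]&= \mathrm {E}(\zeta _1\zeta _2) - \mathrm {E}(\zeta _1)\mathrm {E}(\zeta _2) - {\mathbf {a}}_1^{\mathrm {T}}\mathrm {Var}(\varvec{\xi }){\mathbf {a}}_2 \\&= \mathrm {Cov}(\zeta _1,\zeta _2) - \mathrm {Cov}(\zeta _1,\varvec{\xi })\mathrm {Var}(\varvec{\xi })^{-1}\mathrm {Cov}(\varvec{\xi },\zeta _2), \end{aligned}$$
where the first equality uses the law of total expectation, \(\mathrm {E}[\mathrm {E}^*(\zeta _1\zeta _2\mid \varvec{\xi })] = \mathrm {E}(\zeta _1\zeta _2)\), together with \(\mathrm {E}[\mathrm {E}^*(\zeta _1\mid \varvec{\xi })\mathrm {E}^*(\zeta _2\mid \varvec{\xi })] = \mathrm {E}(\zeta _1)\mathrm {E}(\zeta _2) + {\mathbf {a}}_1^{\mathrm {T}}\mathrm {Var}(\varvec{\xi }){\mathbf {a}}_2\).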
Combining Propositions 1–3 with Definition 6, we have the following property.
Proposition 4
\(v_{j,j}(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\) if and only if \(v_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\).
Lemma 1
The solution of \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}}},\beta ) = {\varvec{0}}_{q+1}\) and the solution of \({\mathbf {v}}_{{\mathcal {C}}}(\varvec{\beta }_{{\mathcal {C}}}) = {\varvec{0}}_q\) are both unique, for any \(j \notin {\mathcal {C}}\).
1.1 Proof of Theorem 1
Proof
First we connect \(\beta _{j}\) to the expected conditional linear covariance between \(Z_j\) and \(\mathrm {P}[\delta =1\mid {\mathbf {Z}}]\) given \({\mathbf {Z}}_{\mathcal {C}}\), that is
then by Condition 2, we relate it to \(\alpha _{j}\). For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\), it is straightforward to see that
and
for \(m = 0,1\). Then
where
By Proposition 2,
By Definition 6,
where
and
By Definition 2, \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = {\varvec{0}}_{q+1}\),
When \(\alpha _{j} = 0\), we have \(\mathrm {E}\left[ \mathrm {Cov}^*(Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] = 0\) by Condition 2.3, and thus \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j) = 0\). Also, by Propositions 1 and 2, \(g_j(\varvec{\beta }_{{\mathcal {C}},0},0) = 0\), so that \({\mathbf {v}}_j(\varvec{\beta }_{{\mathcal {C}},0},0) = {\varvec{0}}_{q+1}\). By the uniqueness in Lemma 1, \(\beta _j = 0\).
When \(\alpha _j \ne 0\), by Condition 2, we have
This implies that \(g_j(\varvec{\beta }_{{\mathcal {C}},j},\beta _j)\) and \( \mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) are both nonzero and share the same sign, since they are equal. Next we show that for any \(\varvec{\beta }_{{\mathcal {C}}}\), \(g_j(\varvec{\beta }_{{\mathcal {C}}},0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless both are zero; this fact implies that \(\beta _j\ne 0\). Specifically, note that \(\mathrm {P}(\delta = 1\mid {\mathbf {Z}})\) is the probability that the event occurs, and \(S_T(t\mid {\mathbf {Z}})S_C(t) = \mathrm {P}(X > t \mid {\mathbf {Z}})\) is the probability of being at risk at time t. Based on Model (1), for any t,
By Proposition 3, \(\mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}] \mid {\mathbf {Z}}_{{\mathcal {C}}})\) and \(\mathrm {Cov}^*[Z_j, S_T(t\mid {\mathbf {Z}})S_C(t)\mid {\mathbf {Z}}_{\mathcal {C}}]\) have opposite signs unless they are zero. This further implies that for any \(\varvec{\beta }_{{\mathcal {C}}}\),
and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {P}[\delta =1\mid {\mathbf {Z}}]\mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs unless they are equal to zero. Therefore, \(\beta _j \ne 0\). \(\square \)
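For intuition behind the sign argument: if, as an assumption consistent with the notation above, Model (1) is of Cox type with conditional hazard \(\lambda (t\mid {\mathbf {Z}}) = \lambda _0(t)\exp (\varvec{\alpha }^{\mathrm {T}}{\mathbf {Z}})\) and the censoring time is independent of \({\mathbf {Z}}\) (so that \(S_C(t)\) carries no covariate argument), then
$$\begin{aligned} \mathrm {P}(\delta = 1\mid {\mathbf {Z}}) = \int _0^\infty \lambda _0(t)\exp (\varvec{\alpha }^{\mathrm {T}}{\mathbf {Z}})\, S_T(t\mid {\mathbf {Z}})\,S_C(t)\,\mathrm {d}t, \end{aligned}$$
so increasing \(\alpha _jZ_j\) raises the event probability while lowering the at-risk probability \(S_T(t\mid {\mathbf {Z}})S_C(t)\) at every t; this monotonicity in opposite directions is what drives the opposite signs used in the proof.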
1.2 Proof of Theorem 2
Proof
For any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), we have \(\beta _j \ne 0\) by Theorem 1. By the mean value theorem, for some \({{\widetilde{\beta }}}_j \in (0, \beta _j)\),
Next we show that \(\left| \frac{\partial v_{j}}{\partial \beta }(\varvec{\beta }_{{\mathcal {C}},j},{{\widetilde{\beta }}}_j)\right| \) is bounded. For any given \(\varvec{\beta }_{{\mathcal {C}}}\), consider \(g_{j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) as a function of \(\beta \). Then
where
By Condition 2.1, \(\mathrm {P}(|Z|<K_0) = 1\), so \(\sup _{\varvec{\beta }_{\mathcal {C}},\beta }|H_j(t,\varvec{\beta }_{{\mathcal {C}}},\beta )|\le 2 K_0^2\). Thus,
By the proof of Theorem 1, \(g(\varvec{\beta }_{{\mathcal {C}},j}, 0)\) and \(\mathrm {E}\left[ \mathrm {Cov}^*( Z_j, \mathrm {E}\{F_{T}(C \mid {\mathbf {Z}}) \mid {\mathbf {Z}}\} \mid {\mathbf {Z}}_{{\mathcal {C}}})\right] \) have opposite signs, and by Condition 2,
Taking \(c_2 = 0.5K_0^{-2} c_1\), we obtain \(\beta _j> 0.5K_0^{-2} |v_j(\varvec{\beta }_{{\mathcal {C}},j},0)| > c_2 n^{-\kappa } \). This completes the proof. \(\square \)
1.3 Proof of Theorem 3
Proof
For any \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\), by Lin and Wei (1989), we have
where \(\mathrm {E}_n[\cdot ]\) denotes the empirical measure, defined as \(\mathrm {E}_n[\varvec{\xi }_i] = n^{-1} \sum _{i=1}^n \varvec{\xi }_i\) for any random variables \(\varvec{\xi }_1,\ldots , \varvec{\xi }_n\); the \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta )\) are independent over i, and we write \({\mathbf {W}}_{i,j}(\varvec{\beta }_{{\mathcal {C}}},\beta ) = [W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta ),k\in {\mathcal {C}}\cup \{j\}]^{\mathrm {T}}\) with
Note that for any given i, j, k, \(|W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )|\) is uniformly bounded with probability one. Specifically, by Conditions 1.2, 2.1 and 3, with probability one, for all \(t\in [0,\tau ]\) and \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}}\in {\mathcal {B}}_j\),
and
Thus, with probability one,
where \(K_2 = 2K_0(1+\Lambda _0(\tau )\exp (2K_0K_1+K_0\delta -\log L))\). By the fact that \(\mathrm {E}[W_{i,j,k}(\varvec{\beta }_{{\mathcal {C}}},\beta )] = 0\),
By Lemma 2.2.9 (Bernstein's inequality) of Van Der Vaart and Wellner (1996), for any \(t>0\) and all j, k, \(\varvec{\beta }_{{{\mathcal {C}}}}\) and \(\beta \), we have
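(For reference, Lemma 2.2.9 there states that for independent mean-zero random variables \(Y_1,\ldots ,Y_n\) with \(|Y_i|\le M\) and \(v \ge \mathrm {Var}(Y_1+\cdots +Y_n)\),
$$\begin{aligned} \mathrm {P}\left( |Y_1+\cdots +Y_n|>x\right) \le 2\exp \left\{ -\frac{x^2}{2(v+Mx/3)}\right\} ; \end{aligned}$$
in the present application the uniform bound \(K_2\) obtained above plays the role of M.)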
Note that the above inequality holds for every \(j\notin {\mathcal {C}}\) and \(k\in {\mathcal {C}}\cup \{j\}\). By the Bonferroni inequality,
Since,
Then for any \(\epsilon _1>0\) and \(\epsilon _2>0\), there exists \(N_1\) such that for any \(n>N_1\),
where M is the same constant as in Condition 4. By the triangle inequality and the Bonferroni inequality, we have
Letting \(n\rightarrow \infty \) and taking \(t = c_2M(q+1)n^{1-\kappa }/2>0\) on both sides of the inequality, where \(c_2\) is the same constant as in Theorem 2, we have
Take \(N =\max \{ \lceil (K_2/3)^{1/\kappa }\rceil ,N_1\}\); then for any \(n>N\), \(n^{-\kappa }<3/K_2\), and
Note that the above inequality holds for all \((\varvec{\beta }_{{\mathcal {C}}}^{\mathrm {T}},\beta )^{\mathrm {T}} \in {\mathcal {B}}_j\), in particular for \((\varvec{\beta }_{{\mathcal {C}},j}^{\mathrm {T}},\beta _j)^{\mathrm {T}}\) with \(j \notin {\mathcal {C}}\). Also, we have \({\overline{{\mathbf {V}}}}_{j}({{\widehat{\varvec{\beta }}}}_{{\mathcal {C}},j},{{\widehat{\beta }}}_j) = {\varvec{0}}_{q+1}\). By Condition 4, we have
Taking \(c_3 = \frac{M^2c_2^2}{8(q+1)^2(K^2_2+1)}\) and applying the Bonferroni inequality completes the proof of part 1.
For part 2, by Theorem 2,
Note that, for any \(j\in {\mathcal {M}}_{-{\mathcal {C}}}\), the event
Take \(\gamma _n = c_4 n^{-\kappa }\) with \(c_4 = c_2/4\),
Thus,
Letting \(n\rightarrow \infty \), we have, for any \(\epsilon _2>0\),
Note that the left-hand side of the above inequality no longer depends on n. Taking \(\epsilon _2\rightarrow 0\) completes the proof. \(\square \)