Abstract
In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is lack of information for the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.
References
Fan, J., Feng, Y., Wu, Y.: High-dimensional variable selection for Cox's proportional hazards model. IMS Collect. 6, 70–86 (2010)
Fan, J., Li, R.: Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 30, 74–99 (2002)
Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space (with discussion). J. R. Stat. Soc. B 70, 849–911 (2008)
Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 1829–1853 (2009)
Fan, J., Song, R.: Sure independence screening in generalized linear models with NP-Dimensionality. Ann. Stat. 38, 3567–3604 (2010)
Fang, K.T., Kotz, S., Ng, K.W.: Symmetric Multivariate and Related Distributions. Chapman & Hall, London (1989)
Gorst-Rasmussen, A., Scheike, T.: Independent screening for single-index hazard rate models with ultrahigh dimensional features. J. R. Stat. Soc. B 75, 217–245 (2013)
He, X., Wang, L., Hong, H.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41, 342–369 (2013)
Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40, 1846–1877 (2012a)
Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107, 1129–1139 (2012b)
Lo, S.H., Singh, K.: The product-limit estimator and the bootstrap: some asymptotic representations. Probab. Theory Relat. Fields 71, 455–465 (1986)
Lu, W., Li, L.: Boosting methods for nonlinear transformation models with censored survival data. Biostatistics 9, 658–667 (2008)
Rosenwald, A., Wright, G., Chan, W.C., Connors, J.M., Hermelink, H.K., Smeland, E.B., Staudt, L.M.: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937–1947 (2002)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Uno, H., Cai, T., Pencina, M.J., D’Agostino, R.B., Wei, L.J.: On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30, 1105–1117 (2011)
Zhao, S.D., Li, Y.: Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J. Multivar. Anal. 105, 397–411 (2012)
Zhu, L.P., Li, L., Li, R., Zhu, L.X.: Model-free feature screening for ultrahigh dimensional data. J. Am. Stat. Assoc. 106, 1464–1475 (2011)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgments
Tingyou Zhou’s research is supported by Shanghai University of Finance and Economics Innovation Fund of Graduate Student (CXJJ-2014-447). Liping Zhu’s research is supported by National Natural Science Foundation of China (11371236 and 11422107), Henry Fok Education Foundation Fund of Young College Teachers (141002) and Innovative Research Team in University of China (IRT13077), Ministry of Education of China. All correspondence should be directed to Liping Zhu at zhu.liping@ruc.edu.cn. The authors thank the Editor, an Associate Editor and the anonymous reviewers for their constructive suggestions, which have helped greatly improve the presentation of our paper.
Appendix 1: Proof of theorems
1.1 Appendix 1.1: Proof of theorem 1
We first observe that \(\omega _{k} = \omega _{k,1}\). Thus it suffices to show that
With the conditional independence model (2.1) and the linearity condition, we have
Let \({\varvec{\Omega }}_{{\mathcal {A}}}(t) \!=\! E\left\{ \mathbf {x}_{\mathcal {A}}\mathbf {1}(Y\!<\!t) \right\} \) and \({\varvec{\Omega }}_{{\mathcal {A}}} = E\left\{ {\varvec{\Omega }}_{{\mathcal {A}}}(T) {\varvec{\Omega }}^\mathrm{\tiny {T}}_{{\mathcal {A}}}(T)\right\} \). Thus,
Without much difficulty, we can obtain that
which implies the desired result.
1.2 Appendix 1.2: Proof of theorem 2
We merely prove the case \({\widehat{G}}_k(t\mid X_k) = {\widehat{G}}(t)\), as the proofs for the other two cases are very similar. We first show that \(\widehat{\omega }_k= n^3/\{n(n-1)(n-2)\}{\widetilde{\omega }}_k\), a scaled version of \({\widetilde{\omega }}_k\), can be expressed as follows:
Lemma 1
Under Condition (C4), the Kaplan-Meier estimator \(\widehat{G}(\cdot )\) satisfies:
(1) \({\sup }_{0\le t \le T}|\widehat{G}(t)-G(t)|= O\{(\frac{\log n}{n})^{\frac{1}{2}}\}\) almost surely.

(2) \(\{\widehat{G}(t)\}^{-1}-\{G(t)\}^{-1}= n^{-1}\{G(t)\}^{-2} \sum _{g=1}^n{\xi (T_g,\delta _g,t)}+R_n(t)\), where \(\xi (T_g,\delta _g,t),\ g=1,\ldots ,n,\) are i.i.d. random variables with mean zero and \({\sup }_{0\le t \le T}|R_n(t)|=O\{(\frac{\log n}{n})^{\frac{3}{4}}\}\) almost surely.

(3) \({\sup }_{0\le t \le T}|\frac{1}{\widehat{G}(t)}-\frac{1}{G(t)}|=O\{(\frac{\log n}{n})^{\frac{1}{2}}\}\) almost surely.
Result (1) can be found in Lemma 3 of Lo and Singh (1986). Direct application of Taylor expansion yields (2) and (3).
Using Lemma 1, we can write
where \(h(\cdot )\) stands for the kernel of the U-statistic \(U_n\), which can be expressed as follows:
In other words, \(U_n\) is a standard U-statistic which can be expressed as
\[
U_n = \frac{1}{n!}\sum _{n!} {\mathcal {W}}(X_{i_1k},T_{i_1},\delta _{i_1};\ldots ;X_{i_nk},T_{i_n},\delta _{i_n}),
\]
where each \({\mathcal {W}}(X_{1k},T_1,\delta _1;\ldots ;X_{nk},T_n,\delta _n) = (k^{*})^{-1}\sum _{m=1}^{k^{*}} h(X_{(3m-2)k},T_{3m-2},\delta _{3m-2}; X_{(3m-1)k},T_{3m-1},\delta _{3m-1}; X_{(3m)k},T_{3m},\delta _{3m})\) is an average of \(k^{*}=[n/3]\) independent and identically distributed random variables, and \(\sum _{n!}\) denotes summation over all \(n!\) permutations \((i_1,\ldots ,i_n)\) of \((1,\ldots ,n)\).
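The construction of \({\mathcal {W}}\) — split the sample into \(k^{*}=[n/3]\) disjoint triples and average a degree-3 kernel over them — can be made concrete. The kernel \(h\) below is a toy symmetric stand-in (expanding it gives \(a^2+b^2+c^2-ab-bc-ca\), with mean 3 under i.i.d. standard normals), not the paper's kernel.

```python
import numpy as np

def block_average(h, samples):
    """W(x_1,...,x_n) = (1/k*) * sum_{m=1}^{k*} h(x_{3m-2}, x_{3m-1}, x_{3m}):
    an average over k* = floor(n/3) disjoint triples, hence an average of
    i.i.d. terms whenever the x_i themselves are i.i.d."""
    k_star = len(samples) // 3
    vals = [h(samples[3 * m], samples[3 * m + 1], samples[3 * m + 2])
            for m in range(k_star)]
    return float(np.mean(vals))

# illustrative symmetric kernel of degree 3; equals a^2+b^2+c^2-ab-bc-ca
h = lambda a, b, c: (a - b) * (a - c) + (b - a) * (b - c) + (c - a) * (c - b)

rng = np.random.default_rng(1)
x = rng.standard_normal(300)     # k* = 100 disjoint triples
w = block_average(h, x)          # concentrates near E[h] = 3
```

Averaging \({\mathcal {W}}\) over all permutations then recovers \(U_n\); the payoff of the representation is exactly the one used in the proof: moment bounds for the i.i.d. average transfer to the U-statistic.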
For any \(t\in (0,s_0 k^{*})\), where \(s_0\) is a positive constant, it follows that
The first equality holds because of the monotonicity of the exponential function, and the inequality follows from Markov's inequality.
Since \(E\{\exp (t\widehat{\omega }_k)\}=E\{\exp (t[U_n+O\{(\frac{\log n}{n})^{\frac{1}{2}}\}])\}=E\{\exp (t U_n)\} \exp [t \cdot O\{(\frac{\log n}{n})^{\frac{1}{2}}\}]\), and \(\exp [t \cdot O\{(\frac{\log n}{n})^{\frac{1}{2}}\}]\) converges to 1 as \(n\) goes to infinity for any fixed \(t\in (0,s_0 k^{*})\), an application of Jensen's inequality yields
where \({\psi _{h}(s)}=E[\exp \{s \cdot h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i}; X_{lk},T_{l},\delta _{l})\}],\,\,s\in (0,s_0)\).
The combination of the above results shows that
where \(s=t/k^{*}.\) Note that \(E\{h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i}; X_{lk},T_{l},\delta _{l})\}=\omega _k\), and a Taylor expansion shows that, for any generic random variable \(Y\), there exist a constant \(s_1\in (0,s)\) and a random variable \(Z\) satisfying \(0<Z<Y^2\exp (s_1 Y)\) such that \(\exp (s Y)=1+sY+s^2 Z/2.\) We thus obtain that
Applying Condition (C3) on \(X_k\), we obtain that there exists a constant \(C_0\) such that
Together with Taylor expansion that \(\exp (-s\varepsilon )=1-\varepsilon s+O(s^2)\), it follows that
where \(s=t/k^{*}\in (0,s_0)\) is sufficiently small (as long as t is sufficiently small). Thus, for an arbitrary \(\varepsilon >0\), there exists a small enough constant \(s_{\varepsilon }\) such that
Similarly, we can get that
Consequently,
Next, we prove that
Recall that we set \(\eta ={\min }_{k\in {\mathcal {A}}}\,\omega _{k}-{\max }_{k\in {\mathcal {I}}}\,\omega _{k}\). Therefore,
Applying (2.12) with \(\varepsilon =\eta /2\), we complete the proof of (2.13).
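The implication behind this choice of \(\varepsilon\) can be spelled out. On the event \(\max _{1\le k\le p}|\widehat{\omega }_k-\omega _k| \le \eta /2\), the definition of \(\eta\) gives the chain (a standard separation argument, written here in the paper's notation):

```latex
\min_{k\in\mathcal{A}} \widehat{\omega}_k
  \;\ge\; \min_{k\in\mathcal{A}} \omega_k - \eta/2
  \;=\; \max_{k'\in\mathcal{I}} \omega_{k'} + \eta/2
  \;\ge\; \max_{k'\in\mathcal{I}} \widehat{\omega}_{k'},
```

so every active covariate is ranked above every inactive one, which is precisely the ranking consistency asserted in (2.13).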
1.3 Appendix 1.3: Proof of theorem 3
We first prove that for any \(\varepsilon > 0\),
From the uniform bound condition of \(\mathbf {x}\), we can see that \(h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i};X_{lk},T_{l},\delta _{l}) \), the kernel of the U-statistic \(\widehat{\omega }_k\), is also bounded, that is,
Our foregoing arguments show that for any \(t\in (0,s_0 k^{*})\), we have
where \(k^{*}=[n/3]\). Together with the exponential inequality in Lemma 5.6.1.A of Serfling (1980), we obtain that
By choosing \(t=\frac{\tau _0^2 k^{*}}{a^2} \varepsilon \), the right hand side attains its minimum \(\exp (-\frac{\tau _0^2 k^{*}}{2a^2} \varepsilon ^2)\), which together with the symmetry of the U-statistic implies the validity of (4.2). Let \(\varepsilon \mathop {=}\limits ^{{\tiny \hbox {def}}}cn^{-\kappa }\). We have
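The stated choice of \(t\) can be checked by elementary calculus. Assuming the exponential bound has the sub-Gaussian form \(\exp \{-t\varepsilon + a^2 t^2/(2\tau _0^2 k^{*})\}\) (an assumption, but the only form consistent with the optimum quoted above):

```latex
\frac{d}{dt}\Bigl\{-t\varepsilon + \frac{a^2 t^2}{2\tau_0^2 k^{*}}\Bigr\}
  = -\varepsilon + \frac{a^2 t}{\tau_0^2 k^{*}} = 0
  \;\Longrightarrow\; t = \frac{\tau_0^2 k^{*}}{a^2}\,\varepsilon,
\qquad
-t\varepsilon + \frac{a^2 t^2}{2\tau_0^2 k^{*}}\,\Bigg|_{\,t=\tau_0^2 k^{*}\varepsilon/a^2}
  = -\frac{\tau_0^2 k^{*}}{2a^2}\,\varepsilon^2,
```

which recovers the minimum value \(\exp (-\frac{\tau _0^2 k^{*}}{2a^2}\varepsilon ^2)\) used in the proof.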
To facilitate our subsequent proof, define the event \({\mathcal {C}}_n \mathop {=}\limits ^{{\tiny \hbox {def}}}\{{\max }_{k \in {\mathcal {A}}}|\widehat{\omega }_k-\omega _k| \le cn^{-\kappa }\}\). Recall that we assume \({\min }_{k\in {\mathcal {A}}}\,\omega _k\ge 2cn^{-\kappa }\). Under this assumption, if the event \({\mathcal {C}}_n\) occurs, then \(\widehat{\omega }_k\ge cn^{-\kappa }\) for all \(k \in {\mathcal {A}}\). Thus, we obtain that
Since
The last equation holds because of (4.3). This completes the proof of Theorem 3.
Zhou, T., Zhu, L. Model-free feature screening for ultrahigh dimensional censored regression. Stat Comput 27, 947–961 (2017). https://doi.org/10.1007/s11222-016-9664-z