Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism

Annals of the Institute of Statistical Mathematics

Abstract

Feature screening is widely regarded as the first step in analyzing ultrahigh-dimensional data with censored survival times. In this article, we develop a surrogate-variable-based model-free feature screening approach for censored data under the general censoring mechanism, where the censoring variable may depend on both the survival variable and the covariates. The approach is built on observable variables, each of whose sets of active covariates contains the set of active covariates of the survival variable as a subset. Any existing model-free feature screening method that has the sure screening property for full data can then be applied to estimate the sets of active covariates of the observable variables, and hence the set of active covariates of the survival variable. The sure screening property of the proposed approach is established, and its finite-sample performance is demonstrated through simulations. Further, we illustrate the proposed approach by analyzing two real datasets.

Acknowledgements

Wang’s research was supported by the National Natural Science Foundation of China (General Program 11871460 and Program for Innovative Research Group in China 61621003) and by a grant from the Key Lab of Random Complex Structure and Data Science, CAS.

Author information

Corresponding author

Correspondence to Qihua Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 1

To facilitate the presentation, we write \(\mathbf{X} _{\mathcal {A}}=\{X_{k}:k\in \mathcal {A}\}\) for any set \(\mathcal {A}\) of non-negative integers. First, we prove Lemma 1 (i).

Under the GC mechanism, for any \(t\in [0,\tau )\), we have

$$\begin{aligned} pr(Y>t|\mathbf{X} )=pr(\min (T,C)>t|\mathbf{X} )= pr(T>t|\mathbf{X} )\cdot pr(C>t|T>t, \mathbf{X} ). \end{aligned} \tag{1}$$

By (1), \(pr(Y>t|\mathbf{X} )\) depends on \(\mathbf{X}\) only through the two factors \(pr(T>t|\mathbf{X} )\) and \(pr(C>t|T>t, \mathbf{X} )\), so the definition of \(\mathcal {A}(Y|\mathbf{X} )\) gives \(\mathcal {A}(Y|\mathbf{X} )\subseteq \mathcal {A}(T|\mathbf{X} ) \cup \mathcal {A}^{*}(C|\mathbf{X} )\). Conversely, for any \(X_{j}\in \mathbf{X} _{\mathcal {A}(T|\mathbf{X} ) \cup \mathcal {A}^{*}(C|\mathbf{X} )}\), we have \(X_{j}\in \mathbf{X} _{\mathcal {A}(T|\mathbf{X} )}\) or \(X_{j}\in \mathbf{X} _{\mathcal {A}^{*}(C|\mathbf{X} )}\). That is, \(pr(T>t|\mathbf{X} )\) or \(pr(C>t|T>t, \mathbf{X} )\) depends functionally on \(X_{j}\) for some \(t\in [0,\tau )\), and hence, by the conditions of Lemma 1, so does \(pr(Y>t|\mathbf{X} )\). Therefore \(X_{j}\in \mathbf{X} _{\mathcal {A}(Y|\mathbf{X} )}\), which proves Lemma 1 (i).
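Identity (1) is the multiplication rule for conditional probabilities, so it holds exactly for empirical frequencies as well. The Python sketch below checks this on simulated data. The data-generating model (one binary covariate, exponential survival and censoring times) and the helper name `check_factorization` are hypothetical choices made for illustration and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating model (an assumption, not from the article).
x = rng.integers(0, 2, size=n)               # one binary covariate X
t_surv = rng.exponential(scale=1.0 + x)      # survival time T depends on X
# Censoring time C depends on both T and X (general censoring).
c_cens = rng.exponential(scale=0.5 + 0.5 * t_surv + x)
y = np.minimum(t_surv, c_cens)               # observed time Y = min(T, C)


def check_factorization(t, x_value):
    """Return empirical (lhs, rhs) of identity (1) within the stratum X = x_value."""
    in_stratum = x == x_value
    t_gt = t_surv[in_stratum] > t
    c_gt = c_cens[in_stratum] > t
    lhs = np.mean(y[in_stratum] > t)             # pr(Y > t | X)
    rhs = np.mean(t_gt) * np.mean(c_gt[t_gt])    # pr(T > t | X) * pr(C > t | T > t, X)
    return lhs, rhs


for t in (0.25, 0.5, 1.0):
    for x_value in (0, 1):
        lhs, rhs = check_factorization(t, x_value)
        print(f"t={t}, X={x_value}: lhs={lhs:.4f}, rhs={rhs:.4f}")
```

In this toy model both factors on the right-hand side change with the covariate, so the left-hand side changes with it too, which is the sense in which the covariate is active for \(Y\).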

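Lemma 1 (ii) can be proved similarly to Lemma 1 (i) by noting that, since \(\delta \in \{0,1\}\) and \(t\ge 0\), the event \(\{\delta T>t\}\) coincides with \(\{\delta =1, T>t\}\), so that

$$\begin{aligned} pr(\delta T>t|\mathbf{X} )&=pr(\delta =1,T>t|\mathbf{X} ) \\ &=pr(T>t|\mathbf{X} )\cdot pr(\delta =1|T>t, \mathbf{X} ) \end{aligned}$$

for \(t\in [0,\tau )\). \(\square\)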
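Lemma 1 shows that the active sets of the two observable variables \(Y\) and \(\delta T\) both contain \(\mathcal {A}(T|\mathbf{X} )\), and therefore so does their intersection. The Python sketch below illustrates one way this could be used for screening. It ranks covariates by sample distance correlation with each surrogate and intersects the two top-\(d\) sets. The choice of distance correlation, the intersection rule, the simulated model, and all function names are assumptions made for illustration. The article's own procedure may differ.

```python
import numpy as np


def distance_correlation(a, b):
    """Sample distance correlation of two 1-D arrays (V-statistic form)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    A = np.abs(a[:, None] - a[None, :])   # pairwise distances within a
    B = np.abs(b[:, None] - b[None, :])   # pairwise distances within b
    # Double-center each distance matrix.
    A = A - A.mean(axis=0, keepdims=True) - A.mean(axis=1, keepdims=True) + A.mean()
    B = B - B.mean(axis=0, keepdims=True) - B.mean(axis=1, keepdims=True) + B.mean()
    dcov2 = (A * B).mean()                # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def top_d(X, response, d):
    """Indices of the d covariates with the largest marginal utility."""
    scores = np.array([distance_correlation(X[:, j], response)
                       for j in range(X.shape[1])])
    return set(np.argsort(scores)[::-1][:d].tolist())


def surrogate_screen(X, y, delta, d):
    """Intersect the top-d sets obtained from the surrogates Y and delta*T."""
    keep_y = top_d(X, y, d)
    # delta*T is observable because delta*T = delta*Y (Y = T whenever delta = 1).
    keep_dt = top_d(X, delta * y, d)
    return keep_y & keep_dt


def simulate(n, p, seed):
    """Hypothetical model: T depends on X0, X1; C depends on T and X2."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    T = np.exp(0.5 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n))
    C = T * np.exp(0.4 + 0.8 * rng.standard_normal(n) + 0.5 * X[:, 2])
    y = np.minimum(T, C)
    delta = (T <= C).astype(float)
    return X, y, delta


X, y, delta = simulate(n=400, p=20, seed=1)
print(sorted(surrogate_screen(X, y, delta, d=5)))
```

The retained set is meant to be a superset of the active covariates of \(T\). In this toy model the covariate driving the censoring can also be retained, which is consistent with the unions in Lemma 1.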

Proofs of Theorems 1 and 2

The proofs follow directly from Lemma 1 and are therefore omitted. \(\square\)

Proof of Lemma 2

Under the CRC mechanism, \(\mathcal {A}^{*}(C|\mathbf{X} )\) is an empty set and \(\mathcal {A}^{*}(\delta |\mathbf{X} )\) is a subset of \(\mathcal {A}(T|\mathbf{X} )\). This proves Lemma 2. \(\square\)

Proof of Lemma 3

Under the RC mechanism, Lemma 3 is a direct result of Lemma 1 by noting \(\mathcal {A}^{*}(C|\mathbf{X} )=\mathcal {A}(C|\mathbf{X} ).\) \(\square\)

About this article

Cite this article

Zhang, J., Wang, Q. & Wang, X. Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism. Ann Inst Stat Math 74, 379–397 (2022). https://doi.org/10.1007/s10463-021-00801-7

  • DOI: https://doi.org/10.1007/s10463-021-00801-7
