Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism

Zhang, Jing; Wang, Qihua; Wang, Xuan

doi:10.1007/s10463-021-00801-7

Published: 03 June 2021

Volume 74, pages 379–397, (2022)

Annals of the Institute of Statistical Mathematics

Jing Zhang¹, Qihua Wang²,³ & Xuan Wang⁴

Abstract

Feature screening is widely regarded as the first step in analyzing ultrahigh-dimensional data with censored survival times. In this article, we develop a surrogate-variable-based model-free feature screening approach for censored data under the general censoring mechanism, where the censoring variable may depend on the survival variable and the covariates. The approach identifies observable variables whose sets of active covariates each contain the set of active covariates of the survival variable. Any existing model-free feature screening method with the sure screening property for full data can then be applied to estimate the active covariate sets of the observable variables, and hence the active covariate set of the survival variable. The sure screening property of the proposed approach is established, and its finite-sample performance is demonstrated through simulations. We further illustrate the proposed approach by analyzing two real datasets.

Acknowledgements

Wang’s research was supported by the National Natural Science Foundation of China (General Program 11871460 and Program for Innovative Research Group in China 61621003) and by a grant from the Key Lab of Random Complex Structure and Data Science, CAS.

Author information

Authors and Affiliations

School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai, 201209, China
Jing Zhang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Qihua Wang
School of Statistics and Mathematics, Zhejiang Gongshang University, Zhejiang, 310018, China
Qihua Wang
School of Mathematical Sciences, Zhejiang University, Zhejiang, 310018, China
Xuan Wang

Corresponding author

Correspondence to Qihua Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 1

To facilitate the presentation, we write $\mathbf{X} _{\mathcal {A}}=\{X_{k}:k\in \mathcal {A}\}$ for any set $\mathcal {A}$ of non-negative integers. First, we prove Lemma 1 (i).

Under the GC mechanism, for any $t\in [0,\tau )$, we have

$$\begin{aligned} pr(Y>t|\mathbf{X} )=pr(\min (T,C)>t|\mathbf{X} )= pr(T>t|\mathbf{X} )\cdot pr(C>t|T>t, \mathbf{X} ). \end{aligned} \tag{1}$$
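The factorization in (1) can be unpacked in one intermediate step. The expansion below uses only the event identity $\{\min (T,C)>t\}=\{T>t\}\cap \{C>t\}$ and the definition of conditional probability; it assumes $pr(T>t|\mathbf{X} )>0$ for $t\in [0,\tau )$, which this excerpt does not state explicitly.

$$\begin{aligned} pr(\min (T,C)>t|\mathbf{X} ) &= pr(T>t,\, C>t|\mathbf{X} ) \\ &= pr(T>t|\mathbf{X} )\cdot pr(C>t|T>t, \mathbf{X} ). \end{aligned}$$

No independence between $T$ and $C$ is used here, which is why the identity remains available under the GC mechanism.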

Recalling the definition of $\mathcal {A}(Y|\mathbf{X} )$, it is easy to see from (1) that $\mathcal {A}(Y|\mathbf{X} )\subseteq \mathcal {A}(T|\mathbf{X} ) \cup \mathcal {A}^{*}(C|\mathbf{X} )$. On the other hand, for any $X_{j}\in \mathbf{X} _{\mathcal {A}(T|\mathbf{X} ) \cup \mathcal {A}^{*}(C|\mathbf{X} )}$, we have $X_{j}\in \mathbf{X} _{\mathcal {A}(T|\mathbf{X} )}$ or $X_{j}\in \mathbf{X} _{\mathcal {A}^{*}(C|\mathbf{X} )}$. That is, $pr(T>t|\mathbf{X} )$ or $pr(C>t|T>t, \mathbf{X} )$ depends functionally on $X_{j}$ for some $t\in [0,\tau )$, and hence $pr(Y>t|\mathbf{X} )$ depends functionally on $X_{j}$ by the conditions of Lemma 1. This proves $X_{j}\in \mathbf{X} _{\mathcal {A}(Y|\mathbf{X} )}$, so the reverse inclusion holds and Lemma 1 (i) is proved.

Lemma 1 (ii) can be proved in the same way as Lemma 1 (i) by noting that

$$\begin{aligned} pr(\delta T>t|\mathbf{X} ) &= pr(\delta =1,T>t|\mathbf{X} ) \\ &= pr(T>t|\mathbf{X} )\cdot pr(\delta =1|T>t, \mathbf{X} ) \end{aligned}$$

for $t\in [0,\tau )$. $\square$
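To make the use of Lemma 1 concrete, here is a minimal Python sketch of screening through the two observable surrogates $Y=\min (T,C)$ and $\delta T$. Several pieces are assumptions for illustration and are not taken from the article: Kendall's tau stands in for whichever model-free screening utility one prefers, the hard threshold `d` and the simulated design are arbitrary, and the function names (`kendall_tau_a`, `top_d`, `surrogate_screen`, `simulate`) are hypothetical. The two screened sets are intersected because, by Lemma 1 and under its conditions, both $\mathcal {A}(Y|\mathbf{X} )$ and $\mathcal {A}(\delta T|\mathbf{X} )$ contain $\mathcal {A}(T|\mathbf{X} )$; the exact combination rule used in the article's theorems is not shown in this excerpt.

```python
import math
import random


def kendall_tau_a(x, y):
    """Kendall's tau-a: mean of sign(dx) * sign(dy) over all pairs (ties count 0)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            if prod > 0:
                s += 1
            elif prod < 0:
                s -= 1
    return s / (n * (n - 1) / 2)


def top_d(columns, response, d):
    """Indices of the d covariates with the largest |tau| against the response."""
    scores = [abs(kendall_tau_a(col, response)) for col in columns]
    order = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return set(order[:d])


def surrogate_screen(columns, y, delta, d):
    """Screen on Y and on delta*T separately, then intersect (assumed rule)."""
    # delta * Y equals delta * T because Y = T whenever delta = 1.
    delta_t = [dl * yy for dl, yy in zip(delta, y)]
    return top_d(columns, y, d) & top_d(columns, delta_t, d)


def simulate(n, p, seed):
    """Toy design: T depends on X_0, X_1; C depends on T and X_2 (general censoring)."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y, delta = [], []
    for i in range(n):
        log_t = 0.5 * cols[0][i] - 0.5 * cols[1][i] + 0.2 * rng.gauss(0, 1)
        log_c = log_t + 0.8 + 0.8 * cols[2][i] + 0.3 * rng.gauss(0, 1)
        y.append(math.exp(min(log_t, log_c)))       # observed time Y = min(T, C)
        delta.append(1 if log_t <= log_c else 0)    # delta = I(T <= C)
    return cols, y, delta


cols, y, delta = simulate(n=200, p=30, seed=0)
selected = surrogate_screen(cols, y, delta, d=10)
print(sorted(selected))
```

In this toy design the censoring time depends on both $T$ and a covariate, so the intersection can retain covariates that drive only the censoring; the sketch is meant to show that the truly active covariates (indices 0 and 1 here) survive, not that the selected set is minimal.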

Proofs of Theorems 1 and 2

The proofs follow directly from Lemma 1 and are therefore omitted. $\square$

Proof of Lemma 2

Under the CRC mechanism, $\mathcal {A}^{*}(C|\mathbf{X} )$ is an empty set and $\mathcal {A}^{*}(\delta |\mathbf{X} )$ is a subset of $\mathcal {A}(T|\mathbf{X} )$. This proves Lemma 2. $\square$
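The two claims in this proof can be spelled out as follows. This is a sketch under assumptions, since the defining formula for CRC is not reproduced in this excerpt: it takes CRC to mean that $C$ is independent of $(T,\mathbf{X} )$, uses the convention $\delta =I(T\le C)$, and writes $S_{C}(s-)=pr(C\ge s)$, a symbol introduced here for illustration.

$$\begin{aligned} pr(C>t|T>t, \mathbf{X} ) &= pr(C>t), \\ pr(\delta =1|T>t, \mathbf{X} ) &= pr(C\ge T|T>t, \mathbf{X} ) = E\{S_{C}(T-)\,|\,T>t, \mathbf{X} \}. \end{aligned}$$

The first line does not involve $\mathbf{X}$, which would make $\mathcal {A}^{*}(C|\mathbf{X} )$ empty. The second line depends on $\mathbf{X}$ only through the conditional distribution of $T$ given $\mathbf{X}$, so any covariate it depends on would already belong to $\mathcal {A}(T|\mathbf{X} )$.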

Proof of Lemma 3

Under the RC mechanism, Lemma 3 is a direct result of Lemma 1 by noting $\mathcal {A}^{*}(C|\mathbf{X} )=\mathcal {A}(C|\mathbf{X} ).$ $\square$

About this article

Cite this article

Zhang, J., Wang, Q. & Wang, X. Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism. Ann Inst Stat Math 74, 379–397 (2022). https://doi.org/10.1007/s10463-021-00801-7

Received: 20 May 2020
Revised: 26 March 2021
Accepted: 12 April 2021
Published: 03 June 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10463-021-00801-7
