Abstract
Many modern biomedical studies yield survival data with high-throughput predictors. The goals of scientific research often lie in identifying predictive biomarkers, understanding biological mechanisms, and making accurate and precise predictions. Variable screening is a crucial first step in achieving these goals. This work presents a selective review of feature screening procedures for survival data with ultrahigh-dimensional covariates, where the number of covariates can far exceed the sample size. We describe the main methodologies, along with the key conditions under which they possess the sure screening property, namely that all truly important covariates are retained with probability tending to one. The practical utility of these methods is examined via extensive simulations. We conclude the review with some future opportunities in this field.
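As an illustration only, and not as any specific procedure from the review, the following Python sketch shows the generic marginal screening recipe for right-censored data. Each covariate is scored separately by the standardized Cox partial-likelihood score statistic evaluated at beta = 0, the covariates are ranked, and the top d are retained. The function names `marginal_cox_score`, `screen`, and `simulate`, the Breslow handling of tied times, the cutoff d = floor(n / log n), and the toy simulation design are all assumptions made for this example.

```python
import math
import random
from typing import List, Sequence, Tuple


def marginal_cox_score(x: Sequence[float], time: Sequence[float],
                       event: Sequence[int]) -> float:
    """|U| / sqrt(I) for the marginal Cox model of one covariate at beta = 0.

    U is the partial-likelihood score and I the information, both at
    beta = 0; tied times share one risk set (Breslow convention).
    """
    n = len(x)
    # Walk from the latest time to the earliest so the running sums always
    # describe the risk set {k : time[k] >= t}.
    order = sorted(range(n), key=lambda k: time[k], reverse=True)
    s1 = s2 = 0.0  # running sums of x and x**2 over the risk set
    count = 0
    score = info = 0.0
    i = 0
    while i < n:
        t = time[order[i]]
        j = i
        # Add every subject tied at time t before using the risk set.
        while j < n and time[order[j]] == t:
            k = order[j]
            s1 += x[k]
            s2 += x[k] ** 2
            count += 1
            j += 1
        mean = s1 / count
        var = max(s2 / count - mean ** 2, 0.0)  # guard against rounding
        for k in order[i:j]:
            if event[k]:
                score += x[k] - mean  # observed minus risk-set average
                info += var           # risk-set variance at beta = 0
        i = j
    return abs(score) / math.sqrt(info) if info > 0 else 0.0


def screen(X: List[List[float]], time: Sequence[float],
           event: Sequence[int], d: int) -> List[int]:
    """Indices of the d covariates with the largest marginal statistics."""
    p = len(X[0])
    stats = [marginal_cox_score([row[j] for row in X], time, event)
             for j in range(p)]
    return sorted(range(p), key=lambda j: stats[j], reverse=True)[:d]


def simulate(n: int, p: int, seed: int = 0
             ) -> Tuple[List[List[float]], List[float], List[int]]:
    """Hypothetical toy design: only covariates 0 and 1 affect the hazard."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    time, event = [], []
    for row in X:
        eta = 1.0 * row[0] - 1.0 * row[1]  # log relative hazard
        # Exponential event time with rate exp(eta); 1 - U avoids log(0).
        t = -math.log(1.0 - rng.random()) / math.exp(eta)
        c = rng.expovariate(0.3)  # independent censoring time
        time.append(min(t, c))
        event.append(int(t <= c))
    return X, time, event


# Usage: n = 200 subjects, p = 500 covariates, keep floor(n / log n).
X, time, event = simulate(n=200, p=500, seed=1)
d = int(200 / math.log(200))
kept = screen(X, time, event, d)
print(d, 0 in kept, 1 in kept)
```

Because each covariate is scored on its own, the cost grows only linearly in p. The trade-off is that a covariate that matters only jointly with others can receive a small marginal statistic and be missed.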
Acknowledgements
We thank Dr. Jialiang Li for providing the code for the survival impact index screening and Ms. Martina Fu for proofreading the manuscript.
Additional information
Supported by the National Natural Science Foundation of China (11528102) and the National Institutes of Health (U01CA209414).
About this article
Cite this article
Hong, H.G., Li, Y. Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review. Appl. Math. J. Chin. Univ. 32, 379–396 (2017). https://doi.org/10.1007/s11766-017-3547-8
DOI: https://doi.org/10.1007/s11766-017-3547-8