Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review

Abstract

Many modern biomedical studies have yielded survival data with high-throughput predictors. The goals of such research often include identifying predictive biomarkers, understanding biological mechanisms, and making accurate and precise predictions. Variable screening is a crucial first step toward these goals. This work provides a selective review of feature screening procedures for survival data with ultrahigh-dimensional covariates. We present the main methodologies, along with the key conditions that ensure sure screening properties. The practical utility of these methods is examined via extensive simulations. We conclude the review with a discussion of future research opportunities in this field.
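
As a concrete illustration of what a marginal screening step does, the sketch below ranks each covariate by its univariate Cox partial-likelihood score statistic evaluated at beta = 0 (the standardized statistic U_j^2 / I_j) and keeps the d top-ranked covariates. This is one generic procedure, not the authors' code and not a specific method from the review. The function names `cox_score_statistic` and `marginal_screen`, the Breslow convention for tied event times, the toy data, and the user-chosen cutoff `d` are all assumptions made for illustration.

```python
from typing import List, Sequence


def cox_score_statistic(time: Sequence[float], event: Sequence[int],
                        x: Sequence[float]) -> float:
    """Standardized univariate Cox score statistic U^2 / I at beta = 0.

    Hypothetical helper. Ties use the Breslow convention: the risk set of
    an event at time t is every subject whose observed time is >= t.
    """
    n = len(time)
    score = 0.0  # U: sum over events of (x_i - risk-set mean of x)
    info = 0.0   # I: sum over events of the risk-set variance of x
    for i in range(n):
        if not event[i]:
            continue  # censored subjects only contribute through risk sets
        risk = [k for k in range(n) if time[k] >= time[i]]
        m = len(risk)
        mean = sum(x[k] for k in risk) / m
        second_moment = sum(x[k] ** 2 for k in risk) / m
        score += x[i] - mean
        info += second_moment - mean ** 2
    # A covariate that is constant on every risk set carries no information.
    return score ** 2 / info if info > 0 else 0.0


def marginal_screen(time: Sequence[float], event: Sequence[int],
                    X: Sequence[Sequence[float]], d: int) -> List[int]:
    """Return indices of the d covariates with the largest marginal statistics.

    Hypothetical helper; X holds n rows of p covariate values.
    """
    p = len(X[0])
    stats = [cox_score_statistic(time, event, [row[j] for row in X])
             for j in range(p)]
    # Sort covariate indices by decreasing marginal signal and keep the top d.
    ranked = sorted(range(p), key=lambda j: stats[j], reverse=True)
    return ranked[:d]


# Made-up toy data: covariate 0 shortens survival, covariate 1 is noise.
time = [8, 7, 6, 5, 4, 3, 2, 1]
event = [1] * 8
X = [[1, 3], [2, 1], [3, 4], [4, 1], [5, 5], [6, 9], [7, 2], [8, 6]]
print(marginal_screen(time, event, X, 1))  # [0]
```

The sketch rebuilds every risk set from scratch, costing O(n^2) per covariate, to keep the logic readable; a practical version would sort by time once and accumulate risk-set sums. The sure screening property discussed in the review concerns whether the retained set contains every truly active covariate with probability tending to one, that is, P(M* ⊆ M̂) → 1 as n → ∞.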



Acknowledgements

We thank Dr. Jialiang Li for providing the code for the survival impact index screening and Ms. Martina Fu for proofreading the manuscript.

Author information

Correspondence to Yi Li.

Additional information

Supported by the National Natural Science Foundation of China (11528102) and the National Institutes of Health (U01CA209414).


About this article


Cite this article

Hong, H.G., Li, Y. Feature selection of ultrahigh-dimensional covariates with survival outcomes: a selective review. Appl. Math. J. Chin. Univ. 32, 379–396 (2017). https://doi.org/10.1007/s11766-017-3547-8


Keywords

  • survival analysis
  • ultrahigh dimensional predictors
  • variable screening
  • sure screening property

MR Subject Classification

  • 97K80