Model-free feature screening for high-dimensional survival data


Abstract

With the rapid growth in the size of scientific data across many disciplines, feature screening plays an important role in reducing high dimensionality to a moderate scale. In this paper, we introduce a unified and robust model-free feature screening approach for high-dimensional survival data with censoring. The approach has several advantages. First, it is model-free under a general model framework, and hence avoids the difficulty of specifying an actual model form with a huge number of candidate variables. Second, under mild conditions that do not require the existence of any moment of the response, it enjoys the ranking consistency and sure screening properties in ultra-high dimension. In particular, we assume only that the response and the censoring variable are conditionally independent given each covariate, rather than assuming that the censoring variable is independent of both the response and the covariates. Moreover, we propose a more robust variant of the new procedure, which possesses desirable theoretical properties without any finite moment condition on the predictors or the response. The proposed methods require no complicated numerical optimization and are fast and easy to implement. Extensive numerical studies demonstrate that the proposed methods perform competitively across various configurations. The methods are illustrated with an analysis of a genetic data set.
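The abstract does not state the paper's marginal utility, so the Python sketch below uses a hypothetical, indicator-based dependence score (not the authors' statistic) only to illustrate the general screen-and-rank workflow it describes. One score is computed per covariate from the censored data, with no numerical optimization, and the top d covariates are kept. Because the score uses only indicator functions, it needs no moment of the covariate or the response. The function names `event_process`, `marginal_utility` and `screen`, the default size d = floor(n / log n), and the simulated data (where censoring is independent of the covariates, a stronger setting than the paper's conditional independence assumption) are all assumptions made for illustration.

```python
import numpy as np


def event_process(time, event):
    """N[i, l] = 1{time_i <= time_l and event_i == 1}: observed-event indicator at each sample time."""
    return ((time[:, None] <= time[None, :]) & (event[:, None] == 1)).astype(float)


def marginal_utility(x, n_centered):
    """Hypothetical indicator-based score for one covariate (NOT the paper's statistic).

    C[j, l] = P_n(X <= x_j, time <= t_l, event = 1) - P_n(X <= x_j) * P_n(time <= t_l, event = 1);
    the score is the mean of C**2 over all (j, l). It is near zero when X carries no
    information about the observed-event subdistribution.
    """
    n = x.shape[0]
    a = (x[:, None] <= x[None, :]).astype(float)  # a[i, j] = 1{x_i <= x_j}
    c = a.T @ n_centered / n  # empirical covariance between the two indicators
    return float(np.mean(c ** 2))


def screen(X, time, event, d=None):
    """Rank covariates by the marginal score and keep the top d (no numerical optimization)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # assumed default size, common in the screening literature
    N = event_process(time, event)
    n_centered = N - N.mean(axis=0)  # subtract P_n(time <= t_l, event = 1) column-wise
    scores = np.array([marginal_utility(X[:, k], n_centered) for k in range(p)])
    keep = np.argsort(scores)[::-1][:d]  # indices of the d largest scores
    return keep, scores


# Simulated example (assumption): only covariates 0 and 1 affect the hazard.
rng = np.random.default_rng(0)
n, p = 200, 300
X = rng.standard_normal((n, p))
T = rng.exponential(scale=np.exp(-2.0 * X[:, 0] - 2.0 * X[:, 1]))  # true survival times
cens = rng.exponential(scale=2.0, size=n)  # censoring times, independent of X here
time = np.minimum(T, cens)  # observed times
event = (T <= cens).astype(int)  # 1 = event observed, 0 = censored

selected, scores = screen(X, time, event)
print("top five covariates:", [int(k) for k in selected[:5]])
```

The pairwise indicator matrices make each score cost O(n^3) through one matrix product, which is cheap for moderate n; the point of the sketch is only that screening reduces to sorting p marginal scores.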

Keywords

feature screening, random censoring, robustness, sure independence screening, ultra-high dimension

MSC(2010)

35J60 35J70 


Notes

Acknowledgments

This work was supported by the Research Grant Council of Hong Kong (Grant Nos. 509413 and 14311916), Direct Grants for Research of The Chinese University of Hong Kong (Grant Nos. 3132754 and 4053235), the Natural Science Foundation of Jiangxi Province (Grant No. 20161BAB201024), the Key Science Fund Project of Jiangxi Province Education Department (Grant No. GJJ150439), the National Natural Science Foundation of China (Grant Nos. 11461029, 11601197 and 61562030) and the Canadian Institutes of Health Research (Grant No. 145546). The authors are grateful to the two reviewers for their insightful comments, which led to substantial improvements in the paper. The authors also thank Professor Liping Zhu for his constructive comments.


Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
  2. School of Statistics and Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, China
  3. The Princess Margaret Cancer Center, University Health Network, Toronto, Canada
