Feature screening for ultrahigh-dimensional survival data when failure indicators are missing at random

  • Jianglin Fang
Regular Article

Abstract

In modern statistical applications, the dimension of the covariates can be much larger than the sample size, and screening methods that effectively reduce the dimensionality have been studied extensively. However, existing feature screening procedures cannot handle ultrahigh-dimensional survival data when failure indicators are missing at random. This motivates us to develop a feature screening procedure for this case. In this paper, we propose a feature screening procedure based on the sieved nonparametric maximum likelihood technique for ultrahigh-dimensional survival data with failure indicators missing at random. The proposed method has several advantages. First, it does not rely on any model assumption and works well for nonlinear survival regression models. Second, it can handle incomplete survival data in which failure indicators are missing at random. Third, it is invariant under monotone transformations of the response and satisfies the sure screening property. Simulation studies are conducted to examine the performance of our approach, and a real data example is presented for illustration.

Keywords

Ultrahigh-dimensional data · Censored data · Missing data · Feature screening · Active variable set

Notes

Acknowledgements

Fang’s research is supported by the Provincial Natural Science Foundation of Hunan (Grant No. 2018JJ2078) and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 17C0392).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. College of Science, Hunan Institute of Engineering, Xiangtan, China
