Lifetime Data Analysis, Volume 24, Issue 1, pp 45–71

Conditional screening for ultra-high dimensional covariates with survival outcomes

  • Hyokyoung G. Hong
  • Jian Kang
  • Yi Li


Identifying important biomarkers that predict cancer patients’ prognosis is key to gaining better insight into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created high demand for efficient screening tools for selecting predictive biomarkers. The vast number of biomarkers defies existing variable selection methods based on regularization. Recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful in detecting signals that are marginally weak but jointly important. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given biological information known a priori. This is based on the premise that some biomarkers are known a priori to be associated with disease outcomes. Our method possesses sure screening properties and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma dataset. We are pleased to dedicate this work to Jack Kalbfleisch, who has made instrumental contributions to the development of modern methods of analyzing survival data.
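The idea of scoring each biomarker by its contribution given a known set can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one plausible reading of the abstract: for each candidate covariate, fit a Cox model containing the known covariates plus that candidate, then rank candidates by the absolute value of the candidate's coefficient. The function names `cox_fit` and `conditional_screen`, the clipped Newton step, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def cox_fit(X, time, event, n_iter=50, tol=1e-8):
    """Maximize the Cox partial likelihood (Breslow ties) by Newton-Raphson."""
    order = np.argsort(-time, kind="stable")          # latest times first
    X, time, event = X[order], time[order], event[order].astype(bool)
    # For sorted row i, the risk set {k: time_k >= time_i} is rows 0..last[i].
    last = np.searchsorted(-time, -time, side="right") - 1
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        s0 = np.cumsum(w)[last]                                    # (n,)
        s1 = np.cumsum(w[:, None] * X, axis=0)[last]               # (n, p)
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :],
                       axis=0)[last]                               # (n, p, p)
        m = s1[event] / s0[event][:, None]       # risk-set weighted means
        grad = (X[event] - m).sum(axis=0)        # score vector
        info = (s2[event] / s0[event][:, None, None]
                - m[:, :, None] * m[:, None, :]).sum(axis=0)  # observed information
        # Crude safeguard (an assumption of this sketch): cap each Newton step.
        step = np.clip(np.linalg.solve(info, grad), -1.0, 1.0)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def conditional_screen(X, time, event, cond_idx, top_k):
    """Rank covariates outside cond_idx by |coefficient| given cond_idx."""
    cond_idx = list(cond_idx)
    scores = np.full(X.shape[1], -np.inf)   # known covariates are never ranked
    for j in range(X.shape[1]):
        if j in cond_idx:
            continue
        beta = cox_fit(X[:, cond_idx + [j]], time, event)
        scores[j] = abs(beta[-1])           # contribution of j given the known set
    return np.argsort(-scores)[:top_k], scores

# Synthetic illustration (hypothetical data, not from the paper):
# covariate 0 is "known a priori"; covariate 1 is correlated with it and has an
# opposing effect, so its marginal association with the linear predictor cancels.
rng = np.random.default_rng(0)
n, p = 400, 30
X = rng.standard_normal((n, p))
X[:, 1] = 0.5 * X[:, 0] + np.sqrt(0.75) * X[:, 1]
hazard = np.exp(1.6 * X[:, 0] - 0.8 * X[:, 1])
t_event = rng.exponential(1.0 / hazard)
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

selected, scores = conditional_screen(X, time, event, cond_idx=[0], top_k=3)
print("top candidates given covariate 0:", selected)
```

In this toy design the covariance between covariate 1 and the linear predictor is zero by construction, which is the kind of marginally weak but jointly important signal the abstract refers to. Conditioning on covariate 0 is what allows it to be ranked highly. In practice covariates would typically be standardized before their coefficients are compared.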


Conditional screening; Cox model; Diffuse large B-cell lymphoma; High-dimensional variable screening



This research was partially supported by a grant from the NSA (H98230-15-1-0260, Hong), an NIH grant (R01MH105561, Kang), and a grant from the Chinese Natural Science Foundation (11528102, Li).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Michigan State University, East Lansing, USA
  2. University of Michigan, Ann Arbor, USA
