Variable Selection for Time-to-Event Data

Part of the book series: Methods in Molecular Biology (MIMB, volume 2194)

Abstract

With the increasing availability of large-scale biomedical and -omics data, researchers have unprecedented opportunities to discover novel biomarkers for clinical outcomes. At the same time, they face the considerable challenge of accurately identifying important biomarkers among numerous candidates. Many novel statistical methodologies have been developed over the last couple of decades to tackle this challenge. When the clinical outcome is a time-to-event variable, special statistical methods are needed because of the presence of censoring. In this article, we review some of the most commonly used modern statistical methodologies for variable selection with time-to-event data. The reviewed methods are classified into three broad categories: filter-test-based methods, penalized regression methods, and machine learning methods.

Ai Ni and Chi Song contributed equally to this work.

Author information

Corresponding authors

Correspondence to Ai Ni or Chi Song.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Ni, A., Song, C. (2021). Variable Selection for Time-to-Event Data. In: Markowitz, J. (ed.) Translational Bioinformatics for Therapeutic Development. Methods in Molecular Biology, vol 2194. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0849-4_5

  • DOI: https://doi.org/10.1007/978-1-0716-0849-4_5

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0848-7

  • Online ISBN: 978-1-0716-0849-4

  • eBook Packages: Springer Protocols
