Abstract
With the increasing availability of large-scale biomedical and -omics data, researchers have unprecedented opportunities to discover novel biomarkers for clinical outcomes. At the same time, they face the considerable challenge of accurately identifying the important biomarkers among numerous candidates. Many novel statistical methodologies have been developed over the last couple of decades to tackle this challenge. When the clinical outcome is a time-to-event variable, special statistical methods are needed because of censoring. In this article, we review some of the most commonly used modern statistical methodologies for variable selection with time-to-event data. The reviewed methods fall into three broad categories: filter-test-based methods, penalized regression methods, and machine learning methods.
Ai Ni and Chi Song contributed equally to this work.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Ni, A., Song, C. (2021). Variable Selection for Time-to-Event Data. In: Markowitz, J. (eds) Translational Bioinformatics for Therapeutic Development. Methods in Molecular Biology, vol 2194. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0849-4_5
DOI: https://doi.org/10.1007/978-1-0716-0849-4_5
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0848-7
Online ISBN: 978-1-0716-0849-4
eBook Packages: Springer Protocols