In high-throughput genomic studies, an important goal is to identify a small number of genomic markers that are associated with the development and progression of diseases. A representative example is microarray prognostic studies, where the goal is to identify genes whose expression levels are associated with disease-free or overall survival. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be applied directly. In addition, among the thousands of genes surveyed, only a subset is disease-associated, so gene selection is needed along with estimation. In this article, we model the relationship between gene expression and survival using the accelerated failure time (AFT) model. We use bridge penalization for regularized estimation and gene selection. An efficient iterative computational algorithm is proposed. Tuning parameters are selected using V-fold cross-validation. We use a resampling method to evaluate the prediction performance of the bridge estimator and the relative stability of the identified genes. We show that the proposed bridge estimator is selection consistent under appropriate conditions. Analysis of two lymphoma prognostic studies suggests that the bridge estimator can identify a small number of genes and can have better prediction performance than the Lasso.
Keywords: Bridge penalization; Censored data; High dimensional data; Selection consistency; Stability; Sparse model
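The abstract does not spell out the iterative algorithm, so the following is only a minimal sketch of one common way to compute a bridge-penalized least squares fit: a local quadratic approximation, in which each step solves a ridge-type system reweighted by the current coefficients. It minimizes sum_i w_i (y_i - x_i' beta)^2 + lam * sum_j |beta_j|^gamma with 0 < gamma < 1. The function name `bridge_fit`, its parameters, and the optional `weights` argument (a hypothetical hook for censoring-adjusted observation weights) are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def bridge_fit(X, y, lam, gamma=0.5, weights=None, n_iter=100, tol=1e-8, eps=1e-6):
    """Sketch of bridge-penalized (weighted) least squares.

    Assumption: local quadratic approximation of |b|^gamma around the
    current iterate; this is an illustrative choice, not necessarily the
    algorithm used in the article.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    # Fold the observation weights into the design and response.
    sw = np.sqrt(w)
    Xw = X * sw[:, None]
    yw = y * sw

    # Ridge start so that every coefficient begins nonzero.
    beta = np.linalg.solve(Xw.T @ Xw + lam * np.eye(p), Xw.T @ yw)

    for _ in range(n_iter):
        # Coefficients that have shrunk below eps are dropped for good.
        active = np.abs(beta) > eps
        new = np.zeros(p)
        if active.any():
            Xa = Xw[:, active]
            # Quadratic surrogate of lam * |b|^gamma at the current value.
            d = 0.5 * lam * gamma * np.abs(beta[active]) ** (gamma - 2.0)
            new[active] = np.linalg.solve(Xa.T @ Xa + np.diag(d), Xa.T @ yw)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break

    beta[np.abs(beta) <= eps] = 0.0
    return beta
```

In practice `lam` (and possibly `gamma`) would be chosen by V-fold cross-validation, as the abstract describes, by refitting on each training fold and scoring on the held-out fold. With censored survival times, the log survival time would play the role of `y`, and some censoring adjustment would be needed; the `weights` argument is only a placeholder for such an adjustment.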