Partial least squares Cox regression for genome-wide data
Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.
KeywordsCox regression Dimension reduction Gene expression data High-dimensional data Partial least squares Survival prediction
Unable to display preview. Download preview PDF.
- Park PJ, Tian L and Kohane IS (2002). Linking gene expression data with patient survival times using partial least squares. Bioinformatics 18(Suppl. 1): S120–S127 Google Scholar
- van’t Veer LJ, Dai HY, van de Vijver MJ, He YDD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536 CrossRefGoogle Scholar