Lifetime Data Analysis, Volume 14, Issue 2, pp 179–195

Partial least squares Cox regression for genome-wide data

  • Ståle Nygård
  • Ørnulf Borgan
  • Ole Christian Lingjærde
  • Hege Leite Størvold


Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with a dimension reduction technique, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) reformulates the Cox likelihood as a Poisson-type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) in which estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches: it allows estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows incorporation of lower-dimensional non-genomic variables such as disease grade and tumor thickness. We also propose combining our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, yielding a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised versions outperform other Cox PLS methods.
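The gene selection step mentioned above can be illustrated with a minimal sketch. It assumes that the Cox score of a gene is the univariate Cox partial-likelihood score statistic evaluated at zero effect, standardized by its estimated variance (with tied event times handled one event at a time); the abstract does not spell out the exact definition, so this is an assumption. The function names `cox_scores` and `select_top_genes` are hypothetical and not taken from the paper.

```python
import numpy as np

def cox_scores(X, time, event):
    """Univariate Cox score statistic per gene (column of X), evaluated at beta = 0.

    Assumed definition: U_j / sqrt(V_j), where U_j sums, over observed events,
    the difference between the expression of the failing patient and the mean
    expression in the risk set, and V_j sums the risk-set variances.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    U = np.zeros(X.shape[1])
    V = np.zeros(X.shape[1])
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]          # patients still under observation
        Xr = X[at_risk]
        mean = Xr.mean(axis=0)
        U += X[i] - mean                   # score contribution of this event
        V += (Xr ** 2).mean(axis=0) - mean ** 2  # risk-set variance
    # Guard against division by zero for genes with no variation.
    return U / np.sqrt(np.maximum(V, 1e-12))

def select_top_genes(X, time, event, k_percent):
    """Return column indices of the k% of genes with the largest absolute score."""
    scores = np.abs(cox_scores(X, time, event))
    n_keep = max(1, int(np.ceil(X.shape[1] * k_percent / 100.0)))
    return np.argsort(-scores)[:n_keep]
```

The retained columns would then be passed on to the PLS step; the sketch covers only the ranking and filtering, not the Cox PLS estimation itself.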


Cox regression · Dimension reduction · Gene expression data · High-dimensional data · Partial least squares · Survival prediction



  1. Bair E and Tibshirani R (2004). Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2: 511–522
  2. Bair E, Hastie T, Paul D and Tibshirani R (2006). Prediction by supervised principal components. J Am Stat Assoc 473: 119–137
  3. Bøvelstad HM, Nygård S, Størvold HL, Aldrin M, Borgan Ø, Frigessi A and Lingjærde OC (2007). Survival prediction from microarray data—a comparative study. Bioinformatics 23: 2080–2087
  4. Datta S, Le-Rademacher J and Datta S (2007). Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and lasso. Biometrics 63: 259–271
  5. Li H and Gui J (2004). Partial Cox regression analysis for high-dimensional gene expression data. Bioinformatics 20: i208–i215
  6. Hastie T, Tibshirani R and Friedman J (2001). The elements of statistical learning. Springer-Verlag, New York
  7. Helland IS (1988). On the structure of partial least squares regression. Commun Stat Simulat Comput 17: 581–607
  8. Johansen S (1983). An extension of Cox’s regression model. Int Stat Rev 51: 165–174
  9. Martens H and Næs T (1989). Multivariate calibration. Wiley, New York
  10. Marx BD (1996). Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38: 374–381
  11. McCullagh P and Nelder JA (1989). Generalized linear models, 2nd edn. Chapman & Hall, London
  12. Nguyen DV and Rocke DM (2002). Partial least squares proportional hazard regression for application to DNA microarray survival data. Bioinformatics 18: 1625–1632
  13. Park PJ, Tian L and Kohane IS (2002). Linking gene expression data with patient survival times using partial least squares. Bioinformatics 18(Suppl. 1): S120–S127
  14. van’t Veer LJ, Dai HY, van de Vijver MJ, He YDD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R and Friend SH (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536
  15. van Houwelingen HC, Bruinsma T, Hart AAM, van’t Veer LJ and Wessels LFA (2006). Cross-validated Cox regression on microarray gene expression data. Stat Med 25: 3201–3216
  16. Verweij PJM and van Houwelingen HC (1993). Cross-validation in survival analysis. Stat Med 12: 2305–2314
  17. Whitehead J (1980). Fitting Cox’s regression model to survival data using GLIM. Appl Stat 29: 268–275

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Ståle Nygård (1)
  • Ørnulf Borgan (1)
  • Ole Christian Lingjærde (2)
  • Hege Leite Størvold (2)

  1. Department of Mathematics, University of Oslo, Oslo, Norway
  2. Department of Informatics, University of Oslo, Oslo, Norway
