Abstract
With the rapid development of technology for measuring disease characteristics at the molecular or genetic level, it is now possible in medical research to collect large amounts of data on potential predictors of a clinical outcome of interest. It is often of interest to use the information in a large number of predictors effectively to predict that outcome. Various statistical tools have been developed to overcome the difficulties caused by the high dimensionality of the covariate space in the setting of a linear regression model. This paper focuses on the situation in which the outcomes of interest are subject to right censoring. We applied the extended partial least squares method, along with other commonly used approaches for analyzing high-dimensional covariates, to the ACTG333 data set. In particular, we compared the prediction performance of the different approaches in extensive cross-validation studies. The results show that Buckley–James based partial least squares, stepwise subset model selection, and principal components regression have similarly promising predictive power, and that the partial least squares method has several advantages in terms of interpretability and numerical computation.
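To make the Buckley–James idea mentioned above concrete, the following is a minimal sketch of the standard Buckley–James imputation step for right-censored responses in a linear model. It is not the authors' implementation. The function names `km_masses` and `bj_impute` and the toy inputs are hypothetical, and the sketch assumes that fitted values from some current linear predictor are already available. Each censored response is replaced by its fitted value plus the Kaplan–Meier estimate of the conditional mean of the residuals beyond the censored residual; uncensored responses are left unchanged.

```python
def km_masses(resid, delta):
    """Kaplan-Meier probability mass placed on each residual.

    resid: residuals (observed response minus fitted value)
    delta: 1 if the response was observed, 0 if right-censored
    Censored residuals receive zero mass, except that the largest
    residual is treated as uncensored so the masses sum to one
    (a common convention, assumed here).
    """
    n = len(resid)
    # Sort by residual; at ties, uncensored observations come first.
    order = sorted(range(n), key=lambda i: (resid[i], -delta[i]))
    d = list(delta)
    d[order[-1]] = 1
    mass = [0.0] * n
    surv = 1.0  # Kaplan-Meier survival just before the current residual
    for rank, i in enumerate(order):
        at_risk = n - rank
        if d[i] == 1:
            mass[i] = surv / at_risk
            surv -= mass[i]
    return mass


def bj_impute(z, fitted, delta):
    """One Buckley-James imputation step.

    z:      observed (possibly censored) responses
    fitted: current fitted values from the linear predictor
    delta:  censoring indicators (1 = observed, 0 = right-censored)
    Returns responses with censored values replaced by
    fitted + E[residual | residual > censored residual].
    """
    n = len(z)
    resid = [zi - fi for zi, fi in zip(z, fitted)]
    mass = km_masses(resid, delta)
    imputed = []
    for i in range(n):
        if delta[i] == 1:
            imputed.append(z[i])
            continue
        tail = [(resid[k], mass[k]) for k in range(n) if resid[k] > resid[i]]
        tail_mass = sum(m for _, m in tail)
        if tail_mass > 0:
            cond_mean = sum(e * m for e, m in tail) / tail_mass
            imputed.append(fitted[i] + cond_mean)
        else:
            # No mass beyond this residual: keep the observed value.
            imputed.append(z[i])
    return imputed


# Toy example with all fitted values equal to zero (illustrative data only).
print(bj_impute([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [1, 0, 1, 1]))
```

In an iterative scheme of the kind the abstract refers to, a step like this would alternate with refitting the regression (for example, a partial least squares fit) on the imputed responses until the fitted values stabilize; that outer loop is omitted here.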
Cite this article
Huang, J., Harrington, D. Dimension Reduction in the Linear Model for Right-Censored Data: Predicting the Change of HIV-I RNA Levels using Clinical and Protease Gene Mutation Data. Lifetime Data Anal 10, 425–443 (2004). https://doi.org/10.1007/s10985-004-4776-8