Dimension Reduction in the Linear Model for Right-Censored Data: Predicting the Change of HIV-I RNA Levels using Clinical and Protease Gene Mutation Data

Article
Received: 31 December 2003; Revised: 13 May 2004; Accepted: 21 June 2004

Cite this article as: Huang, J. & Harrington, D. Lifetime Data Anal (2004) 10: 425. doi:10.1007/s10985-004-4776-8

Abstract
With the rapid development of technology for measuring disease characteristics at the molecular or genetic level, it is now possible to collect large amounts of data on potential predictors of a clinical outcome of interest in medical research. It is often of interest to use the information in a large number of predictors effectively to predict that outcome. Various statistical tools have been developed to overcome the difficulties caused by the high dimensionality of the covariate space in the setting of a linear regression model. This paper focuses on the situation where the outcomes of interest are subject to right censoring. We applied the extended partial least squares method, along with other commonly used approaches for analyzing high-dimensional covariates, to the ACTG333 data set. In particular, we compared the prediction performance of the different approaches in extensive cross-validation studies. The results show that Buckley–James based partial least squares, stepwise subset model selection, and principal components regression have similarly promising predictive power, and that the partial least squares method has several advantages in terms of interpretability and numerical computation.
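As a rough illustration of the Buckley–James idea named in the abstract, the sketch below shows one imputation step: each right-censored response is replaced by the current linear prediction plus the estimated conditional mean of the residual, computed from the Kaplan-Meier estimate of the residual distribution. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `buckley_james_impute`, its arguments, and the toy data are hypothetical. Ties among residuals are ignored, and the largest residual is treated as uncensored, which is a common convention. In a Buckley–James based partial least squares fit, the imputed responses would presumably be passed to a partial least squares regression and the two steps alternated until the predictions stabilize.

```python
def buckley_james_impute(z, delta, fitted):
    """One Buckley-James imputation step (illustrative sketch).

    z      : observed responses, min(Y, C)
    delta  : 1 if the response is observed exactly, 0 if right-censored
    fitted : current linear predictions x_i' beta (hypothetical input)
    """
    n = len(z)
    resid = [zi - fi for zi, fi in zip(z, fitted)]
    order = sorted(range(n), key=lambda i: resid[i])
    d = [delta[i] for i in order]
    d[-1] = 1  # assumption: treat the largest residual as uncensored

    # Kaplan-Meier survival of the residuals and its jump at each point
    surv, mass, s = [], [], 1.0
    for rank in range(n):
        s_new = s * (1.0 - d[rank] / (n - rank))
        mass.append(s - s_new)  # zero at censored residuals
        surv.append(s_new)
        s = s_new

    # tail[rank] = sum over strictly larger residuals of residual * KM mass
    tail, acc = [0.0] * n, 0.0
    for rank in range(n - 1, -1, -1):
        tail[rank] = acc
        acc += resid[order[rank]] * mass[rank]

    # Censored responses get prediction + estimated E[residual | residual > observed]
    y_star = list(z)
    for rank, i in enumerate(order):
        if d[rank] == 0:
            y_star[i] = fitted[i] + tail[rank] / surv[rank]
    return y_star


# Toy data: the second response is right-censored at 2.0
z = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
fitted = [0.0, 0.0, 0.0, 0.0]
y_star = buckley_james_impute(z, delta, fitted)
print(y_star)  # the censored value 2.0 is replaced by 3.5
```

In the toy data, the Kaplan-Meier estimate places equal mass on the two residuals above the censored one (3.0 and 4.0), so the imputed value is their average. Uncensored responses are left unchanged.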
Keywords: failure time model, cross-validation, dimension reduction, partial least squares
© Kluwer Academic Publishers 2004