Abstract
Ordinary regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem; a subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is carried out mainly by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their ability to recover the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights under collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recovery of regression weights. The picture is inconclusive, however, because in the study with more realistic simulated data in particular, PLS and PCR gave the poorest recovery of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases ordinary regression is preferable. As far as prediction is concerned, prediction suffers far less from collinearity than recovery of the regression weights does.
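The contrast the abstract draws between unstable ordinary regression weights and the stabilizing effect of component-based methods can be illustrated with a minimal numpy sketch. This is not the paper's simulation design; the data-generating setup (two nearly collinear predictors, a one-component PCR fit) and all function names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's simulations): with two highly
# collinear predictors, ordinary least squares (OLS) weights vary wildly
# across noise draws, while principal component regression (PCR) with one
# component remains stable.
rng = np.random.default_rng(0)
n = 50
true_b = np.array([1.0, 1.0])

def fit_ols(X, y):
    # Ordinary least squares via numpy's least-squares solver.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_pcr(X, y, n_comp=1):
    # Regress y on the first n_comp principal component scores of X,
    # then map the component weights back to the original predictors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_comp] * s[:n_comp]          # component scores
    w = np.linalg.lstsq(T, y, rcond=None)[0]
    return Vt[:n_comp].T @ w

ols_weights, pcr_weights = [], []
for _ in range(200):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)     # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = X @ true_b + 0.1 * rng.normal(size=n)
    ols_weights.append(fit_ols(X, y))
    pcr_weights.append(fit_pcr(X, y))

print("OLS weight std across draws:", np.std(ols_weights, axis=0))
print("PCR weight std across draws:", np.std(pcr_weights, axis=0))
```

In this setup the true weight vector lies (almost) in the span of the first principal component, which is exactly the regime where the paper finds PCR and PLS recover regression weights well; when the true weights point away from the leading components, one-component PCR would be badly biased instead.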
Kiers, H.A.L., Smilde, A.K. A comparison of various methods for multivariate regression with highly collinear variables. Stat. Meth. & Appl. 16, 193–228 (2007). https://doi.org/10.1007/s10260-006-0025-5