A comparison of various methods for multivariate regression with highly collinear variables
Ordinary regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is carried out mainly through a series of simulation studies in which data are constructed in various ways, with different degrees of collinearity and noise. The methods are compared on their ability to recover the population regression weights and on their prediction quality for the complete population. It turns out that, under collinearity, all methods often recover the regression weights very poorly unless the weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recovery of the regression weights. The picture is inconclusive, however: especially in the study with more realistic simulated data, PLS and PCR gave the poorest recovery of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated when collinearity is strong, whereas in other cases ordinary regression is preferable. As for prediction, it suffers far less from collinearity than recovery of the regression weights does.
Keywords: Multivariate regression; PLS; Principal component regression; Principal covariate regression; Power regression; Multicollinearity
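To make the weight-recovery comparison concrete, here is a minimal sketch that pits ordinary least squares against PCR and PLS on simulated collinear data. It is not the paper's simulation design. Several simplifications are assumptions of this sketch alone: it uses NumPy, a single criterion variable rather than the multivariate case, and a hypothetical generator `make_collinear_data` in which a few latent factors drive all predictors. The true weights are placed in the span of the first two principal components (the favourable case described above), and recovery is measured by a relative error `rel_error`. PLS is implemented as NIPALS PLS1. Principal covariates regression, reduced rank regression and power regression are not included.

```python
import numpy as np


def make_collinear_data(n, p, n_latent, noise_x, noise_y, rng):
    """Hypothetical generator (not the paper's design): p predictors driven by
    n_latent latent factors, so they are highly collinear when noise_x is small."""
    latent = rng.standard_normal((n, n_latent))
    loadings = rng.standard_normal((n_latent, p))
    X = latent @ loadings + noise_x * rng.standard_normal((n, p))
    X -= X.mean(axis=0)  # column-centre the predictors
    # Assumption: the true weights lie in the span of the first two principal
    # components of X, the case in which PLS/PCR are reported to do best.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    b_true = Vt[:2].T @ np.array([1.0, -0.5])
    y = X @ b_true + noise_y * rng.standard_normal(n)
    return X, y - y.mean(), b_true


def ols(X, y):
    """Ordinary least-squares weights."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def pcr(X, y, k):
    """Principal component regression: regress y on the first k PC scores,
    then map the score weights back to predictor weights."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])


def pls1(X, y, k):
    """PLS1 (single criterion) via NIPALS with k components."""
    Xr, yr = X.copy(), y.copy()
    p = X.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # component scores
        tt = t @ t
        P[:, a] = Xr.T @ t / tt         # X loadings
        q[a] = yr @ t / tt              # y loading
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])  # deflate X
        yr = yr - q[a] * t              # deflate y
    return W @ np.linalg.solve(P.T @ W, q)


def rel_error(b_hat, b_true):
    """Relative error of the recovered weights."""
    return np.linalg.norm(b_hat - b_true) / np.linalg.norm(b_true)


rng = np.random.default_rng(0)
X, y, b_true = make_collinear_data(n=50, p=10, n_latent=3,
                                   noise_x=0.05, noise_y=0.5, rng=rng)
for name, b in [("OLS", ols(X, y)), ("PCR", pcr(X, y, 2)), ("PLS", pls1(X, y, 2))]:
    print(f"{name}: relative weight error = {rel_error(b, b_true):.3f}")
```

The number of components is fixed at two for PCR and PLS to match how the true weights were built. In a real analysis it would normally be chosen by cross-validation. Raising `noise_x` weakens the collinearity, which gives a rough way to probe the low-collinearity conditions in which the abstract reports that ordinary regression did better.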