Statistical Methods and Applications, Volume 16, Issue 2, pp 193–228

A comparison of various methods for multivariate regression with highly collinear variables

Original Article

Abstract

Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: Partial Least Squares (PLS), Principal Component Regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise. The methods are compared in terms of their capability of recovering the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more realistic simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with high collinearity, whereas in other cases it is better to use ordinary regression. Prediction, by contrast, suffers far less from collinearity than recovery of the regression weights does.
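The kind of comparison described above can be illustrated with a small sketch. It is not the simulation design of the paper: the data-generating scheme, the function names (`ols_weights`, `pcr_weights`, `simulate`, `mean_recovery_error`) and all parameter values are assumptions chosen for illustration. It covers only ordinary regression and PCR, for a single criterion variable, in the favourable case where the population weights lie close to the first principal component of the predictors.

```python
import numpy as np


def ols_weights(X, y):
    """Ordinary least-squares regression weights."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def pcr_weights(X, y, n_components):
    """Principal component regression: regress y on the first
    n_components principal components of X, then map the component
    weights back to weights for the original predictors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:n_components].T              # loadings, shape (p, n_components)
    T = X @ V                            # component scores
    gamma = np.linalg.lstsq(T, y, rcond=None)[0]
    return V @ gamma


def simulate(n=50, p=6, collin=0.99, noise_sd=0.5, seed=0):
    """Hypothetical data: p predictors sharing one common factor, so that
    pairwise population correlations equal `collin` (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n, 1))
    unique = rng.standard_normal((n, p))
    X = np.sqrt(collin) * common + np.sqrt(1.0 - collin) * unique
    X -= X.mean(axis=0)                  # column-centre the predictors
    b_true = np.ones(p)                  # close to the first principal direction
    y = X @ b_true + noise_sd * rng.standard_normal(n)
    y -= y.mean()
    return X, y, b_true


def mean_recovery_error(estimator, n_reps=20):
    """Average Euclidean distance between estimated and population weights."""
    errors = []
    for seed in range(n_reps):
        X, y, b_true = simulate(seed=seed)
        errors.append(np.linalg.norm(estimator(X, y) - b_true))
    return float(np.mean(errors))


err_ols = mean_recovery_error(ols_weights)
err_pcr = mean_recovery_error(lambda X, y: pcr_weights(X, y, n_components=1))
print(f"mean weight-recovery error, OLS: {err_ols:.3f}")
print(f"mean weight-recovery error, PCR (1 component): {err_pcr:.3f}")
```

Under these assumptions PCR with one component should recover the weights far more closely than ordinary regression. Changing `b_true` so that it no longer lies near the leading principal direction, or lowering `collin`, is a simple way to explore the less favourable conditions mentioned above.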

Keywords

Multivariate regression; PLS; Principal component regression; Principal covariate regression; Power regression; Multicollinearity

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. Heymans Institute, University of Groningen, Groningen, The Netherlands
  2. Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
