A comparison of various methods for multivariate regression with highly collinear variables

  • Original Article
Statistical Methods and Applications

Abstract

Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem, a subset of which do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced rank regression, and two variants of what is called power regression. The comparison is mainly carried out by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their ability to recover the population regression weights, as well as their prediction quality for the complete population. It turns out that, in situations with collinearity, recovery of the regression weights is often very poor for all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, PLS and PCR typically give the best recovery of the regression weights. The picture is inconclusive, however, because, especially in the study with more realistically simulated data, PLS and PCR gave the poorest recovery of the regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases ordinary regression is preferable. As far as prediction is concerned, prediction suffers far less from collinearity than does recovery of the regression weights.
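The contrast the abstract draws can be sketched in a small numerical example. The code below is our own illustration, not the paper's simulation design: predictors are built from two latent components (so the predictor block is highly collinear), the true regression weights are placed in the span of the first principal components, and ordinary least squares (OLS) is compared with principal component regression (PCR) on recovery of those weights. All variable names and the specific noise levels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 6, 2                        # samples, predictors, components kept

# Predictors generated from k latent components, plus small measurement
# noise, so X is highly collinear.
T = rng.normal(size=(n, k))                # latent component scores
X = T @ rng.normal(size=(k, p))            # collinear predictor block
X += 0.01 * rng.normal(size=(n, p))        # mild noise: X is nearly rank k

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# True weights constructed to lie in the subspace spanned by the first
# k principal components of X (the favourable case described above).
b_true = Vt[:k].T @ np.array([1.0, -0.5])
y = X @ b_true + 0.1 * rng.normal(size=n)

# OLS via the full pseudo-inverse: dividing by the tiny trailing singular
# values amplifies the noise in the estimated weights.
b_ols = Vt.T @ ((U.T @ y) / s)

# PCR: regress y on the first k principal components only, which discards
# the unstable directions.
b_pcr = Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

err_ols = np.linalg.norm(b_ols - b_true)
err_pcr = np.linalg.norm(b_pcr - b_true)
print(f"weight recovery error  OLS: {err_ols:.3f}   PCR: {err_pcr:.3f}")
```

Because the true weights lie in the span of the leading components, PCR's recovery error should be far smaller here, mirroring the paper's finding; when the weights have substantial mass outside that span, the discarded components bias PCR and the comparison can reverse.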



Author information

Correspondence to Henk A. L. Kiers.


Cite this article

Kiers, H.A.L., Smilde, A.K. A comparison of various methods for multivariate regression with highly collinear variables. Stat. Meth. & Appl. 16, 193–228 (2007). https://doi.org/10.1007/s10260-006-0025-5
