
Group least squares regression for linear models with strongly correlated predictor variables


A Correction to this article was published on 11 January 2023


Abstract

Traditionally, the main focus of least squares regression is to study the effects of individual predictor variables, but strongly correlated variables generate multicollinearity, which makes those effects difficult to study. To resolve the multicollinearity issue without abandoning least squares regression, we propose a group approach for situations where the predictor variables fall into groups with strong within-group correlations but weak between-group correlations: instead of the effects of individual variables, we study the effects of the groups. Using an all positive correlations arrangement of the strongly correlated variables, we first characterize group effects that are meaningful and can be accurately estimated. We then examine the group approach to least squares regression through a simulation study and demonstrate that it is an effective method for handling multicollinearity. We also address a common misconception about the prediction accuracy of the least squares estimated model.
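To make the setting concrete, below is a minimal simulation sketch of a single group of strongly correlated, standardized predictors (an illustration only, not the code used in the paper). The group size q = 5, the within-group correlation 0.98, the unit error variance, and the uniform weights 1/q defining an average group effect are all illustrative assumptions. It shows the basic point: under strong correlation the individual least squares coefficients have large variances, while a group-level effect is estimated with a small variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 200, 5                    # sample size and group size (illustrative)
r_within = 0.98                  # strong within-group correlation (illustrative)

# Equicorrelated, standardized predictors forming one strongly correlated group
R = r_within * np.ones((q, q)) + (1 - r_within) * np.eye(q)
X = rng.multivariate_normal(np.zeros(q), R, size=n)
beta = np.ones(q)                                # true coefficients (illustrative)
y = X @ beta + rng.normal(0.0, 1.0, n)           # linear model with sigma = 1

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                     # least squares estimate

# Variances (up to sigma^2) of individual coefficients vs. the average group effect
w = np.full(q, 1.0 / q)                          # uniform weights: average group effect
print("individual coefficient variances:", np.round(np.diag(XtX_inv), 3))
print("average group effect variance   :", np.round(w @ XtX_inv @ w, 5))
print("estimated average group effect  :", np.round(w @ beta_hat, 3))
```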



Notes

  1. \({\mathbf {R}}_{12}\rightarrow {\mathbf {0}}\) denotes element-wise convergence of \({\mathbf {R}}_{12}\) to zero. It implies \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21} \rightarrow {\mathbf {0}}\) under general conditions, such as \(\Vert {\mathbf {R}}^{-1}_{22}\Vert _{max}\) being bounded or \((\Vert {\mathbf {R}}_{12}\Vert _{max})^2\Vert {\mathbf {R}}^{-1}_{22}\Vert _{max}=o(1)\). This observation will be used in the proof of (ii), which requires \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21} \rightarrow {\mathbf {0}}\).

  2. For unstandardized variables and/or variables not in an APC arrangement, this line is difficult to characterize. But for standardized variables in an APC arrangement, it is easy to describe; e.g., for the q variables in \({\mathbf {X}}_1'\) of (8), the line is \(x_1'=x_2'=\dots =x_q'\).

References

  • Belsley, D. A., Kuh, E., Welsch, R. E. (2004). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley & Sons.

  • Conniffe, D., Stone, J. (1973). A critical view of ridge regression. The Statistician, 22, 181–187.

  • Draper, N. R., Smith, H. (1998). Applied regression analysis (3rd ed.). New York: Wiley.

  • Draper, N. R., Van Nostrand, R. C. (1979). Ridge regression and James-Stein estimators: Review and comments. Technometrics, 21, 451–466.

  • Gunst, R. F., Mason, R. L. (1977). Biased estimation in regression: An evaluation using mean squared error. Journal of the American Statistical Association, 72, 616–628.

  • Gunst, R. F., Webster, J. T., Mason, R. L. (1976). A comparison of least squares and latent root regression estimators. Technometrics, 18, 75–83.

  • Hoerl, A. E., Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.

  • Hoerl, A. E., Kennard, R. W., Baldwin, K. F. (1975). Ridge regression: Some simulations. Communications in Statistics: Theory and Methods, 4, 105–123.

  • Horn, R. A., Johnson, C. R. (1985). Matrix analysis. Cambridge: Cambridge University Press.

  • Jolliffe, I. T. (1986). Principal component analysis. New York: Springer-Verlag.

  • Lawless, J. F. (1978). Ridge and related estimation procedures: Theory and practice. Communications in Statistics: Theory and Methods, 7, 135–164.

  • Montgomery, D. C., Peck, E. A., Vining, G. G. (2012). Introduction to linear regression analysis (5th ed.). New York: Wiley.

  • Tsao, M. (2019). Estimable group effects for strongly correlated variables in linear models. Journal of Statistical Planning and Inference, 198, 29–42.

  • Webster, J. T., Gunst, R. F., Mason, R. L. (1974). Latent root regression analysis. Technometrics, 16, 513–522.


Acknowledgements

We would like to thank two anonymous reviewers and an Associate Editor for their helpful comments which have led to many improvements in this paper. This work is supported by the Natural Sciences and Engineering Research Council of Canada.

Author information


Corresponding author

Correspondence to Min Tsao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original article has been revised: "Principle" has been corrected to "Principal".

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 326 KB)

Appendix

Proof of Lemma 1

Let \({\mathbf {A}}\) be the \(q\times q\) matrix whose elements are all 1. Then, \({\mathbf {A}}\) has two distinct eigenvalues, \(\lambda ^A_1=q\) and \(\lambda ^A_2=0\). Eigenvalue \(\lambda ^A_1\) has multiplicity 1, and \(\lambda ^A_2\) has multiplicity \((q-1)\). The orthonormal eigenvector of \(\lambda ^A_1\) is \(\frac{1}{\sqrt{q}}{\mathbf {1}}_q\). Here, we ignore the other orthonormal eigenvector of \(\lambda ^A_1\), \(-\frac{1}{\sqrt{q}}{\mathbf {1}}_q\), which differs only in sign from \(\frac{1}{\sqrt{q}}{\mathbf {1}}_q\).

Let \({\mathbf {P}}=[p_{ij}]\) be a perturbation matrix of \({\mathbf {A}}\) defined by

$$\begin{aligned} {\mathbf {P}}={\mathbf {A}}-{\mathbf {R}}. \end{aligned}$$
(24)

Then, \({\mathbf {P}}\) is real and symmetric and \(p_{ij}=1-r_{ij}\). When \(r_M \rightarrow 1\), since \(p_{ij}=(1-r_{ij})\rightarrow 0\), we have \(\Vert {\mathbf {P}}\Vert _2\rightarrow 0\). It follows from this and \({\mathbf {R}}={\mathbf {A}}-{\mathbf {P}}\) (so \({\mathbf {R}}\) is a perturbed version of \({\mathbf {A}}\)) that \(\lambda _1 \rightarrow \lambda _1^A=q\) and \(\lambda _i \rightarrow \lambda ^A_2=0\) for \(i=2,3,\dots ,q\) as \(r_M \rightarrow 1\) (Horn and Johnson, 1985; page 367).

To show that \({\mathbf {v}}_1\rightarrow \frac{1}{\sqrt{q}}{\mathbf {1}}_q\) as \(r_M \rightarrow 1\), since \({\mathbf {R}} {\mathbf {v}}_1=\lambda _1{\mathbf {v}}_1\), we have

$$\begin{aligned} r_{i1}v_{11}+r_{i2}v_{12}+\dots +r_{iq}v_{1q}=\lambda _1v_{1i} \end{aligned}$$
(25)

for \(i=1,2,\dots , q\), where \((r_{i1}, r_{i2}, \dots , r_{iq})\) is the ith row of \({\mathbf {R}}\) and \(v_{1i}\) is the ith element of \({\mathbf {v}}_1\). All \(v_{1i}\) are bounded between \(-1\) and 1 since \(v_{1i}^2 \le \Vert {\mathbf {v}}_1\Vert ^2= 1\). When \(r_M \rightarrow 1\), all \(r_{ij}\rightarrow 1\), so \((r_{ij}v_{1j}-v_{1j})\rightarrow 0\) for \(j=1,2,\dots ,q\). Thus,

$$\begin{aligned} (r_{i1}v_{11}+r_{i2}v_{12}+\dots +r_{iq}v_{1q})-(v_{11}+v_{12}+\dots +v_{1q}) \rightarrow 0 \end{aligned}$$
(26)

as \(r_M \rightarrow 1\). By (25) and (26), \(\lambda _1v_{1i} - (v_{11}+v_{12}+\dots +v_{1q}) \rightarrow 0\) which implies \(\lambda _1^2v_{1i}^2 - (v_{11}+v_{12}+\dots +v_{1q})^2 \rightarrow 0\) for \(i=1,2,\dots ,q\). It follows that

$$\begin{aligned} \lambda _1^2(v_{11}^2+v_{12}^2+\dots +v_{1q}^2)-q(v_{11}+v_{12}+\dots +v_{1q})^2 \rightarrow 0. \end{aligned}$$
(27)

Since \(v_{11}^2+v_{12}^2+\dots +v_{1q}^2= \Vert {\mathbf {v}}_1\Vert ^2=1\) and \(\lambda _1 \rightarrow q\), (27) implies that \((v_{11}+v_{12}+\dots +v_{1q})^2\rightarrow q\); taking \({\mathbf {v}}_1\) to be the eigenvector whose elements sum to a non-negative value (as we did for \({\mathbf {A}}\)), this gives \((v_{11}+v_{12}+\dots +v_{1q})\rightarrow \sqrt{q}\). This and (26) imply that

$$\begin{aligned} (r_{i1}v_{11}+r_{i2}v_{12}+\dots +r_{iq}v_{1q}) \rightarrow \sqrt{q} \end{aligned}$$

for \(i=1,2,\dots ,q\). By (25), we also have \(\lambda _1v_{1i} \rightarrow \sqrt{q}\). This and \(\lambda _1 \rightarrow q\) imply that \(v_{1i}\rightarrow 1/\sqrt{q}\) for \(i=1,2,\dots ,q\), that is, \({\mathbf {v}}_1\rightarrow \frac{1}{\sqrt{q}}{\mathbf {1}}_q\). \(\square\)
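The convergence in Lemma 1 can be checked numerically. The sketch below is an illustration only (not part of the paper); it builds a valid correlation matrix with unequal entries as \({\mathbf {R}}=(1-\epsilon ){\mathbf {A}}+\epsilon {\mathbf {R}}_0\), where \({\mathbf {R}}_0\) is a fixed positive definite correlation matrix, so that \(r_M\rightarrow 1\) as \(\epsilon \rightarrow 0\); the choices of q, \({\mathbf {R}}_0\) and the \(\epsilon\) grid are arbitrary.

```python
import numpy as np

# Numerical illustration of Lemma 1 (not from the paper): as r_M -> 1, the
# largest eigenvalue of R approaches q and its eigenvector approaches
# (1/sqrt(q)) 1_q.
q = 4
A = np.ones((q, q))
rng = np.random.default_rng(1)
R0 = np.corrcoef(rng.normal(size=(q, 2 * q)))        # a generic PD correlation matrix
for eps in (0.2, 0.02, 0.002):
    R = (1 - eps) * A + eps * R0                     # positive definite mixture
    r_M = np.min(R[~np.eye(q, dtype=bool)])          # smallest off-diagonal correlation
    eigvals, eigvecs = np.linalg.eigh(R)             # eigenvalues in ascending order
    lam1, v1 = eigvals[-1], eigvecs[:, -1]
    v1 = np.sign(v1.sum()) * v1                      # fix the sign of the eigenvector
    gap = np.max(np.abs(v1 - 1 / np.sqrt(q)))
    print(f"r_M = {r_M:.4f}: lambda_1 = {lam1:.4f} (q = {q}), max|v_1 - 1/sqrt(q)| = {gap:.2e}")
```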

Proof of Lemma 2

Since \({\mathbf {R}}\) is positive definite, \({\mathbf {R}}^{-1}\) is also positive definite. Let \(\lambda _1'\ge \lambda _{2}'\ge \dots \ge \lambda _q'>0\) be the eigenvalues of \({\mathbf {R}}^{-1}\). Then, \(\lambda _i'=\lambda ^{-1}_{q-i+1}\) and its eigenvector is \({\mathbf {v}}_i'={\mathbf {v}}_{q-i+1}\) for \(i=1,2, \dots , q\). In particular, \(\lambda _q'=\lambda _1^{-1}\) and \({\mathbf {v}}_q'={\mathbf {v}}_1\). Since all \(\lambda _i>0\) and \(trace({\mathbf {R}})=q=\sum ^q_{i=1}\lambda _i\), we have \(0<\lambda _1<q\). Also, \({\mathbf {v}}_1^T{\mathbf {v}}_1=1\) as \({\mathbf {v}}_1\) is orthonormal. It follows from these that

$$\begin{aligned} {\mathbf {v}}_1^T {\mathbf {R}}^{-1} {\mathbf {v}}_1 = {\mathbf {v}}_q'^T {\mathbf {R}}^{-1} {\mathbf {v}}_q' = {\mathbf {v}}_q'^T\lambda _q' {\mathbf {v}}_q' = \frac{{\mathbf {v}}_1^T{\mathbf {v}}_1 }{\lambda _1}=\frac{1}{\lambda _1} >\frac{1}{q}, \end{aligned}$$
(28)

which proves (i). By Lemma 1, \(\lambda _1 \rightarrow q\) as \(r_M \rightarrow 1\). Thus, by (28)

$$\begin{aligned} {\mathbf {v}}_1^T {\mathbf {R}}^{-1} {\mathbf {v}}_1 = \frac{1}{\lambda _1} \rightarrow \frac{1}{q}, \end{aligned}$$

as \(r_M \rightarrow 1\), which proves (ii). \(\square\)
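As with Lemma 1, Lemma 2 is easy to verify numerically. The sketch below (illustrative only, not from the paper) uses the equicorrelated case, where \(\lambda _1=1+(q-1)r_M\), and confirms that \({\mathbf {v}}_1^T{\mathbf {R}}^{-1}{\mathbf {v}}_1=1/\lambda _1\) exceeds 1/q and approaches 1/q as \(r_M\rightarrow 1\).

```python
import numpy as np

# Numerical illustration of Lemma 2 (equicorrelated case, not from the paper):
# v_1' R^{-1} v_1 equals 1/lambda_1, exceeds 1/q, and tends to 1/q as r_M -> 1.
q = 4
for r_M in (0.5, 0.9, 0.99, 0.999):
    R = r_M * np.ones((q, q)) + (1 - r_M) * np.eye(q)
    eigvals, eigvecs = np.linalg.eigh(R)
    lam1, v1 = eigvals[-1], eigvecs[:, -1]
    quad = v1 @ np.linalg.inv(R) @ v1
    print(f"r_M = {r_M}: v_1' R^-1 v_1 = {quad:.4f}, 1/lambda_1 = {1/lam1:.4f}, 1/q = {1/q:.4f}")
```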

Proof of Theorem 1

For any constant vector \({\mathbf {c}} \in {\mathbb {R}}^p\), we have

$$\begin{aligned} {var}({\mathbf {c}}^T\hat{\varvec{\beta }}')={\sigma }^2{\mathbf {c}}^T[{\mathbf {X}}'^T{\mathbf {X}}']^{-1}{\mathbf {c}}. \end{aligned}$$
(29)

Let \({\mathbf {c}}_E=({{\mathbf {v}}^*_1}^T, 0, \dots , 0)^T\). Then, \({\xi }_E= {\mathbf {c}}_E^T {\varvec{\beta }}'\) and \({\hat{\xi }}_E= {\mathbf {c}}_E^T \hat{\varvec{\beta }}'\). By (11) and (29),

$$\begin{aligned} var({\hat{\xi }}_E) =\sigma ^2{{\mathbf {v}}^*_1}^T[{\mathbf {R}}_{11}-{\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21}]^{-1}{{\mathbf {v}}^*_1} = \sigma ^2 {{\mathbf {v}}^*_1}^T{{\mathbf {R}}^*}^{-1}{{\mathbf {v}}^*_1}. \end{aligned}$$
(30)

To show (i), when variables in \({\mathbf {X}}_1'\) are uncorrelated with variables in \({\mathbf {X}}_2'\), \({\mathbf {R}}_{12}={\mathbf {0}}\) and so \({\mathbf {R}}^*={\mathbf {R}}\) and \({\mathbf {v}}^*_1={\mathbf {v}}_1\). By (30),

$$\begin{aligned} var({\hat{\xi }}_E) ={\sigma }^2{\mathbf {v}}^T_1 {\mathbf {R}}^{-1}{\mathbf {v}}_1. \end{aligned}$$
(31)

Applying Lemma 2 to the right-hand side of (31), we obtain (\(i_1\)) and (\(i_2\)).

To show (ii), for simplicity, we assume that the general conditions discussed in footnote 1 hold so that \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21}\rightarrow {\mathbf {0}}\) when \({\mathbf {R}}_{12}\rightarrow {\mathbf {0}}\). It follows from this and the conditions in Theorem 1(ii) that \({\mathbf {R}}_{11}\) and \({\mathbf {R}}^*\) both converge to the matrix \({\mathbf {A}}\) in (24). We again define a perturbation matrix of \({\mathbf {A}}\) as

$$\begin{aligned} {\mathbf {P}}^*={\mathbf {A}} - {\mathbf {R}}^* \end{aligned}$$

as in (24). By following steps similar to those in the proofs of Lemmas 1 and 2, we can show that \({\mathbf {R}}^*\) also has the two properties in Lemma 1 and property (ii) in Lemma 2. The latter and (30) imply (ii). \(\square\)
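The following sketch (illustrative only, not the paper's code) evaluates the variance ratio \(var({\hat{\xi }}_E)/\sigma ^2={{\mathbf {v}}^*_1}^T{{\mathbf {R}}^*}^{-1}{{\mathbf {v}}^*_1}\) of (30) for a block correlation matrix in which the q variables of \({\mathbf {X}}_1'\) are equicorrelated at \(r_M\) and weakly correlated with the remaining variables. Here \({\mathbf {v}}^*_1\) is computed as the first eigenvector of \({\mathbf {R}}^*\), and the group size, the number of remaining variables and the between-group correlation 0.05 are assumptions made for illustration; the ratio stays close to 1/q even as \(r_M\rightarrow 1\).

```python
import numpy as np

# Numerical illustration of Theorem 1 (not from the paper): the variance of the
# estimated group effect, var(xi_E_hat)/sigma^2 = v_1*' (R*)^{-1} v_1*, remains
# near 1/q when within-group correlations approach 1 and between-group
# correlations are weak.
q, m = 4, 3                      # group size and number of remaining variables (assumed)
r_between = 0.05                 # weak between-group correlation (assumed)
for r_M in (0.9, 0.99, 0.999):
    R11 = r_M * np.ones((q, q)) + (1 - r_M) * np.eye(q)   # within-group correlations
    R22 = np.eye(m)                                        # remaining variables, uncorrelated
    R12 = r_between * np.ones((q, m))
    R_star = R11 - R12 @ np.linalg.inv(R22) @ R12.T        # R* as in (30)
    eigvals, eigvecs = np.linalg.eigh(R_star)
    v1_star = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())  # first eigenvector of R*
    var_ratio = v1_star @ np.linalg.inv(R_star) @ v1_star     # var(xi_E_hat) / sigma^2
    print(f"r_M = {r_M}: var ratio = {var_ratio:.4f}, 1/q = {1/q:.4f}")
```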

Proof of Theorem 2

Since \({\mathbf {v}}\cdot {\mathbf {v}}^*_1=\Vert {\mathbf {v}}\Vert \Vert {\mathbf {v}}^*_1\Vert \cos (\theta )=\cos (\theta )\) where \(\theta\) is the angle between \({\mathbf {v}}\) and \({\mathbf {v}}^*_1\), \(\sqrt{1-\delta }< {\mathbf {v}} \cdot {\mathbf {v}}^*_1 \le 1\) is equivalent to \(\sqrt{1-\delta }<\cos (\theta ) \le 1\) or \(0\le \theta < \theta _\delta\) for some small fixed \(\theta _\delta >0\). Thus, \({\mathcal {N}}_{\delta }\) in (14) represents a small open circular region centred on \({\mathbf {v}}_1^*\) on the surface of the unit sphere.

Similar to \(var({\hat{\xi }}_E)\) in (30), \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')=\sigma ^2{\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}\). Since \({{\mathbf {R}}^*}^{-1}\) is real symmetric positive definite, it has eigendecomposition \(\mathbf {Q\Lambda Q}^T\) where \({\mathbf {Q}}\) is the matrix of orthonormal eigenvectors including \({\mathbf {v}}^*_1\) and \(\mathbf {\Lambda }\) is the diagonal matrix of eigenvalues. The smallest eigenvalue of \({{\mathbf {R}}^*}^{-1}\) is \(1/\lambda ^*_1\) which converges to 1/q under the condition of Theorem 2 as \(r_M\) goes to 1. The other eigenvalues of \({{\mathbf {R}}^*}^{-1}\) all go to infinity as \(r_M\) goes to 1. For any unit vector \({\mathbf {v}}\),

$$\begin{aligned} 1={\mathbf {v}}^T{\mathbf {v}} = {\mathbf {v}}^T{\mathbf {Q}}{\mathbf {Q}}^T{\mathbf {v}}= {\mathbf {v}}^T[\tilde{{\mathbf {Q}}},{\mathbf {v}}^*_1][\tilde{{\mathbf {Q}}},{\mathbf {v}}^*_1]^T{\mathbf {v}} = {\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}}+ ({\mathbf {v}}^T {\mathbf {v}}^*_1)^2 \end{aligned}$$
(32)

where \(\tilde{{\mathbf {Q}}}\) is the matrix containing all columns of \({\mathbf {Q}}\) but \({\mathbf {v}}^*_1\). If \({\mathbf {v}} \notin {\mathcal N}_{\delta }\), then \(({\mathbf {v}}^T {\mathbf {v}}^*_1)^2\le 1-\delta\). This and (32) imply that \(1\le {\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}}+(1-\delta )\), that is, \({\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}} \ge \delta\). This leads to the following lower bound on \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')\),

$$\begin{aligned} var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')=\sigma ^2{\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}} = \sigma ^2{\mathbf {v}}^T\mathbf {Q\Lambda Q}^T{\mathbf {v}}\ge \sigma ^2{\mathbf {v}}^T\tilde{{\mathbf {Q}}} \tilde{\mathbf {\Lambda }} \tilde{{\mathbf {Q}}}^T{\mathbf {v}}\ge \frac{\sigma ^2\delta }{\lambda ^*_2}, \end{aligned}$$
(33)

where \(\tilde{\mathbf {\Lambda }}\) is the diagonal matrix of all eigenvalues of \({{\mathbf {R}}^*}^{-1}\) except the smallest one \(1/\lambda ^*_1\), and \(1/\lambda ^*_2\) is the second smallest eigenvalue of \({{\mathbf {R}}^*}^{-1}\). Since \(1/\lambda ^*_2 \rightarrow \infty\) as \(r_M \rightarrow 1\), (33) implies that \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1') \rightarrow \infty\) as \(r_M \rightarrow 1\) if \({\mathbf {v}} \notin {{\mathcal {N}}}_{\delta }\). \(\square\)
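The lower bound (33) can also be seen numerically. The sketch below (illustrative only, not the paper's code) takes \({\mathbf {R}}_{12}={\mathbf {0}}\) so that \({\mathbf {R}}^*\) is an equicorrelated \(q\times q\) matrix, and constructs a unit vector \({\mathbf {v}}\) with \(({\mathbf {v}}^T{\mathbf {v}}^*_1)^2=1-\delta\), which lies outside \({\mathcal {N}}_{\delta }\); the values q = 4 and \(\delta =0.5\) are arbitrary. The variance ratio \({\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}\) diverges as \(r_M\rightarrow 1\) while always exceeding \(\delta /\lambda ^*_2\).

```python
import numpy as np

# Numerical illustration of Theorem 2 and bound (33) (not from the paper): for a
# unit vector v outside N_delta, var(v' beta_1_hat)/sigma^2 = v' (R*)^{-1} v
# diverges as r_M -> 1 and is bounded below by delta / lambda_2*.
q, delta = 4, 0.5
for r_M in (0.9, 0.99, 0.999):
    R_star = r_M * np.ones((q, q)) + (1 - r_M) * np.eye(q)   # the R12 = 0 case
    eigvals, eigvecs = np.linalg.eigh(R_star)                 # ascending eigenvalues
    v1_star = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())
    lam2_star = eigvals[-2]                                   # second largest eigenvalue of R*
    u = eigvecs[:, 0]                                         # a unit vector orthogonal to v_1*
    v = np.sqrt(1 - delta) * v1_star + np.sqrt(delta) * u     # (v . v_1*)^2 = 1 - delta
    var_ratio = v @ np.linalg.inv(R_star) @ v
    print(f"r_M = {r_M}: var ratio = {var_ratio:8.2f} >= delta/lambda_2* = {delta/lam2_star:8.2f}")
```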

About this article


Cite this article

Tsao, M. Group least squares regression for linear models with strongly correlated predictor variables. Ann Inst Stat Math 75, 233–250 (2023). https://doi.org/10.1007/s10463-022-00841-7
