Abstract
Traditionally, the main focus of least squares regression has been the effects of individual predictor variables, but strongly correlated variables generate multicollinearity, which makes their effects difficult to study. To resolve the multicollinearity issue without abandoning least squares regression, for situations where predictor variables fall into groups with strong within-group correlations but weak between-group correlations, we propose to study the effects of the groups through a group approach to least squares regression. Using an all positive correlations arrangement of the strongly correlated variables, we first characterize group effects that are meaningful and can be accurately estimated. We then discuss the group approach through a simulation study and demonstrate that it is an effective method for handling multicollinearity. We also address a common misconception about the prediction accuracy of the least squares estimated model.
Change history
10 January 2023
The original article has been revised to correct "Principle" to "Principal".
11 January 2023
A Correction to this paper has been published: https://doi.org/10.1007/s10463-022-00861-3
Notes
\({\mathbf {R}}_{12}\rightarrow {\mathbf {0}}\) denotes element-wise convergence of \({\mathbf {R}}_{12}\) to zero. It implies \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21} \rightarrow {\mathbf {0}}\) under general conditions, such as when \(\Vert {\mathbf {R}}^{-1}_{22}\Vert _{max}\) is bounded or \((\Vert {\mathbf {R}}_{12}\Vert _{max})^2(\Vert {\mathbf {R}}^{-1}_{22}\Vert _{max})=o(1)\). This observation will be used in the proof of (ii), which requires \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21} \rightarrow {\mathbf {0}}\).
For unstandardized variables and/or variables not in an APC arrangement, this line is difficult to characterize. But for standardized variables in an APC arrangement, it is easy to describe; e.g., for the q variables in \({\mathbf {X}}_1'\) of (8), this line is \(x_1'=x_2'=\dots =x_q'\).
References
Belsley, D. A., Kuh, E., Welsch, R. E. (2004). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley.
Conniffe, D., Stone, J. (1973). A critical view of ridge regression. The Statistician, 22, 181–187.
Draper, N. R., Smith, H. (1998). Applied regression analysis (3rd ed.). New York: Wiley.
Draper, N. R., Van Nostrand, R. C. (1979). Ridge regression and James-Stein estimators: Review and comments. Technometrics, 21, 451–466.
Gunst, R. F., Mason, R. L. (1977). Biased estimation in regression: An evaluation using mean squared error. Journal of the American Statistical Association, 72, 616–628.
Gunst, R. F., Webster, J. T., Mason, R. L. (1976). A comparison of least squares and latent root regression estimators. Technometrics, 18, 75–83.
Hoerl, A. E., Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
Hoerl, A. E., Kennard, R. W., Baldwin, K. F. (1975). Ridge regression: Some simulations. Communications in Statistics: Theory and Methods, 4, 105–123.
Horn, R. A., Johnson, C. R. (1985). Matrix analysis. Cambridge: Cambridge University Press.
Jolliffe, I. T. (1986). Principal component analysis. New York: Springer-Verlag.
Lawless, J. F. (1978). Ridge and related estimation procedures: Theory and practice. Communications in Statistics: Theory and Methods, 7, 135–164.
Montgomery, D. C., Peck, E. A., Vining, G. G. (2012). Introduction to linear regression analysis (5th ed.). New York: Wiley.
Tsao, M. (2019). Estimable group effects for strongly correlated variables in linear models. Journal of Statistical Planning and Inference, 198, 29–42.
Webster, J. T., Gunst, R. F., Mason, R. L. (1974). Latent root regression analysis. Technometrics, 16, 513–522.
Acknowledgements
We would like to thank two anonymous reviewers and an Associate Editor for their helpful comments which have led to many improvements in this paper. This work is supported by the Natural Sciences and Engineering Research Council of Canada.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 1
Let \({\mathbf {A}}\) be the \(q\times q\) matrix whose elements are all 1. Then, \({\mathbf {A}}\) has two distinct eigenvalues, \(\lambda ^A_1=q\) and \(\lambda ^A_2=0\). Eigenvalue \(\lambda ^A_1\) has multiplicity 1, and \(\lambda ^A_2\) has multiplicity \((q-1)\). The orthonormal eigenvector of \(\lambda ^A_1\) is \(\frac{1}{\sqrt{q}}{\mathbf {1}}_q\). Here, we ignore the other orthonormal eigenvector of \(\lambda ^A_1\), \(-\frac{1}{\sqrt{q}}{\mathbf {1}}_q\), which differs only in sign from \(\frac{1}{\sqrt{q}}{\mathbf {1}}_q\).
Let \({\mathbf {P}}=[p_{ij}]\) be a perturbation matrix of \({\mathbf {A}}\) defined by
\({\mathbf {P}}={\mathbf {A}}-{\mathbf {R}}. \qquad (24)\)
Then, \({\mathbf {P}}\) is real and symmetric and \(p_{ij}=1-r_{ij}\). When \(r_M \rightarrow 1\), since \(p_{ij}=(1-r_{ij})\rightarrow 0\), we have \(\Vert {\mathbf {P}}\Vert _2\rightarrow 0\). It follows from this and \({\mathbf {R}}={\mathbf {A}}-{\mathbf {P}}\) (so \({\mathbf {R}}\) is a perturbed version of \({\mathbf {A}}\)) that \(\lambda _1 \rightarrow \lambda _1^A=q\) and \(\lambda _i \rightarrow \lambda ^A_2=0\) for \(i=2,3,\dots ,q\) as \(r_M \rightarrow 1\) (Horn and Johnson, 1985; page 367).
To show that \({\mathbf {v}}_1\rightarrow \frac{1}{\sqrt{q}}{\mathbf {1}}_q\) as \(r_M \rightarrow 1\), since \({\mathbf {R}} {\mathbf {v}}_1=\lambda _1{\mathbf {v}}_1\), we have
\(\lambda _1v_{1i}=r_{i1}v_{11}+r_{i2}v_{12}+\dots +r_{iq}v_{1q} \qquad (25)\)
for \(i=1,2,\dots , q\), where \((r_{i1}, r_{i2}, \dots , r_{iq})\) is the ith row of \({\mathbf {R}}\) and \(v_{1i}\) is the ith element of \({\mathbf {v}}_1\). All \(v_{1i}\) are bounded between \(-1\) and 1 since \(v_{1i}^2 \le \Vert {\mathbf {v}}_1\Vert ^2= 1\). When \(r_M \rightarrow 1\), all \(r_{ij}\rightarrow 1\), so \((r_{ij}v_{1j}-v_{1j})\rightarrow 0\) for \(j=1,2,\dots ,q\). Thus,
\(\sum ^q_{j=1}r_{ij}v_{1j}-\sum ^q_{j=1}v_{1j}=\sum ^q_{j=1}(r_{ij}v_{1j}-v_{1j})\rightarrow 0 \qquad (26)\)
as \(r_M \rightarrow 1\). By (25) and (26), \(\lambda _1v_{1i} - (v_{11}+v_{12}+\dots +v_{1q}) \rightarrow 0\) which implies \(\lambda _1^2v_{1i}^2 - (v_{11}+v_{12}+\dots +v_{1q})^2 \rightarrow 0\) for \(i=1,2,\dots ,q\). It follows that
\(\lambda _1^2(v_{11}^2+v_{12}^2+\dots +v_{1q}^2) - q(v_{11}+v_{12}+\dots +v_{1q})^2 \rightarrow 0. \qquad (27)\)
Since \(v_{11}^2+v_{12}^2+\dots +v_{1q}^2= \Vert {\mathbf {v}}_1\Vert ^2=1\) and \(\lambda _1 \rightarrow q\), (27) implies that \((v_{11}+v_{12}+\dots +v_{1q})\rightarrow \sqrt{q}\). This and (26) imply that
\(r_{i1}v_{11}+r_{i2}v_{12}+\dots +r_{iq}v_{1q} \rightarrow \sqrt{q}\)
for \(i=1,2,\dots ,q\). By (25), we also have \(\lambda _1v_{1i} \rightarrow \sqrt{q}\). This and \(\lambda _1 \rightarrow q\) imply that \(v_{1i}\rightarrow 1/\sqrt{q}\) for \(i=1,2,\dots ,q\), that is, \({\mathbf {v}}_1\rightarrow \frac{1}{\sqrt{q}}{\mathbf {1}}_q\). \(\square\)
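The convergence in Lemma 1 can be checked numerically. The following is a minimal sketch (not part of the proof) that uses an equicorrelation matrix \({\mathbf {R}}=(1-r){\mathbf {I}}+r{\mathbf {1}}_q{\mathbf {1}}_q^T\) as an assumed example of a correlation matrix with \(r_M\) close to 1:

```python
import numpy as np

# Numerical check of Lemma 1 (illustration only): for the equicorrelation
# matrix R = (1-r)I + r 11^T with r close to 1, the largest eigenvalue
# lambda_1 is close to q and its orthonormal eigenvector is close to
# (1/sqrt(q)) 1_q.
q, r = 5, 0.999
R = (1 - r) * np.eye(q) + r * np.ones((q, q))
lam, V = np.linalg.eigh(R)          # eigenvalues in ascending order
lam1, v1 = lam[-1], V[:, -1]
v1 = v1 if v1.sum() > 0 else -v1    # resolve the sign ambiguity as in the proof
print(lam1)                         # 1 + (q-1)r = 4.996, close to q = 5
print(v1)                           # each entry close to 1/sqrt(5)
```

For this special matrix, \(\lambda _1=1+(q-1)r\) and \({\mathbf {v}}_1=\frac{1}{\sqrt{q}}{\mathbf {1}}_q\) exactly, so the output directly displays the limits of the lemma.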
Proof of Lemma 2
Since \({\mathbf {R}}\) is positive definite, \({\mathbf {R}}^{-1}\) is also positive definite. Let \(\lambda _1'\ge \lambda _{2}'\ge \dots \ge \lambda _q'>0\) be the eigenvalues of \({\mathbf {R}}^{-1}\). Then, \(\lambda _i'=\lambda ^{-1}_{q-i+1}\) and its eigenvector is \({\mathbf {v}}_i'={\mathbf {v}}_{q-i+1}\) for \(i=1,2, \dots , q\). In particular, \(\lambda _q'=\lambda _1^{-1}\) and \({\mathbf {v}}_q'={\mathbf {v}}_1\). Since all \(\lambda _i>0\) and \(trace({\mathbf {R}})=q=\sum ^q_{i=1}\lambda _i\), we have \(0<\lambda _1<q\). Also, \({\mathbf {v}}_1^T{\mathbf {v}}_1=1\) as \({\mathbf {v}}_1\) is orthonormal. It follows from these that
\({\mathbf {v}}_1^T{\mathbf {R}}^{-1}{\mathbf {v}}_1=\lambda _q'\,{\mathbf {v}}_1^T{\mathbf {v}}_1=\frac{1}{\lambda _1}>\frac{1}{q}, \qquad (28)\)
which proves (i). By Lemma 1, \(\lambda _1 \rightarrow q\) as \(r_M \rightarrow 1\). Thus, by (28)
\({\mathbf {v}}_1^T{\mathbf {R}}^{-1}{\mathbf {v}}_1=\frac{1}{\lambda _1} \rightarrow \frac{1}{q}\)
as \(r_M \rightarrow 1\), which proves (ii). \(\square\)
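Lemma 2 can likewise be checked numerically. The sketch below (an illustration, not part of the proof) again assumes the equicorrelation matrix, for which \(\lambda _1\) and \({\mathbf {v}}_1\) are known in closed form:

```python
import numpy as np

# Numerical check of Lemma 2 (illustration only): v_1^T R^{-1} v_1 equals
# 1/lambda_1, which is strictly greater than 1/q (part (i)) and approaches
# 1/q as r -> 1 (part (ii)).
q, r = 5, 0.999
R = (1 - r) * np.eye(q) + r * np.ones((q, q))
lam1 = 1 + (q - 1) * r               # largest eigenvalue of R
v1 = np.ones(q) / np.sqrt(q)         # its orthonormal eigenvector
quad = v1 @ np.linalg.inv(R) @ v1    # the quadratic form in (28)
print(quad, 1 / lam1)                # equal, and just above 1/q = 0.2
```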
Proof of Theorem 1
For any constant vector \({\mathbf {c}} \in {\mathbb {R}}^p\), we have
\(var({\mathbf {c}}^T\hat{\varvec{\beta }}')=\sigma ^2{\mathbf {c}}^T({\mathbf {X}}'^T{\mathbf {X}}')^{-1}{\mathbf {c}}. \qquad (29)\)
Let \({\mathbf {c}}_E=({{\mathbf {v}}^*_1}^T, 0, \dots , 0)^T\). Then, \({\xi }_E= {\mathbf {c}}_E^T {\varvec{\beta }}'\) and \({\hat{\xi }}_E= {\mathbf {c}}_E^T \hat{\varvec{\beta }}'\). By (11) and (29),
\(var({\hat{\xi }}_E)=\sigma ^2{{\mathbf {v}}^*_1}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}^*_1. \qquad (30)\)
To show (i), when variables in \({\mathbf {X}}_1'\) are uncorrelated with variables in \({\mathbf {X}}_2'\), \({\mathbf {R}}_{12}={\mathbf {0}}\) and so \({\mathbf {R}}^*={\mathbf {R}}\) and \({\mathbf {v}}^*_1={\mathbf {v}}_1\). By (30),
\(var({\hat{\xi }}_E)=\sigma ^2{\mathbf {v}}_1^T{\mathbf {R}}^{-1}{\mathbf {v}}_1. \qquad (31)\)
Applying Lemma 2 to the right-hand side of (31), we obtain (\(i_1\)) and (\(i_2\)).
To show (ii), for simplicity, we assume general conditions discussed in footnote 1 hold so that \({\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21}\rightarrow {\mathbf {0}}\) when \({\mathbf {R}}_{12}\rightarrow {\mathbf {0}}\). It follows from this and conditions in Theorem 1(ii) that \({\mathbf {R}}_{11}\) and \({\mathbf {R}}^*\) will both converge to matrix \({\mathbf {A}}\) in (24). We again define a perturbation matrix of \({\mathbf {A}}\) as
\({\mathbf {P}}^*={\mathbf {A}}-{\mathbf {R}}^*\)
like what we did in (24). By following steps similar to those in the proofs of Lemma 1 and Lemma 2, we can show that \({\mathbf {R}}^*\) also has the two properties in Lemma 1 and property (ii) in Lemma 2. The latter and (30) imply (ii). \(\square\)
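The convergence used in this step can be illustrated numerically. The sketch below assumes \({\mathbf {R}}^*\) denotes the Schur complement \({\mathbf {R}}_{11}-{\mathbf {R}}_{12}{\mathbf {R}}^{-1}_{22}{\mathbf {R}}_{21}\), as footnote 1 suggests; the group sizes and correlation levels are arbitrary choices for illustration:

```python
import numpy as np

# Illustration for Theorem 1(ii) (not part of the proof): with strong
# within-group correlations (r near 1) and weak between-group correlations
# (eps near 0), R* = R11 - R12 R22^{-1} R21 is close to the all-ones matrix A.
q, p2 = 4, 3                                      # sizes of the two groups
r, eps = 0.999, 0.001
R11 = (1 - r) * np.eye(q) + r * np.ones((q, q))   # strong within-group
R22 = np.eye(p2)                                  # second group uncorrelated
R12 = eps * np.ones((q, p2))                      # weak between-group
Rstar = R11 - R12 @ np.linalg.inv(R22) @ R12.T
A = np.ones((q, q))
print(np.abs(Rstar - A).max())   # small: R* is already close to A
```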
Proof of Theorem 2
Since \({\mathbf {v}}\cdot {\mathbf {v}}^*_1=\Vert {\mathbf {v}}\Vert \Vert {\mathbf {v}}^*_1\Vert \cos (\theta )=\cos (\theta )\) where \(\theta\) is the angle between \({\mathbf {v}}\) and \({\mathbf {v}}^*_1\), \(\sqrt{1-\delta }< {\mathbf {v}} \cdot {\mathbf {v}}^*_1 \le 1\) is equivalent to \(\sqrt{1-\delta }<\cos (\theta ) \le 1\) or \(0\le \theta < \theta _\delta\) for some small fixed \(\theta _\delta >0\). Thus, \(\mathcal{N}_{\delta }\) in (14) represents a small open circular region centered on \({\mathbf {v}}_1^*\) on the surface of the unit sphere.
Similar to \(var({\hat{\xi }}_E)\) in (30), \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')=\sigma ^2{\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}\). Since \({{\mathbf {R}}^*}^{-1}\) is real symmetric positive definite, it has eigendecomposition \(\mathbf {Q\Lambda Q}^T\) where \({\mathbf {Q}}\) is the matrix of orthonormal eigenvectors including \({\mathbf {v}}^*_1\) and \(\mathbf {\Lambda }\) is the diagonal matrix of eigenvalues. The smallest eigenvalue of \({{\mathbf {R}}^*}^{-1}\) is \(1/\lambda ^*_1\) which converges to 1/q under the condition of Theorem 2 as \(r_M\) goes to 1. The other eigenvalues of \({{\mathbf {R}}^*}^{-1}\) all go to infinity as \(r_M\) goes to 1. For any unit vector \({\mathbf {v}}\),
\(1=\Vert {\mathbf {v}}\Vert ^2={\mathbf {v}}^T{\mathbf {Q}}{\mathbf {Q}}^T{\mathbf {v}}={\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}}+({\mathbf {v}}^T{\mathbf {v}}^*_1)^2, \qquad (32)\)
where \(\tilde{{\mathbf {Q}}}\) is the matrix containing all columns of \({\mathbf {Q}}\) but \({\mathbf {v}}^*_1\). If \({\mathbf {v}} \notin {\mathcal N}_{\delta }\), then \(({\mathbf {v}}^T {\mathbf {v}}^*_1)^2\le 1-\delta\). This and (32) imply that \(1\le {\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}}+(1-\delta )\), that is, \({\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{{\mathbf {Q}}}^T{\mathbf {v}} \ge \delta\). This leads to the following lower bound on \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')\),
\(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1')=\sigma ^2{\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}\ge \sigma ^2{\mathbf {v}}^T\tilde{{\mathbf {Q}}}\tilde{\mathbf {\Lambda }}\tilde{{\mathbf {Q}}}^T{\mathbf {v}}\ge \frac{\sigma ^2\delta }{\lambda ^*_2}, \qquad (33)\)
where \(\tilde{\mathbf {\Lambda }}\) is the diagonal matrix of all eigenvalues of \({{\mathbf {R}}^*}^{-1}\) except the smallest one \(1/\lambda ^*_1\), and \(1/\lambda ^*_2\) is the second smallest eigenvalue of \({{\mathbf {R}}^*}^{-1}\). Since \(1/\lambda ^*_2 \rightarrow \infty\) as \(r_M \rightarrow 1\), (33) implies that \(var({\mathbf {v}}^T\hat{\varvec{\beta }}_1') \rightarrow \infty\) as \(r_M \rightarrow 1\) if \({\mathbf {v}} \notin {{\mathcal {N}}}_{\delta }\). \(\square\)
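The contrast in Theorem 2 between directions inside and outside \({{\mathcal {N}}}_{\delta }\) can be seen numerically. The following sketch (an illustration under an assumed equicorrelation design, not part of the proof) compares the quadratic form \({\mathbf {v}}^T{{\mathbf {R}}^*}^{-1}{\mathbf {v}}\) for \({\mathbf {v}}={\mathbf {v}}^*_1\) and for a unit vector orthogonal to \({\mathbf {v}}^*_1\):

```python
import numpy as np

# Illustration of Theorem 2 (not part of the proof): v^T (R*)^{-1} v stays
# near 1/q for v = v_1* = (1/sqrt(q)) 1_q, but diverges like 1/(1-r) as
# r -> 1 for a unit vector outside the neighbourhood N_delta of v_1*.
q = 5
v_in = np.ones(q) / np.sqrt(q)                        # v_1*, inside N_delta
v_out = np.zeros(q)
v_out[0], v_out[1] = 1 / np.sqrt(2), -1 / np.sqrt(2)  # orthogonal to v_1*
for r in (0.9, 0.99, 0.999):
    R = (1 - r) * np.eye(q) + r * np.ones((q, q))     # assumed R*
    Rinv = np.linalg.inv(R)
    print(r, v_in @ Rinv @ v_in, v_out @ Rinv @ v_out)
# The first quadratic form stays near 1/q = 0.2; the second grows as 1/(1-r).
```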
About this article
Cite this article
Tsao, M. Group least squares regression for linear models with strongly correlated predictor variables. Ann Inst Stat Math 75, 233–250 (2023). https://doi.org/10.1007/s10463-022-00841-7