Diagnosis and quantification of the non-essential collinearity

  • Original paper
Computational Statistics

Abstract

Marquardt and Snee (Am Stat 29(1):3–20, 1975), Marquardt (J Am Stat Assoc 75(369):87–91, 1980) and Snee and Marquardt (Am Stat 38(2):83–87, 1984) refer to non-essential multicollinearity as that caused by the relation with the independent term (the intercept). Although it is clear that the solution is to center the independent variables in the regression model, it is unclear when this kind of collinearity exists. The goal of this study is to diagnose non-essential collinearity, starting from a simple linear model. The collinearity indices \(k_{j}\), traditionally misinterpreted as variance inflation factors, are reinterpreted in this paper and used to distinguish and quantify essential and non-essential collinearity. The results can be immediately extended to the multiple linear model. The study also offers recommendations for statistical software such as SPSS, Stata, GRETL or R to improve the diagnosis of non-essential collinearity.

References

  • Belsley DA (1982) Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. J Econ 20(2):211–253

  • Belsley DA (1984) Demeaning conditioning diagnostics through centering. Am Stat 38(2):73–77

  • Berk KN (1977) Tolerance and condition in regression computations. J Am Stat Assoc 72:863–866

  • Christensen R (2018) Comment on a note on collinearity diagnostics and centering. Am Stat 72(1):114–117

  • Curto JD, Pinto JC (2011) The corrected vif (cvif). J Appl Stat 38(7):1499–1507

  • EMMI (2018) European Money Markets Institute. https://www.emmi-benchmarks.eu Checked: 1 Feb 2018

  • Eurostat (2018) European Commission. http://www.ec.europa.eu/eurostat/web Checked: 1 Feb 2018

  • García J, Salmerón R, García C, López M (2016) Standardization of variables and collinearity diagnostic in ridge regression. Int Stat Rev 84(2):245–266

  • Gujarati D (2003) Basic Econometrics, 4th edn. McGraw-Hill, New York

  • Gunst RF (1984) Toward a balanced assessment of collinearity diagnostics. Am Stat 38:79–82

  • Jensen D, Ramírez D (2013) Revision: variance inflation in regression. Adv Decis Sci Article ID 671204

  • Johnston JD, Dinardo J (2001) Métodos de econometría. Ed. Vicens Vives, Barcelona

  • Marquardt DW (1980) You should standardize the predictor variables in your regression models. J Am Stat Assoc 75(369):87–91

  • Marquardt DW, Snee R (1975) Ridge regression in practice. Am Stat 29(1):3–20

  • Novales A (1993) Econometría, 2nd edn. Ed. McGraw-Hill, Madrid

  • Novales A (2010) Análisis de regresión. https://www.ucm.es/data/cont/docs/518-2013-11-13-Analisis%20de%20Regresion.pdf Checked: 16 Oct 2017

  • Salmerón R, Blanco V (2016) El problema de un tamaño muestral pequeño en la regresión lineal: micronumerosidad. Rect@ 17(2):167–177

  • Salmerón R, García J, García C, Martín ML (2017) A note about the corrected vif. Stat Pap 58(3):929–945

  • Salmeron R, Garcia C, Garcia J (2019) multiColl: collinearity detection in a multiple linear regression model. https://CRAN.R-project.org/package=multiColl, R package version 1.0

  • Snee RD, Marquardt DW (1984) Collinearity diagnostics depend on the domain of prediction, the model, and the data. Am Stat 38(2):83–87

  • Stewart G (1987) Collinearity and least squares regression. Stat Sci 2(1):68–100

  • Stock J, Watson M (2012) Introducción a la Econometría, 3rd edn. Ed. Pearson, Madrid

  • Uriel E, Periró A, Contreras D, Moltó M (1997) Econometría: El Modelo Lineal. Ed. Alfa Centauro, Madrid

  • Velilla S (2018) A note on collinearity diagnostics and centering. Am Stat 72(2):140–146

  • Wood F (1984) Comment on effect of centering on collinearity and interpretation of the constant. Am Stat 38(2):88–90

  • Wooldridge J (2009) Introductory Econometrics: A Modern Approach. South-Western Cengage Learning, Canada

Author information

Corresponding author

Correspondence to Catalina García-García.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Stewart indices

Given a matrix \(\mathbf {A}\) with dimensions \(n \times p\) partitioned as \(\mathbf {A} = [ \mathbf {A}_{1}, \ldots , \mathbf {A}_{i}, \ldots , \mathbf {A}_{p} ] = [ \mathbf {A}_{i}, \mathbf {A}_{-i}]\), where \(\mathbf {A}_{-i}\) is equal to \(\mathbf {A}\) after eliminating column i and \(\vert \cdot \vert \) denotes the determinant, Stewart (1987) defined the following index to measure the relation between \(\mathbf {A}_{i}\) and the rest of the columns of \(\mathbf {A}\):

$$\begin{aligned} k_{i}^{2} = \frac{\vert \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \vert \cdot \mathbf {A}_{i}^{t} \mathbf {A}_{i}}{\vert \mathbf {A}^{t} \mathbf {A} \vert }, \quad i=1,\ldots ,p. \end{aligned} \tag{20}$$

Since \(\vert \mathbf {A}^{t} \mathbf {A} \vert = \vert \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \vert \cdot \vert \mathbf {A}_{i}^{t} \mathbf {A}_{i} - \mathbf {A}_{i}^{t} \mathbf {A}_{-i} \cdot \left( \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \right) ^{-1} \cdot \mathbf {A}_{-i}^{t} \mathbf {A}_{i} \vert \), it is clear that:

$$\begin{aligned} k_{i}^{2} = \frac{\mathbf {A}_{i}^{t} \mathbf {A}_{i}}{\mathbf {A}_{i}^{t} \mathbf {A}_{i} - \mathbf {A}_{i}^{t} \mathbf {A}_{-i} \cdot \left( \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \right) ^{-1} \cdot \mathbf {A}_{-i}^{t} \mathbf {A}_{i}}, \quad i=1,\ldots ,p. \end{aligned} \tag{21}$$
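The equivalence of the determinant form (20) and the residual form (21) can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the paper: the function names `stewart_index_det` and `stewart_index_proj` and the simulated matrix are assumptions made purely for illustration.

```python
import numpy as np

def stewart_index_det(A, i):
    """k_i^2 computed from the determinant form, Eq. (20)."""
    A_i = A[:, i]
    A_rest = np.delete(A, i, axis=1)  # A with column i removed
    num = np.linalg.det(A_rest.T @ A_rest) * (A_i @ A_i)
    return num / np.linalg.det(A.T @ A)

def stewart_index_proj(A, i):
    """k_i^2 computed from the residual form, Eq. (21)."""
    A_i = A[:, i]
    A_rest = np.delete(A, i, axis=1)
    # Solve (A_rest^t A_rest) b = A_rest^t A_i rather than forming the inverse.
    b = np.linalg.solve(A_rest.T @ A_rest, A_rest.T @ A_i)
    return (A_i @ A_i) / (A_i @ A_i - A_i @ A_rest @ b)

# Hypothetical data: an intercept column plus two simulated regressors.
rng = np.random.default_rng(0)
A = np.column_stack([
    np.ones(30),
    rng.normal(10.0, 1.0, 30),  # mean far from zero
    rng.normal(0.0, 1.0, 30),   # mean close to zero
])

k2_det = [stewart_index_det(A, i) for i in range(A.shape[1])]
k2_proj = [stewart_index_proj(A, i) for i in range(A.shape[1])]
print(k2_det)
print(k2_proj)
```

Both lists should agree up to floating-point error. Form (21) is usually the safer one to compute, since it avoids determinants of nearly singular matrices.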

It then follows that:

$$\begin{aligned} k_{i}^{2}&= 1, \quad \text{ if } \mathbf {A}_{i}^{t} \mathbf {A}_{-i} = \mathbf {0},\\ k_{i}^{2}&\not = 1, \quad \text{ if } \mathbf {A}_{i}^{t} \mathbf {A}_{-i} \not = \mathbf {0}, \end{aligned}$$

where \(\mathbf {0}\) is a vector of zeros with appropriate dimensions. In addition, for \(i=1,\ldots ,p\) with \(\mathbf {A}_{i}^{t} \mathbf {A}_{-i} \not = \mathbf {0}\), it holds that:

  • \(k_{i}^{2} > 1\)    if \(\mathbf {A}_{-i}^{t} \mathbf {A}_{-i}\) is positive definite.

  • \(k_{i}^{2} < 1\)    if \(\mathbf {A}_{-i}^{t} \mathbf {A}_{-i}\) is negative definite.
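The two cases can be illustrated with small hand-checkable matrices. This is a sketch under stated assumptions: the helper `stewart_k2` and both example matrices are hypothetical and only serve to show the index equal to 1 for orthogonal columns and above 1 otherwise.

```python
import numpy as np

def stewart_k2(A, i):
    """k_i^2 from the residual form (21); hypothetical helper name."""
    A_i = A[:, i]
    A_rest = np.delete(A, i, axis=1)
    b = np.linalg.solve(A_rest.T @ A_rest, A_rest.T @ A_i)
    return (A_i @ A_i) / (A_i @ A_i - A_i @ A_rest @ b)

# Orthogonal columns: the inner product of (1,1,1,1) and (1,-1,1,-1) is 0.
orthogonal = np.array([[1.0, 1.0],
                       [1.0, -1.0],
                       [1.0, 1.0],
                       [1.0, -1.0]])

# Non-orthogonal columns: the inner product of (1,1,1,1) and (1,2,3,4) is 10.
trend = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])

k2_orth = [stewart_k2(orthogonal, i) for i in range(2)]
k2_trend = [stewart_k2(trend, i) for i in range(2)]
print(k2_orth)   # both indices equal 1
print(k2_trend)  # both indices equal 30 / (30 - 100/4) = 6
```

In the second matrix the Gram matrix of the remaining column is positive definite and the cross product is non-zero, so the index exceeds 1, in line with the first bullet.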

Thus, this index can capture the orthogonality between \(\mathbf {A}_{i}\) and the rest of the columns of matrix \(\mathbf {A}\). However, note that orthogonality does not imply that there is no correlation:

$$\begin{aligned} \mathbf {A}_{i}^{t} \mathbf {A}_{j}&= \sum \limits _{k=1}^{n} A_{ik} A_{jk} = 0 \\&\nRightarrow corr(\mathbf {A}_{i},\mathbf {A}_{j}) = - \frac{n \cdot \overline{\mathbf {A}}_{i} \cdot \overline{\mathbf {A}}_{j}}{\sqrt{\sum \nolimits _{k=1}^{n} \left( A_{ik} - \overline{\mathbf {A}}_{i} \right) ^{2}} \cdot \sqrt{\sum \nolimits _{k=1}^{n} \left( A_{jk} - \overline{\mathbf {A}}_{j} \right) ^{2}}} = 0, \end{aligned}$$

for \(i,j = 1,\ldots ,p, \ i \not = j\), unless at least one of the two columns has zero mean.
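A small numerical example makes the distinction concrete. The sketch below uses hypothetical vectors `a`, `b` and `c` chosen for illustration. The closed form is written with numerator \(-n \overline{\mathbf {A}}_{i} \overline{\mathbf {A}}_{j}\), which is what expanding the centered cross-product gives when the raw inner product is zero.

```python
import numpy as np

# Hypothetical columns: a and b are orthogonal but both have non-zero mean.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, -1.0])
n = a.size

inner_ab = a @ b                    # 1 + 2 - 3 = 0, so a and b are orthogonal
corr_ab = np.corrcoef(a, b)[0, 1]   # Pearson correlation, not zero here

# Closed form for orthogonal columns: -n * mean(a) * mean(b) / (norms of centered columns).
closed_form = -n * a.mean() * b.mean() / (
    np.sqrt(((a - a.mean()) ** 2).sum()) * np.sqrt(((b - b.mean()) ** 2).sum())
)

# c is orthogonal to a and has zero mean, so the correlation vanishes.
c = np.array([1.0, -2.0, 1.0])
inner_ac = a @ c                    # 1 - 4 + 3 = 0
corr_ac = np.corrcoef(a, c)[0, 1]

print(inner_ab, corr_ab, closed_form)
print(inner_ac, corr_ac)
```

Here `a` and `b` are orthogonal yet strongly negatively correlated, whereas `a` and `c` are both orthogonal and uncorrelated because `c` has zero mean.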

Cite this article

Salmerón-Gómez, R., Rodríguez-Sánchez, A. & García-García, C. Diagnosis and quantification of the non-essential collinearity. Comput Stat 35, 647–666 (2020). https://doi.org/10.1007/s00180-019-00922-x

  • DOI: https://doi.org/10.1007/s00180-019-00922-x
