A note about the corrected VIF

Abstract

This paper discusses some limitations that arise when applying the CVIF of Curto and Pinto (J Appl Stat 38(7):1499–1507, 2011) and proposes some modifications to overcome them. The concept of the modified CVIF is also extended to ridge estimation.

Notes

  1. Due to this property, when \(R_{0}^{2}(0)\) is greater than 1, there may exist a value \(k_{h}\) such that \(R_{0}^{2}(k)<1\) for \(k>k_{h}\).

  2. For the remaining values it is less than 1; that is, \(k_{h}=0.5\) in this case. From this value onwards, the desirable properties (monotonicity and values greater than one) are recovered. In addition, the dominance \(MCVIF _{R}(j,k)\ge VIF _{R}(j,k)\) holds for the considered values of k and j.

References

  • Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics: identifying influential data and sources of collinearity. Wiley, New York

  • Chang X, Yang H (2012) Combining two-parameter and principal component regression estimators. Stat Pap 53(3):549–562

  • Cuadras C (1993) Interpreting an inequality in multiple regression. Am Stat 47:256–258

  • Curto JD, Pinto JC (2007) New multicollinearity indicators in linear regression models. Int Stat Rev 75(1):114–121

  • Curto JD, Pinto JC (2011) The corrected VIF (CVIF). J Appl Stat 38(7):1499–1507

  • Farrar DE, Glauber RR (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49:92–107

  • Flury B (1989) Understanding partial statistics and redundancy of variables in regression and discriminant analysis. Am Stat 43:27–31

  • Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183

  • García C, García J, López MDM, Salmerón R (2015a) Collinearity: revisiting the variance inflation factor in ridge regression. J Appl Stat 42(3):648–661

  • García CB, García J, Soto J (2010) The raise method: an alternative procedure to estimate the parameters in presence of collinearity. Qual Quant 45(2):403–423

  • García J, Salmerón R, García C, López MDM (2015b) Standardization of variables and diagnostic of collinearity in the ridge regression. Int Stat Rev. doi:10.1111/insr.12099

  • Gunst RF, Mason RL (1977) Advantages of examining multicollinearities in regression analysis. Biometrics 33:249–260

  • Hadi AS (2011) Ridge and surrogate ridge regressions. Springer, Heidelberg

  • Hamilton D (1987) Sometimes \(R^{2}>r_{yx_{1}}^{2}+r_{yx_{2}}^{2}\): correlated variables are not always redundant. Am Stat 41(2):129–132

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67

  • Jensen DR, Ramirez DE (2008) Anomalies in the foundations of ridge regression. Int Stat Rev 76(1):89–105

  • Kovacs P, Petres T, Toth L (2005) A new measure of multicollinearity in linear regression models. Int Stat Rev 73(3):405–412

  • Kurnaz F, Akay K (2015) A new Liu-type estimator. Stat Pap 56(2):495–517. doi:10.1007/s00362-014-0594-6

  • Lazaridis A (2007) A note regarding the condition number: the case of spurious and latent multicollinearity. Qual Quant 41:123–135

  • Lin FJ (2008) Solving multicollinearity in the process of fitting regression model using the nested estimate procedure. Qual Quant 42:417–426

  • Liu K (2003) Using Liu-type estimator to combat collinearity. Commun Stat Theory Methods 32(5):1009–1020

  • Liu XQ, Gao F, Yu ZF (2013) Improved ridge estimators in a linear regression model. J Appl Stat 40(1):209–220

  • Marquardt DW (1970) Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 12(3):591–612

  • Marquardt DW, Snee RD (1975) Ridge regression in practice. Am Stat 29(1):3–20

  • McDonald GC (2009) Ridge regression. Wiley Interdiscip Rev 1:93–100

  • McDonald GC (2010) Tracing ridge regression coefficients. Wiley Interdiscip Rev 2:695–703

  • O’Brien RM (2007) A caution regarding rules of thumb for variance inflation factors. Qual Quant 41:673–690

  • Sakallıoğlu S, Kaçıranlar S (2008) A new biased estimator based on ridge estimation. Stat Pap 49(4):669–689

  • Silvey SD (1969) Multicollinearity and imprecise estimation. J R Stat Soc Ser B (Methodol) 31:539–552

  • Spanos A, McGuirk A (2002) The problem of near-multicollinearity revisited: erratic versus systematic volatility. J Econom 108:365–393

  • Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc Third Berkeley Symp Math Stat Probab 1:197–206

  • Theil H (1971) Principles of econometrics. Wiley, New York

  • Willan AR, Watts DG (1978) Meaningful multicollinearity measures. Technometrics 20(4):407–412

  • Wu J, Yang H (2014) More on the unbiased ridge regression estimation. Stat Pap. doi:10.1007/s00362-014-0637-z

Author information


Correspondence to C. B. García.

Appendices

1.1 Appendix 1: \(R^{2}_{0}(k)\) is a continuous function at \(k=0\)

\(R_{0}^{2}(k)=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}^{R},\mathbf {X} _{j}^{R}\right) ^{2}\) will be a continuous function at \(k=0 \) if \(R_{0}^{2}(0)=R_{0}^{2}=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}, \mathbf {X}_{j}\right) ^{2}\), where \(\mathbf {X}_{j}^{R}\) is the j-th column of the augmented matrix

$$\begin{aligned} \mathbf {X}^{R}=\left( \begin{array}{c} \mathbf {X} \\ \sqrt{k}\,\mathbf {I}_{p} \end{array} \right) ,\qquad \mathbf {y}^{R}=\left( \begin{array}{c} \mathbf {y} \\ \mathbf {0} \end{array} \right) , \end{aligned}$$

and \(corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) \) is the coefficient of correlation between \(\mathbf {y}^{R}\) and \(\mathbf {X}_{j}^{R}\), that is:

$$\begin{aligned} corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) =\frac{cov\left( \mathbf { y}^{R},\mathbf {X}_{j}^{R}\right) }{\sqrt{var\left( \mathbf {y}^{R}\right) } \sqrt{var\left( \mathbf {X}_{j}^{R}\right) }},\qquad j=1,\ldots ,p. \end{aligned}$$

Taking into account that:

$$\begin{aligned} y_{i}^{R}= & {} \left\{ \begin{array}{c@{\quad }l} y_{i} &{} i=1,\ldots ,n \\ 0 &{} i=n+1,\ldots ,n+p \end{array} \right. ,\\ X_{ij}^{R}= & {} \left\{ \begin{array}{l@{\quad }l} X_{ij} &{} i=1,\ldots ,n \\ 0 &{} i=n+1,\ldots ,n+p,\ i\not =n+j \\ \sqrt{k} &{} i=n+j \end{array} \right. , \end{aligned}$$

we obtain that:

$$\begin{aligned} cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p} \sum \limits _{i=1}^{n}y_{i}X_{ij}-\frac{n\overline{\mathbf {y}}(n\overline{ \mathbf {X}}_{j}+\sqrt{k})}{(n+p)^{2}}, \end{aligned}$$
(11)
$$\begin{aligned} var\left( \mathbf {y}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}y_{i}^{2}-\frac{n^{2}\overline{\mathbf {y}}^{2}}{(n+p)^{2}}, \end{aligned}$$
(12)
$$\begin{aligned} var\left( \mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p}\left( \sum \limits _{i=1}^{n}X_{ij}^{2}+k\right) -\left( \frac{n\overline{\mathbf {X}}_{j}+\sqrt{k} }{n+p}\right) ^{2}. \end{aligned}$$
(13)

By considering \(k=0\), the previous expressions become:

$$\begin{aligned} cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p} \sum \limits _{i=1}^{n}y_{i}X_{ij}-\frac{n^{2}\overline{\mathbf {y}}\overline{ \mathbf {X}}_{j}}{(n+p)^{2}},\\ var\left( \mathbf {y}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}y_{i}^{2}-\frac{n^{2}\overline{\mathbf {y}}^{2}}{(n+p)^{2}},\\ var\left( \mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}X_{ij}^{2}-\left( \frac{n\overline{\mathbf {X}}_{j}}{n+p} \right) ^{2}, \end{aligned}$$

and it is then evident that \(R_{0}^{2}(0)\not =R_{0}^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).

However, if the data are standardized, that is, \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\), we have that:

$$\begin{aligned} cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p} \sum \limits _{i=1}^{n}y_{i}X_{ij}=\frac{n}{n+p}cov\left( \mathbf {y},\mathbf {X} _{j}\right) , \\ var\left( \mathbf {y}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}y_{i}^{2}=\frac{n}{n+p}var\left( \mathbf {y}\right) ,\\ var\left( \mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}X_{ij}^{2}=\frac{n}{n+p}var\left( \mathbf {X}_{j}\right) , \end{aligned}$$

and, in that case, it is clear that

$$\begin{aligned} corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) =corr\left( \mathbf {y}, \mathbf {X}_{j}\right) ,\qquad j=1,\dots ,p, \end{aligned}$$

and, as a consequence, \(R_{0}^{2}(0)=R_{0}^{2}\).
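
As a numerical illustration (a sketch, not part of the original paper), the following Python snippet builds the ridge-augmented data, evaluates \(R_{0}^{2}(k)\) with the \(1/(n+p)\) moments of Eqs. (11)–(13), and checks that \(R_{0}^{2}(0)=R_{0}^{2}\) once the data are centred. The simulated data and the helper name R0_squared are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch of Appendix 1 (assumed setup, not from the paper):
# with centred data, R_0^2(0) coincides with R_0^2 on the original data.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
X -= X.mean(axis=0)   # standardize: zero column means
y -= y.mean()

def R0_squared(y, X, k):
    """R_0^2(k) = sum_j corr(y^R, X_j^R)^2 on the ridge-augmented data."""
    p = X.shape[1]
    XR = np.vstack([X, np.sqrt(k) * np.eye(p)])   # append sqrt(k) * I_p
    yR = np.concatenate([y, np.zeros(p)])         # append p zeros
    total = 0.0
    for xj in XR.T:
        cov = (yR * xj).mean() - yR.mean() * xj.mean()  # 1/(n+p) moments
        total += cov**2 / (np.var(yR) * np.var(xj))     # as in Eqs. (11)-(13)
    return total

# R_0^2 from the original data; equality holds because the data are centred.
R0 = sum(np.corrcoef(y, xj)[0, 1] ** 2 for xj in X.T)
print(R0_squared(y, X, 0.0), R0)   # the two values coincide
```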

1.2 Appendix 2: \(R^{2}(k)\) is a continuous function at \(k=0\)

\(R^{2}(k)\) will be a continuous function at \(k=0\) if the coefficient of determination of model (7) for \(k=0\) is equal to the one obtained from model (1), that is, if the expressions:

$$\begin{aligned} R^{2}(k)=\frac{\widehat{\varvec{\beta }}(k)^{t}\left( \mathbf {X} ^{R}\right) ^{t}\mathbf {y}^{R}-(n+p)\left( \overline{\mathbf {y}}^{R}\right) ^{2}}{\left( \mathbf {y}^{R}\right) ^{t}\mathbf {y}^{R}-(n+p)\left( \overline{ \mathbf {y}}^{R}\right) ^{2}},\quad R^{2}=\frac{\widehat{\varvec{\beta }}^{t} \mathbf {X}^{t}\mathbf {y}-n\overline{\mathbf {y}}^{2}}{\mathbf {y}^{t}\mathbf {y} -n\overline{\mathbf {y}}^{2}}, \end{aligned}$$

coincide for \(k=0\).

From the original data we obtain that:

$$\begin{aligned} \left( \mathbf {y}^{R}\right) ^{t}\mathbf {y}^{R}=\mathbf {y}^{t}\mathbf {y} ,\quad \left( \mathbf {X}^{R}\right) ^{t}\mathbf {y}^{R}=\mathbf {X}^{t}\mathbf { y},\quad \overline{\mathbf {y}}^{R}=\frac{n}{n+p}\overline{\mathbf {y}}, \end{aligned}$$

and since \(\widehat{\varvec{\beta }}(0)=\widehat{\varvec{\beta }}\) it is verified that:

$$\begin{aligned} R^{2}(0)=\frac{\widehat{\varvec{\beta }}^{t}\mathbf {X}^{t}\mathbf {y}-\frac{ n^{2}}{n+p}\overline{\mathbf {y}}^{2}}{\mathbf {y}^{t}\mathbf {y}-\frac{n^{2}}{ n+p}\overline{\mathbf {y}}^{2}}. \end{aligned}$$

It is then clear that \(R^{2}(0)\not =R^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).

However, if the data are standardized (\(\overline{\mathbf {y}}=0= \overline{\mathbf {X}}_{j}\)), we obtain that:

$$\begin{aligned} R^{2}(0)=\frac{\widehat{\varvec{\beta }}^{t}\mathbf {X}^{t}\mathbf {y}}{\mathbf {y}^{t}\mathbf {y}}=R^{2}. \end{aligned}$$
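
A short numerical check of this appendix (again an illustrative sketch, not the paper's code): \(\widehat{\varvec{\beta }}(k)\) is computed as the ordinary ridge estimator \((\mathbf {X}^{t}\mathbf {X}+k\mathbf {I})^{-1}\mathbf {X}^{t}\mathbf {y}\), which is the OLS estimator on the augmented data, and with centred data \(R^{2}(0)\) reproduces the OLS \(R^{2}\). The simulated data and the helper name R2_k are assumptions.

```python
import numpy as np

# Illustrative sketch of Appendix 2 (assumed setup, not from the paper):
# R^2(k) from model (7) on the ridge-augmented data coincides at k = 0
# with the OLS R^2 of model (1) once the data are centred.
rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
y = X @ rng.normal(size=p) + rng.normal(size=n); y -= y.mean()

def R2_k(y, X, k):
    """R^2(k) with beta(k) = (X'X + k I)^(-1) X'y and the augmented data."""
    n, p = X.shape
    beta_k = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    yR = np.concatenate([y, np.zeros(p)])
    XR = np.vstack([X, np.sqrt(k) * np.eye(p)])
    ybarR = yR.mean()                      # equals n*ybar/(n+p), here 0
    num = beta_k @ XR.T @ yR - (n + p) * ybarR**2
    den = yR @ yR - (n + p) * ybarR**2
    return num / den

beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
R2 = (beta @ X.T @ y) / (y @ y)            # R^2 for centred data
print(R2_k(y, X, 0.0), R2)                 # the two values coincide
```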

1.3 Appendix 3: \(R_{0}^{2}(k)\) is a decreasing function in k

Since continuity is verified only with standardized data, we work directly with standardized data. By considering Eqs. (11)–(13) for \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\) we obtain that:

$$\begin{aligned} cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p} \sum \limits _{i=1}^{n}y_{i}X_{ij},\\ var\left( \mathbf {y}^{R}\right)= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}y_{i}^{2},\\ var\left( \mathbf {X}_{j}^{R}\right)= & {} \frac{1}{n+p}\left( \sum \limits _{i=1}^{n}X_{ij}^{2}+k\right) -\frac{k}{(n+p)^{2}}\\= & {} \frac{1}{n+p}\sum \limits _{i=1}^{n}X_{ij}^{2}+\frac{n+p-1}{(n+p)^{2}}k. \end{aligned}$$

Since it is verified that

$$\begin{aligned} corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) ^{2}=\frac{cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) ^{2}}{var\left( \mathbf {y} ^{R}\right) var\left( \mathbf {X}_{j}^{R}\right) }, \end{aligned}$$

it is evident that

$$\begin{aligned} \frac{\partial }{\partial k}corr\left( \mathbf {y}^{R},\mathbf {X} _{j}^{R}\right) ^{2}= & {} \frac{cov\left( \mathbf {y}^{R},\mathbf {X} _{j}^{R}\right) ^{2}}{var\left( \mathbf {y}^{R}\right) }\left[ \frac{\partial }{\partial k}\frac{1}{var\left( \mathbf {X}_{j}^{R}\right) }\right] \\= & {} \frac{cov\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) ^{2}}{var\left( \mathbf {y}^{R}\right) }\left[ -\frac{\frac{n+p-1}{(n+p)^{2}}}{var\left( \mathbf {X}_{j}^{R}\right) ^{2}}\right] , \end{aligned}$$

which clearly has a negative sign. Hence we conclude that \(R_{0}^{2}(k)\) is a decreasing function in k.
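
As a closing illustration (a sketch under the same assumptions as the previous snippets, not from the paper), evaluating \(R_{0}^{2}(k)\) on a grid of k values confirms the monotone decrease for centred data:

```python
import numpy as np

# Illustrative sketch of Appendix 3 (assumed setup, not from the paper):
# with centred data, R_0^2(k) evaluated on a grid of k values is
# non-increasing, in line with the negative derivative obtained above.
rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p)); X -= X.mean(axis=0)
y = X.sum(axis=1) + rng.normal(size=n); y -= y.mean()

def R0_squared(y, X, k):
    """R_0^2(k) on the ridge-augmented data, 1/(n+p) moments as in Eqs. (11)-(13)."""
    p = X.shape[1]
    XR = np.vstack([X, np.sqrt(k) * np.eye(p)])
    yR = np.concatenate([y, np.zeros(p)])
    return sum(((yR * xj).mean() - yR.mean() * xj.mean()) ** 2
               / (np.var(yR) * np.var(xj)) for xj in XR.T)

ks = np.linspace(0.0, 5.0, 51)
vals = [R0_squared(y, X, k) for k in ks]
assert all(a >= b for a, b in zip(vals, vals[1:]))   # monotone decrease
print(f"R_0^2(0) = {vals[0]:.4f}, R_0^2(5) = {vals[-1]:.4f}")
```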

About this article

Cite this article

Salmerón, R., García, J., García, C.B. et al. A note about the corrected VIF. Stat Papers 58, 929–945 (2017). https://doi.org/10.1007/s00362-015-0732-9
