Abstract
This paper discusses some limitations of applying the CVIF of Curto and Pinto (J Appl Stat 38(7):1499–1507, 2011) and proposes modifications to overcome them. The concept of the modified CVIF is also extended for application in ridge estimation.
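To fix ideas, the quantities involved can be computed directly. The sketch below is illustrative only (the function name is ours) and assumes the CVIF definition of Curto and Pinto (2011), \(CVIF_{j}=VIF_{j}\,(1-R^{2})/(1-R_{0}^{2})\), where \(R_{0}^{2}=\sum_{j} corr(\mathbf{y},\mathbf{X}_{j})^{2}\) is the sum of squared simple correlations:

```python
import numpy as np

def vif_cvif(X, y):
    """Classical VIFs and the Curto-Pinto CVIF for each column of X.

    Assumes CVIF_j = VIF_j * (1 - R^2) / (1 - R0^2), where R^2 is the
    coefficient of determination of y on all regressors and
    R0^2 = sum_j corr(y, X_j)^2, which may exceed 1 -- the source of the
    limitations discussed in the paper."""
    n, p = X.shape
    # R^2 of the full model (with intercept)
    Z = np.column_stack([np.ones(n), X])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    # R0^2: sum of squared simple correlations corr(y, X_j)^2
    r0sq = sum(np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(p))
    vifs = np.empty(p)
    for j in range(p):
        # Regress X_j on the remaining regressors; VIF_j = 1 / (1 - R_j^2)
        Zj = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        rj = X[:, j] - Zj @ np.linalg.lstsq(Zj, X[:, j], rcond=None)[0]
        vifs[j] = X[:, j].var() / rj.var()
    return vifs, vifs * (1 - r2) / (1 - r0sq), r0sq
```

Note that whenever \(R_{0}^{2}>1\) the correction factor turns negative, so the CVIF loses its interpretation as an inflation measure.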
Notes
Due to this property, when \(R_{0}^{2}(0)\) is greater than 1, a value \(k_{h}\) may exist such that \(R_{0}^{2}(k)<1\) for every \(k>k_{h}\).
For the rest of the values it is less than 1; that is, \(k_{h}=0.5\) in this case. From this value onwards, the desirable properties (monotonicity and values greater than one) are recovered. In addition, it is possible to establish the prevalence \(MCVIF _{R}(j,k)\ge VIF _{R}(j,k)\) for the considered values of k and j.
References
Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics: identifying influential data and sources of collinearity. John Wiley & Sons, New York
Chang X, Yang H (2012) Combining two-parameter and principal component regression estimators. Stat Pap 53(3):549–562
Cuadras C (1993) Interpreting an inequality in multiple regression. Am Stat 47:256–258
Curto JD, Pinto JC (2007) New multicollinearity indicators in linear regression models. Int Stat Rev 75(1):114–121
Curto JD, Pinto JC (2011) The corrected VIF (CVIF). J Appl Stat 38(7):1499–1507
Farrar DE, Glauber RR (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49:92–107
Lin FJ (2008) Solving multicollinearity in the process of fitting regression model using the nested estimate procedure. Qual Quant 42:417–426
Flury B (1989) Understanding partial statistics and redundancy of variables in regression and discriminant analysis. Am Stat 43:27–31
Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183
García C, García J, López MDM, Salmerón R (2015a) Collinearity: revisiting the variance inflation factor in ridge regression. J Appl Stat 42(3):648–661
García CB, García J, Soto J (2010) The raise method: an alternative procedure to estimate the parameters in presence of collinearity. Qual Quant 45(2):403–423
García J, Salmerón R, García C, López MDM (2015b) Standardization of variables and diagnostic of collinearity in the ridge regression. Int Stat Rev. doi:10.1111/insr.12099
Gunst RF, Mason RL (1977) Advantages of examining multicollinearities in regression analysis. Biometrics 33:249–260
Hadi AS (2011) Ridge and surrogate ridge regressions. Springer, Heidelberg
Hamilton D (1987) Sometimes \(R^2>r^2_{yx_1}+r^2_{yx_2}\): correlated variables are not always redundant. Am Stat 41(2):129–132
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Jensen DR, Ramirez DE (2008) Anomalies in the foundations of ridge regression. Int Stat Rev 76(1):89–105
Kovacs P, Petres T, Toth L (2005) A new measure of multicollinearity in linear regression models. Int Stat Rev 73(3):405–412
Kurnaz F, Akay K (2015) A new Liu-type estimator. Stat Pap 56(2):495–517. doi:10.1007/s00362-014-0594-6
Lazaridis A (2007) A note regarding the condition number: the case of spurious and latent multicollinearity. Qual Quant 41:123–135
Liu K (2003) Using Liu-type estimator to combat collinearity. Commun Stat Theory Methods 32(5):1009–1020
Liu XQ, Gao F, Yu ZF (2013) Improved ridge estimators in a linear regression model. J Appl Stat 40(1):209–220
Marquardt DW (1970) Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 12(3):591–612
Marquardt DW, Snee RD (1975) Ridge regression in practice. Am Stat 29(1):3–20
McDonald GC (2009) Ridge regression. Wiley Interdiscip Rev 1:93–100
McDonald GC (2010) Tracing ridge regression coefficients. Wiley Interdiscip Rev 2:695–703
O’Brien RM (2007) A caution regarding rules of thumb for variance inflation factors. Qual Quant 41:673–690
Sakallıoğlu S, Kaçıranlar S (2008) A new biased estimator based on ridge estimation. Stat Pap 49(4):669–689
Silvey SD (1969) Multicollinearity and imprecise estimation. J R Stat Soc Ser B (Methodol) 31:539–552
Spanos A, McGuirk A (2002) The problem of near-multicollinearity revisited: erratic versus systematic volatility. J Econom 108:365–393
Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc Third Berkeley Symp Math Stat Probab 1:197–206
Theil H (1971) Principles of econometrics. Wiley, New York
Willan AR, Watts DG (1978) Meaningful multicollinearity measures. Technometrics 20(4):407–412
Wu J, Yang H (2014) More on the unbiased ridge regression estimation. Stat Pap: 1–12. doi:10.1007/s00362-014-0637-z
Appendices
1.1 Appendix 1: \(R^{2}_{0}(k)\) is a continuous function at \(k=0\)
\(R_{0}^{2}(k)=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}^{R},\mathbf {X} _{j}^{R}\right) ^{2}\) will be a continuous function at \(k=0\) if \(R_{0}^{2}(0)=R_{0}^{2}=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}, \mathbf {X}_{j}\right) ^{2}\), where \(\mathbf {X}_{j}^{R}\) is the jth column of \(\mathbf {X}^{R}\) and \(corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) \) is the coefficient of correlation between \(\mathbf {y}^{R}\) and \(\mathbf {X}_{j}^{R}\), that is:
Taking into account that:
we obtain that:
By considering \(k=0\), the previous expressions become:
and it is then evident that \(R_{0}^{2}(0)\not =R_{0}^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).
However, if the data are standardized, that is, \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\), we have that:
and, in that case, it is clear that
and, as a consequence, \(R_{0}^{2}(0)=R_{0}^{2}\).
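This argument can be checked numerically. In the sketch below (function name is ours), the augmented vectors at \(k=0\) are built by appending p zero rows to \(\mathbf{X}\) and p zeros to \(\mathbf{y}\), following the usual augmented-data formulation of ridge regression; the sum of squared Pearson correlations coincides with \(R_{0}^{2}\) only after standardization:

```python
import numpy as np

def r0_squared_k0(X, y):
    """R0^2(0): append p zero rows to X (i.e. sqrt(0)*I_p) and p zeros
    to y, then sum the squared Pearson correlations, as in Appendix 1."""
    n, p = X.shape
    Xa = np.vstack([X, np.zeros((p, p))])
    ya = np.concatenate([y, np.zeros(p)])
    return sum(np.corrcoef(ya, Xa[:, j])[0, 1] ** 2 for j in range(p))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) + 5.0          # deliberately non-centred data
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=50)

# Without standardization the appended zeros shift the means: R0^2(0) != R0^2.
r0 = sum(np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(3))
assert abs(r0_squared_k0(X, y) - r0) > 1e-6

# After standardization the means are zero and the two quantities coincide.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
r0s = sum(np.corrcoef(ys, Xs[:, j])[0, 1] ** 2 for j in range(3))
assert abs(r0_squared_k0(Xs, ys) - r0s) < 1e-10
```

The equality in the standardized case is exact: with zero means, the appended zeros leave all cross-products and sums of squares unchanged at \(k=0\).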
1.2 Appendix 2: \(R^{2}(k)\) is a continuous function at \(k=0\)
\(R^{2}(k)\) will be a continuous function at \(k=0\) if the coefficient of determination of model (7) at \(k=0\) equals the one obtained from model (1), that is, if the expressions:
coincide for \(k=0\).
From the original data we obtain that:
and since \(\widehat{\varvec{\beta }}(0)=\widehat{\varvec{\beta }}\) it is verified that:
It is then clear that \(R^{2}(0)\not =R^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).
However, if the data are standardized (\(\overline{\mathbf {y}}=0= \overline{\mathbf {X}}_{j}\)), we obtain that:
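Analogously, the equality \(R^{2}(0)=R^{2}\) under standardization admits a short numerical check. The sketch assumes the augmented-data definition of \(R^{2}(k)\), computing \(\widehat{\varvec{\beta }}(k)\) as the least-squares fit on the augmented data and taking \(1-SSE/SST\) of that fit; the function name is ours:

```python
import numpy as np

def r2_ridge(X, y, k):
    """R^2(k) from the augmented ridge model: beta(k) solves the OLS
    problem on (X; sqrt(k) I_p), (y; 0_p), which coincides with the
    ridge estimator (X'X + kI)^{-1} X'y."""
    n, p = X.shape
    Xa = np.vstack([X, np.sqrt(k) * np.eye(p)])
    ya = np.concatenate([y, np.zeros(p)])
    beta = np.linalg.lstsq(Xa, ya, rcond=None)[0]
    resid = ya - Xa @ beta
    centred = ya - ya.mean()
    return 1 - (resid @ resid) / (centred @ centred)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=60)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# OLS R^2 on the standardized data (no intercept needed after centring).
b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
r2_ols = 1 - ((ys - Xs @ b) @ (ys - Xs @ b)) / (ys @ ys)
assert abs(r2_ridge(Xs, ys, 0.0) - r2_ols) < 1e-10
```

At \(k=0\) the appended rows of the augmented design are zero, so they contribute nothing to either SSE or SST once the means are zero, and the two coefficients of determination coincide.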
1.3 Appendix 3: \(R_{0}^{2}(k)\) is a decreasing function in k
Since continuity holds for standardized data, we work directly with standardized data. By considering Eqs. (11) and (13) with \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\), we obtain that:
Since it is verified that
it is evident that
which clearly has a negative sign. Hence we conclude that \(R_{0}^{2}(k)\) is a decreasing function in k.
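For standardized data the correlation reduces to a cosine, \(corr(\mathbf {y}^{R},\mathbf {X}_{j}^{R})=\mathbf {X}_{j}^{\prime }\mathbf {y}/\sqrt{(\mathbf {X}_{j}^{\prime }\mathbf {X}_{j}+k)\,\mathbf {y}^{\prime }\mathbf {y}}\), since the appended rows contribute zero to the cross-product and k to \(\mathbf {X}_{j}^{\prime }\mathbf {X}_{j}\). The sketch below (function name is ours) checks the monotone decrease on a grid and locates the threshold \(k_{h}\) mentioned in the Notes:

```python
import numpy as np

def r0_squared(Xs, ys, k):
    """R0^2(k) for standardized data: each squared correlation shrinks
    by the factor S_j / (S_j + k), with S_j = sum(X_j^2), because the
    appended rows add k to S_j and nothing to the cross-product."""
    return sum(
        (Xs[:, j] @ ys) ** 2 / ((Xs[:, j] @ Xs[:, j] + k) * (ys @ ys))
        for j in range(Xs.shape[1])
    )

rng = np.random.default_rng(2)
z = rng.normal(size=100)
X = np.column_stack([z + 0.05 * rng.normal(size=100),
                     z + 0.05 * rng.normal(size=100)])
y = z + 0.3 * rng.normal(size=100)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

ks = np.linspace(0.0, 200.0, 201)
vals = [r0_squared(Xs, ys, k) for k in ks]
assert vals[0] > 1                                   # R0^2(0) exceeds 1 here
assert all(a >= b for a, b in zip(vals, vals[1:]))   # decreasing in k
k_h = next(k for k, v in zip(ks, vals) if v < 1)     # first k with R0^2(k) < 1
```

With two nearly collinear regressors both highly correlated with y, the sum of squared simple correlations exceeds 1 at \(k=0\) and decays towards zero as k grows, crossing 1 at \(k_{h}\).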
Salmerón, R., García, J., García, C.B. et al. A note about the corrected VIF. Stat Papers 58, 929–945 (2017). https://doi.org/10.1007/s00362-015-0732-9