Abstract
This paper discusses some limitations of applying the CVIF of Curto and Pinto (J Appl Stat 38(7):1499–1507, 2011) and proposes modifications to overcome them. The concept of the modified CVIF is also extended for application in ridge estimation.
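To fix ideas, the quantities involved can be computed directly. The sketch below is illustrative only (the function name is ours) and assumes the CVIF definition of Curto and Pinto (2011), \(CVIF_{j}=VIF_{j}\,(1-R^{2})/(1-R_{0}^{2})\), where \(R_{0}^{2}=\sum_{j} corr(\mathbf{y},\mathbf{X}_{j})^{2}\) is the sum of squared simple correlations:

```python
import numpy as np

def vif_cvif(X, y):
    """Classical VIFs and the Curto-Pinto CVIF for each column of X.

    Assumes CVIF_j = VIF_j * (1 - R^2) / (1 - R0^2), where R^2 is the
    coefficient of determination of y on all regressors and
    R0^2 = sum_j corr(y, X_j)^2, which may exceed 1 -- the source of the
    limitations discussed in the paper."""
    n, p = X.shape
    # R^2 of the full model (with intercept)
    Z = np.column_stack([np.ones(n), X])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    # R0^2: sum of squared simple correlations corr(y, X_j)^2
    r0sq = sum(np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(p))
    vifs = np.empty(p)
    for j in range(p):
        # Regress X_j on the remaining regressors; VIF_j = 1 / (1 - R_j^2)
        Zj = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        rj = X[:, j] - Zj @ np.linalg.lstsq(Zj, X[:, j], rcond=None)[0]
        vifs[j] = X[:, j].var() / rj.var()
    return vifs, vifs * (1 - r2) / (1 - r0sq), r0sq
```

Note that whenever \(R_{0}^{2}>1\) the correction factor turns negative, so the CVIF loses its interpretation as an inflation measure.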
Notes
Due to this property, when \(R_{0}^{2}(0)\) is greater than 1, a value \(k_{h}\) may exist such that \(R_{0}^{2}(k)<1\) for every \(k>k_{h}\).
For the rest of the values it is less than 1; that is, \(k_{h}=0.5\) in this case. From this value onwards, the desirable properties (monotonicity and values greater than one) are recovered. In addition, it is possible to establish the prevalence \(MCVIF _{R}(j,k)\ge VIF _{R}(j,k)\) for the considered values of k and j.
References
Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics: identifying influential data and sources of collinearity. John Wiley & Sons, New York
Chang X, Yang H (2012) Combining two-parameter and principal component regression estimators. Stat Pap 53(3):549–562
Cuadras C (1993) Interpreting an inequality in multiple regression. Am Stat 47:256–258
Curto JD, Pinto JC (2007) New multicollinearity indicators in linear regression models. Int Stat Rev 75(1):114–121
Curto JD, Pinto JC (2011) The corrected VIF (CVIF). J Appl Stat 38(7):1499–1507
Farrar DE, Glauber RR (1967) Multicollinearity in regression analysis: the problem revisited. Rev Econ Stat 49:92–107
Lin FJ (2008) Solving multicollinearity in the process of fitting regression model using the nested estimate procedure. Qual Quant 42:417–426
Flury B (1989) Understanding partial statistics and redundancy of variables in regression and discriminant analysis. Am Stat 43:27–31
Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183
García C, García J, López MDM, Salmerón R (2015a) Collinearity: revisiting the variance inflation factor in ridge regression. J Appl Stat 42(3):648–661
García CB, García J, Soto J (2010) The raise method: an alternative procedure to estimate the parameters in presence of collinearity. Qual Quant 45(2):403–423
García J, Salmerón R, García C, López MDM (2015b) Standardization of variables and diagnostic of collinearity in the ridge regression. Int Stat Rev. doi:10.1111/insr.12099
Gunst RF, Mason RL (1977) Advantages of examining multicollinearities in regression analysis. Biometrics 33:249–260
Hadi AS (2011) Ridge and surrogate ridge regressions. Springer, Heidelberg
Hamilton D (1987) Sometimes \(R^2>r^2_{yx_1}+r^2_{yx_2}\): correlated variables are not always redundant. Am Stat 41(2):129–132
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Jensen DR, Ramirez DE (2008) Anomalies in the foundations of ridge regression. Int Stat Rev 76(1):89–105
Kovacs P, Petres T, Toth L (2005) A new measure of multicollinearity in linear regression models. Int Stat Rev 73(3):405–412
Kurnaz F, Akay K (2015) A new Liu-type estimator. Stat Pap 56(2):495–517. doi:10.1007/s00362-014-0594-6
Lazaridis A (2007) A note regarding the condition number: the case of spurious and latent multicollinearity. Qual Quant 41:123–135
Liu K (2003) Using Liu-type estimator to combat collinearity. Commun Stat Theory Methods 32(5):1009–1020
Liu XQ, Gao F, Yu ZF (2013) Improved ridge estimators in a linear regression model. J Appl Stat 40(1):209–220
Marquardt DW (1970) Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 12(3):591–612
Marquardt DW, Snee RD (1975) Ridge regression in practice. Am Stat 29(1):3–20
McDonald GC (2009) Ridge regression. Wiley Interdiscip Rev 1:93–100
McDonald GC (2010) Tracing ridge regression coefficients. Wiley Interdiscip Rev 2:695–703
O’Brien RM (2007) A caution regarding rules of thumb for variance inflation factors. Qual Quant 41:673–690
Sakallıoğlu S, Kaçıranlar S (2008) A new biased estimator based on ridge estimation. Stat Pap 49(4):669–689
Silvey SD (1969) Multicollinearity and imprecise estimation. J R Stat Soc Ser B (Methodol) 31:539–552
Spanos A, McGuirk A (2002) The problem of near-multicollinearity revisited: erratic versus systematic volatility. J Econom 108:365–393
Stein C (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc Third Berkeley Symp Math Stat Probab 1:197–206
Theil H (1971) Principles of econometrics. Wiley, New York
Willan AR, Watts DG (1978) Meaningful multicollinearity measures. Technometrics 20(4):407–412
Wu J, Yang H (2014) More on the unbiased ridge regression estimation. Stat Pap: 1–12. doi:10.1007/s00362-014-0637-z
Appendices
1.1 Appendix 1: \(R^{2}_{0}(k)\) is a continuous function at \(k=0\)
\(R_{0}^{2}(k)=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}^{R},\mathbf {X} _{j}^{R}\right) ^{2}\) will be a continuous function at \(k=0\) if \(R_{0}^{2}(0)=R_{0}^{2}=\sum \nolimits _{j=1}^{p}corr\left( \mathbf {y}, \mathbf {X}_{j}\right) ^{2}\), where \(\mathbf {X}_{j}^{R}\) is the jth column of \(\mathbf {X}^{R}\) and \(corr\left( \mathbf {y}^{R},\mathbf {X}_{j}^{R}\right) \) is the coefficient of correlation between \(\mathbf {y}^{R}\) and \(\mathbf {X}_{j}^{R}\), that is:
Taking into account that:
we obtain that:
By considering \(k=0\), the previous expressions become:
and it is then evident that \(R_{0}^{2}(0)\not =R_{0}^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).
However, if the data are standardized, that is, \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\), we have that:
and, in that case, it is clear that
and, as a consequence, \(R_{0}^{2}(0)=R_{0}^{2}\).
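This argument can be checked numerically. In the sketch below (function name is ours), the augmented vectors at \(k=0\) are built by appending p zero rows to \(\mathbf{X}\) and p zeros to \(\mathbf{y}\), following the usual augmented-data formulation of ridge regression; the sum of squared Pearson correlations coincides with \(R_{0}^{2}\) only after standardization:

```python
import numpy as np

def r0_squared_k0(X, y):
    """R0^2(0): append p zero rows to X (i.e. sqrt(0)*I_p) and p zeros
    to y, then sum the squared Pearson correlations, as in Appendix 1."""
    n, p = X.shape
    Xa = np.vstack([X, np.zeros((p, p))])
    ya = np.concatenate([y, np.zeros(p)])
    return sum(np.corrcoef(ya, Xa[:, j])[0, 1] ** 2 for j in range(p))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) + 5.0          # deliberately non-centred data
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=50)

# Without standardization the appended zeros shift the means: R0^2(0) != R0^2.
r0 = sum(np.corrcoef(y, X[:, j])[0, 1] ** 2 for j in range(3))
assert abs(r0_squared_k0(X, y) - r0) > 1e-6

# After standardization the means are zero and the two quantities coincide.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
r0s = sum(np.corrcoef(ys, Xs[:, j])[0, 1] ** 2 for j in range(3))
assert abs(r0_squared_k0(Xs, ys) - r0s) < 1e-10
```

The equality in the standardized case is exact: with zero means, the appended zeros leave all cross-products and sums of squares unchanged at \(k=0\).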
1.2 Appendix 2: \(R^{2}(k)\) is a continuous function at \(k=0\)
\(R^{2}(k)\) will be a continuous function at \(k=0\) if the coefficient of determination of model (7) at \(k=0\) equals the one obtained from model (1), that is, if the expressions:
coincide for \(k=0\).
From the original data we obtain that:
and since \(\widehat{\varvec{\beta }}(0)=\widehat{\varvec{\beta }}\) it is verified that:
It is then clear that \(R^{2}(0)\not =R^{2}\) except when \(p=0\) (which is not possible, since it would mean that there are no independent variables in the model).
However, if the data are standardized (\(\overline{\mathbf {y}}=0= \overline{\mathbf {X}}_{j}\)), we obtain that:
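Analogously, the equality \(R^{2}(0)=R^{2}\) under standardization admits a short numerical check. The sketch assumes the augmented-data definition of \(R^{2}(k)\), computing \(\widehat{\varvec{\beta }}(k)\) as the least-squares fit on the augmented data and taking \(1-SSE/SST\) of that fit; the function name is ours:

```python
import numpy as np

def r2_ridge(X, y, k):
    """R^2(k) from the augmented ridge model: beta(k) solves the OLS
    problem on (X; sqrt(k) I_p), (y; 0_p), which coincides with the
    ridge estimator (X'X + kI)^{-1} X'y."""
    n, p = X.shape
    Xa = np.vstack([X, np.sqrt(k) * np.eye(p)])
    ya = np.concatenate([y, np.zeros(p)])
    beta = np.linalg.lstsq(Xa, ya, rcond=None)[0]
    resid = ya - Xa @ beta
    centred = ya - ya.mean()
    return 1 - (resid @ resid) / (centred @ centred)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=60)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# OLS R^2 on the standardized data (no intercept needed after centring).
b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
r2_ols = 1 - ((ys - Xs @ b) @ (ys - Xs @ b)) / (ys @ ys)
assert abs(r2_ridge(Xs, ys, 0.0) - r2_ols) < 1e-10
```

At \(k=0\) the appended rows of the augmented design are zero, so they contribute nothing to either SSE or SST once the means are zero, and the two coefficients of determination coincide.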
1.3 Appendix 3: \(R_{0}^{2}(k)\) is a decreasing function in k
Since continuity holds for standardized data, we work directly with standardized data. By considering Eqs. (11) and (13) with \(\overline{\mathbf {y}}=0=\overline{\mathbf {X}}_{j}\), we obtain that:
Since it is verified that
it is evident that
which clearly has a negative sign. Hence we conclude that \(R_{0}^{2}(k)\) is a decreasing function in k.
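For standardized data the correlation reduces to a cosine, \(corr(\mathbf {y}^{R},\mathbf {X}_{j}^{R})=\mathbf {X}_{j}^{\prime }\mathbf {y}/\sqrt{(\mathbf {X}_{j}^{\prime }\mathbf {X}_{j}+k)\,\mathbf {y}^{\prime }\mathbf {y}}\), since the appended rows contribute zero to the cross-product and k to \(\mathbf {X}_{j}^{\prime }\mathbf {X}_{j}\). The sketch below (function name is ours) checks the monotone decrease on a grid and locates the threshold \(k_{h}\) mentioned in the Notes:

```python
import numpy as np

def r0_squared(Xs, ys, k):
    """R0^2(k) for standardized data: each squared correlation shrinks
    by the factor S_j / (S_j + k), with S_j = sum(X_j^2), because the
    appended rows add k to S_j and nothing to the cross-product."""
    return sum(
        (Xs[:, j] @ ys) ** 2 / ((Xs[:, j] @ Xs[:, j] + k) * (ys @ ys))
        for j in range(Xs.shape[1])
    )

rng = np.random.default_rng(2)
z = rng.normal(size=100)
X = np.column_stack([z + 0.05 * rng.normal(size=100),
                     z + 0.05 * rng.normal(size=100)])
y = z + 0.3 * rng.normal(size=100)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

ks = np.linspace(0.0, 200.0, 201)
vals = [r0_squared(Xs, ys, k) for k in ks]
assert vals[0] > 1                                   # R0^2(0) exceeds 1 here
assert all(a >= b for a, b in zip(vals, vals[1:]))   # decreasing in k
k_h = next(k for k, v in zip(ks, vals) if v < 1)     # first k with R0^2(k) < 1
```

With two nearly collinear regressors both highly correlated with y, the sum of squared simple correlations exceeds 1 at \(k=0\) and decays towards zero as k grows, crossing 1 at \(k_{h}\).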
Salmerón, R., García, J., García, C.B. et al. A note about the corrected VIF. Stat Papers 58, 929–945 (2017). https://doi.org/10.1007/s00362-015-0732-9