A Guide to Using the R Package “multiColl” for Detecting Multicollinearity

Computational Economics

Abstract

The detection of problematic collinearity in a linear regression model is treated in all the existing statistical software packages. However, such detection is not always done adequately. The main shortcomings relate to the treatment of qualitative independent variables and to the complete neglect of the role of the intercept in the model (and, consequently, of nonessential collinearity). This paper presents the R package multiColl, which implements the measures usually applied to detect near collinearity while overcoming the weaknesses observed in other existing packages.

Notes

  1. This value differs from the threshold equal to 0.7 provided by Halkos and Tsilika (2018) to indicate a problem of near collinearity.

  2. García et al. (2018) show that values of the determinant of the correlation matrix lower than \(0.1013 + 0.00008626 \cdot n - 0.01384 \cdot k\) indicate the presence of problematic near essential multicollinearity. Once again, this value differs from the threshold provided by Field (2019), who claims that there is severe multicollinearity when the determinant of the correlation matrix is less than 0.00001. In the example presented by Halkos and Tsilika (2018), the conclusion is that collinearity is not detected, since the determinant of the correlation matrix (0.00663839) is higher than that threshold (0.00001). However, under the criterion of García et al. (2018), the threshold is 0.01964016 and, consequently, severe near collinearity is detected.
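The comparison in note 2 can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python and is not part of multiColl. The function name `garcia_threshold` is hypothetical. The values n = 16 and k = 6 are assumptions: the note does not state them, and they are used here only because they reproduce the quoted threshold of 0.01964016.

```python
def garcia_threshold(n: int, k: int) -> float:
    """Determinant threshold quoted in note 2 from Garcia et al. (2018).

    n and k are the symbols used in the note; their exact definitions
    are given in the cited paper, not in this sketch.
    """
    return 0.1013 + 0.00008626 * n - 0.01384 * k


# Determinant of the correlation matrix reported for the
# Halkos and Tsilika (2018) example in note 2.
det_r = 0.00663839

# ASSUMPTION: n = 16 and k = 6 are not stated in the note; they are
# chosen because they reproduce the quoted threshold 0.01964016.
threshold = garcia_threshold(n=16, k=6)

# Fixed threshold attributed to Field (2019) in note 2.
field_threshold = 0.00001

# A determinant below the threshold is read as problematic collinearity.
severe_garcia = det_r < threshold
severe_field = det_r < field_threshold

print(f"Threshold from the formula: {threshold:.8f}")
print(f"Flagged by the formula-based criterion: {severe_garcia}")
print(f"Flagged by the fixed 0.00001 criterion: {severe_field}")
```

Under these assumed values the two criteria disagree, which is the point the note makes: the same determinant is flagged by the sample-dependent threshold but not by the fixed one.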

References

  • Farrar, D. E., & Glauber, R. R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, pp. 92–107.

  • Field, A. (2019). Discovering statistics using SPSS for Windows (3rd ed.). Los Angeles: Sage Publications.

  • García, C., Salmerón, R., García, C., & García, J. (2019). Residualization: justification, properties and application. Journal of Applied Statistics (in review)

  • García, C., Salmerón, R., & García, C. (2018). A choice of the ridge factor from the correlation matrix determinant. Journal of Statistical Computation and Simulation, 89(2), 211–231. https://doi.org/10.1080/00949655.2018.1543423.

  • Gunst, R. F., & Mason, R. L. (1977). Advantages of examining multicollinearities in regression analysis. Biometrics, pp. 249–260.

  • Halkos, G., & Tsilika, K. (2018). Programming correlation criteria with free CAS software. Computational Economics, 52(1), 299–311.

  • Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.

  • Longley, J. (1967). An appraisal of least-squares programs from the point of view of the user. Journal of the American Statistical Association, 62, 819–841.

  • Marquardt, D., & Snee, R. (1975). Ridge regression in practice. The American Statistician, 29(1), 3–20. https://doi.org/10.1080/00031305.1975.10479105.

  • Marquardt, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics, 12(3), 591–612.

  • Salmerón, R., García, C., & García, J. (2019). “multiColl”: An R package to detect multicollinearity. arXiv preprint arXiv:1910.14590.

  • Salmerón, R., García, C., García, J., & López, M. (2017). The raise estimators estimation, inference and properties. Communications in Statistics-Theory and Methods, 46(13), 6446–6462.

  • Salmerón, R., Rodríguez, A., & García, C. (2019). Diagnosis and quantification of the non-essential collinearity. Computational Statistics. https://doi.org/10.1007/s00180-019-00922-x.

  • Simon, D., & Lesage, J. (1988). The impact of collinearity involving the intercept term on the numerical accuracy of regression. Computer Science in Economics and Management, 1, 137–152.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

  • Willan, A. R., & Watts, D. G. (1978). Meaningful multicollinearity measures. Technometrics, 20(4), 407–412.

  • York, R. (2012). Residualization is not the answer: Rethinking how to address multicollinearity. Social Science Research, 41(6), 1379–1386.

Author information

Corresponding author

Correspondence to Catalina García-García.

About this article

Cite this article

Salmerón-Gómez, R., García-García, C. & García-Pérez, J. A Guide to Using the R Package “multiColl” for Detecting Multicollinearity. Comput Econ 57, 529–536 (2021). https://doi.org/10.1007/s10614-019-09967-y

  • DOI: https://doi.org/10.1007/s10614-019-09967-y
