Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor

Tamura, Ryuta; Kobayashi, Ken; Takano, Yuichi; Miyashiro, Ryuhei; Nakata, Kazuhide; Matsui, Tomomi

doi:10.1007/s10898-018-0713-3

Published: 22 October 2018

Journal of Global Optimization, Volume 73, pages 431–446 (2019)

Abstract

Multicollinearity exists when some explanatory variables of a multiple linear regression model are highly correlated. High correlation among explanatory variables reduces the reliability of the analysis. To eliminate multicollinearity from a linear regression model, we consider how to select a subset of significant variables by means of the variance inflation factor (VIF), which is the most common indicator used in detecting multicollinearity. In particular, we adopt the mixed integer optimization (MIO) approach to subset selection. The MIO approach was proposed in the 1970s, and recently it has received renewed attention due to advances in algorithms and hardware. However, none of the existing studies have developed a computationally tractable MIO formulation for eliminating multicollinearity on the basis of VIF. In this paper, we propose mixed integer quadratic optimization (MIQO) formulations for selecting the best subset of explanatory variables subject to the upper bounds on the VIFs of selected variables. Our two MIQO formulations are based on the two equivalent definitions of VIF. Computational results illustrate the effectiveness of our MIQO formulations by comparison with conventional local search algorithms and MIO-based cutting plane algorithms.
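
To make the VIF constraint concrete, the following is a minimal sketch and not the paper's MIQO formulation. It assumes that the two equivalent definitions mentioned in the abstract are the standard ones: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing the j-th explanatory variable on the others, and VIF_j as the j-th diagonal entry of the inverse of the correlation matrix. The function names, the residual-sum-of-squares objective, the bound `alpha`, and the exhaustive search are illustrative assumptions. Enumeration is practical only for a handful of variables, which is the gap an MIO approach is meant to close.

```python
import itertools
import numpy as np


def vif_from_r2(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centering plays the role of an intercept
    vifs = np.empty(p)
    for j in range(p):
        target = Xc[:, j]
        others = np.delete(Xc, j, axis=1)
        if others.shape[1] == 0:
            vifs[j] = 1.0  # a single variable cannot be collinear with anything
            continue
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / (target @ target)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


def vif_from_corr(X):
    """VIF_j = j-th diagonal entry of the inverse correlation matrix."""
    R = np.atleast_2d(np.corrcoef(X, rowvar=False))
    return np.diag(np.linalg.inv(R))


def best_subset_with_vif_bound(X, y, alpha):
    """Exhaustive search (illustration only): minimize the residual sum of
    squares over subsets whose VIFs are all at most alpha."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    best_rss, best_subset = np.inf, ()
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            try:
                vifs = vif_from_corr(X[:, cols])
            except np.linalg.LinAlgError:
                continue  # exactly collinear subset: treat as infeasible
            if vifs.max() > alpha:
                continue
            coef, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
            rss = float(np.sum((yc - Xc[:, cols] @ coef) ** 2))
            if rss < best_rss:
                best_rss, best_subset = rss, subset
    return best_rss, best_subset


if __name__ == "__main__":
    # Synthetic data (hypothetical): column 2 is nearly the sum of columns 0 and 1.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 3))
    near_dup = x[:, 0] + x[:, 1] + 0.05 * rng.standard_normal(200)
    X_demo = np.column_stack([x[:, 0], x[:, 1], near_dup, x[:, 2]])
    y_demo = x[:, 0] + x[:, 1] + 0.5 * x[:, 2] + rng.standard_normal(200)
    print("VIFs of the full model:", vif_from_corr(X_demo))
    print("Selected subset:", best_subset_with_vif_bound(X_demo, y_demo, alpha=10.0)[1])
```

Both VIF functions should agree up to floating-point error. A threshold such as `alpha = 10` is a common rule of thumb for flagging multicollinearity and is used here only as an example value.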

Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Nos. JP17K01246 and JP17K12983.

Author information

Ryuta Tamura
Present address: October Sky Co., Ltd., Zelkova Bldg., 1-25-12 Fuchucho, Fuchu-shi, Tokyo, 183-0055, Japan
Yuichi Takano
Present address: Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki, 305-8577, Japan

Authors and Affiliations

Graduate School of Engineering, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo, 184-8588, Japan
Ryuta Tamura
Artificial Intelligence Laboratory, Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa, 211-8588, Japan
Ken Kobayashi
School of Network and Information, Senshu University, 2-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa, 214-8580, Japan
Yuichi Takano
Institute of Engineering, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo, 184-8588, Japan
Ryuhei Miyashiro
School of Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552, Japan
Kazuhide Nakata & Tomomi Matsui

Corresponding author

Correspondence to Ryuhei Miyashiro.

About this article

Cite this article

Tamura, R., Kobayashi, K., Takano, Y. et al. Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. J Glob Optim 73, 431–446 (2019). https://doi.org/10.1007/s10898-018-0713-3

Received: 07 October 2017
Accepted: 11 October 2018
Published: 22 October 2018
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10898-018-0713-3
