Feature subset selection for logistic regression via mixed integer optimization
Abstract
This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and the Bayesian information criterion, are employed as goodness-of-fit measures. The purpose of our work is to establish a computational framework for selecting a feature subset with an optimality guarantee. To this end, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, by making a piecewise linear approximation of the logistic loss function, we pose the problem as a mixed integer linear optimization problem, which can be solved with standard mixed integer optimization software. The computational results demonstrate that when the number of candidate features was fewer than 40, our method provided a feature subset sufficiently close to an optimal one in a reasonable amount of time. Furthermore, even with more candidate features, our method often found a better subset of features than the stepwise methods did in terms of information criteria.
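The key device in the abstract is a piecewise linear approximation of the (convex) logistic loss, which turns the nonlinear fitting problem into one that mixed integer *linear* optimization software can handle. The paper's exact construction is not reproduced in this excerpt; the sketch below only illustrates the underlying idea under one common assumption: approximating the loss from below by the pointwise maximum of tangent lines taken at a few breakpoints. All function names here are illustrative, not from the paper.

```python
import math

def logistic_loss(v):
    # Logistic loss log(1 + exp(-v)), computed in a numerically stable way:
    # equals log1p(exp(-|v|)) + max(-v, 0) for any real v.
    return math.log1p(math.exp(-abs(v))) + max(-v, 0.0)

def tangent_lines(breakpoints):
    # Tangent line of the convex logistic loss at v0 has
    # slope f'(v0) = -1 / (1 + exp(v0)) and passes through (v0, f(v0)).
    lines = []
    for v0 in breakpoints:
        slope = -1.0 / (1.0 + math.exp(v0))
        intercept = logistic_loss(v0) - slope * v0
        lines.append((slope, intercept))
    return lines

def approx_loss(v, lines):
    # Pointwise maximum of the tangent lines: a piecewise linear
    # LOWER bound on the logistic loss (tangents of a convex function
    # never exceed it), exact at each breakpoint.
    return max(a * v + b for a, b in lines)

# Tangents at a handful of breakpoints already track the loss closely.
lines = tangent_lines([-4.0, -2.0, 0.0, 2.0, 4.0])
for v in (-3.0, 0.5, 3.0):
    assert approx_loss(v, lines) <= logistic_loss(v) + 1e-9
```

In a mixed integer linear formulation, the constraint "loss variable ≥ each tangent line" encodes exactly this maximum, so minimizing the loss variable minimizes the piecewise linear surrogate; more breakpoints tighten the approximation at the cost of more constraints.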
Keywords
Logistic regression · Feature subset selection · Mixed integer optimization · Information criterion · Piecewise linear approximation