Computational Optimization and Applications, Volume 64, Issue 3, pp 865–880

Feature subset selection for logistic regression via mixed integer optimization

  • Toshiki Sato
  • Yuichi Takano
  • Ryuhei Miyashiro
  • Akiko Yoshise


This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and Bayesian information criterion, are employed as a goodness-of-fit measure. The purpose of our work is to establish a computational framework for selecting a subset of features with an optimality guarantee. For this purpose, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, we pose the problem as a mixed integer linear optimization problem, which can be solved with standard mixed integer optimization software, by making a piecewise linear approximation of the logistic loss function. The computational results demonstrate that when the number of candidate features was less than 40, our method successfully provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time. Furthermore, even if there were more candidate features, our method often found a better subset of features than the stepwise methods did in terms of information criteria.
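The key idea described above is that the logistic loss f(v) = log(1 + exp(-v)) is convex, so it can be bounded from below by a family of tangent lines, and the pointwise maximum of those tangents is a piecewise-linear underestimator that a mixed integer *linear* solver can handle. The sketch below illustrates only this approximation step (the paper's actual breakpoint selection and MILP formulation are not reproduced here); the breakpoints and function names are illustrative choices, not the authors' notation.

```python
import math

def logistic_loss(v):
    # Numerically stable evaluation of log(1 + exp(-v)).
    return math.log1p(math.exp(-abs(v))) + max(-v, 0.0)

def tangent_lines(breakpoints):
    # For a convex f, the tangent a*v + b at any point p satisfies
    # a*v + b <= f(v) for all v.  Here f'(v) = -1 / (1 + exp(v)).
    lines = []
    for p in breakpoints:
        slope = -1.0 / (1.0 + math.exp(p))
        intercept = logistic_loss(p) - slope * p
        lines.append((slope, intercept))
    return lines

def pwl_loss(v, lines):
    # Piecewise-linear underestimator: max over all tangent lines.
    # In a MILP this max is expressed with linear inequalities
    # t >= a*v + b for each tangent, minimizing the auxiliary t.
    return max(a * v + b for a, b in lines)
```

With only a handful of breakpoints the underestimator is already tight near the breakpoints, and refining the breakpoint set shrinks the gap everywhere; this is what makes the linearized problem a controllable surrogate for the exact logistic loss.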


Keywords: Logistic regression · Feature subset selection · Mixed integer optimization · Information criterion · Piecewise linear approximation



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Toshiki Sato, Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba-shi, Japan
  • Yuichi Takano, School of Network and Information, Senshu University, Kawasaki-shi, Japan
  • Ryuhei Miyashiro, Institute of Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
  • Akiko Yoshise, Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba-shi, Japan
