Annals of Operations Research, Volume 254, Issue 1–2, pp 89–109

Embedded variable selection method using signomial classification

  • Kyoungmi Hwang
  • Dohyun Kim
  • Kyungsik Lee
  • Chungmok Lee
  • Sungsoo Park
Original Paper


We propose two variable selection methods using signomial classification. We attempt to select, from the set of input variables, those that lead to the best classifier performance. The first method repeatedly removes variables by backward selection, whereas the second directly selects a set of variables by solving an optimization problem. Both methods account for nonlinear interactions among variables during selection and yield a signomial classifier built on the selected variables. Computational results show that, compared with existing methods, the proposed methods select desirable variables for predicting the output more effectively and provide classifiers with better or comparable test error rates.
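To make the backward selection idea concrete, the following is a minimal Python sketch of a generic greedy backward elimination loop. It is not the authors' procedure and does not implement a signomial classifier: a nearest-centroid rule on made-up data stands in for the classifier, training error stands in for the selection criterion, and the names `backward_select` and `centroid_error` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Subset = Tuple[int, ...]


def backward_select(n_vars: int,
                    error_of: Callable[[Subset], float]) -> Tuple[Subset, float]:
    """Greedy backward elimination over variable indices 0..n_vars-1.

    error_of(subset) is an assumed callback that returns an error rate for a
    classifier restricted to the variables in `subset`.
    """
    current: Subset = tuple(range(n_vars))
    best_subset, best_err = current, error_of(current)
    while len(current) > 1:
        # Try removing each remaining variable and keep the least harmful removal.
        candidates = [tuple(v for v in current if v != drop) for drop in current]
        errs = [error_of(c) for c in candidates]
        i = min(range(len(candidates)), key=errs.__getitem__)
        current = candidates[i]
        # On ties, prefer the smaller subset (hence <=).
        if errs[i] <= best_err:
            best_subset, best_err = current, errs[i]
    return best_subset, best_err


def centroid_error(X: Sequence[Sequence[float]], y: Sequence[int],
                   subset: Subset) -> float:
    """Training error of a nearest-centroid rule using only `subset` (stand-in classifier)."""
    def centroid(label: int):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in subset]

    c0, c1 = centroid(0), centroid(1)
    wrong = 0
    for x, t in zip(X, y):
        d0 = sum((x[j] - a) ** 2 for j, a in zip(subset, c0))
        d1 = sum((x[j] - a) ** 2 for j, a in zip(subset, c1))
        wrong += int((1 if d1 < d0 else 0) != t)
    return wrong / len(X)


# Made-up data: variable 0 separates the classes; variables 1 and 2 are noise.
X = [[0.0, 5.0, 0.1], [0.2, -4.0, 0.3], [0.1, 3.0, -0.2],
     [1.0, -5.0, 0.2], [0.9, 4.0, -0.1], [1.1, -3.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]

selected, err = backward_select(3, lambda s: centroid_error(X, y, s))
print(selected, err)  # (0,) 0.0
```

On this toy data the loop first discards the large-scale noise variable 1, then the small-scale noise variable 2, leaving only the informative variable 0. In practice the error callback would use held-out or cross-validated error rather than training error.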


Classification problems, Variable selection, Embedded method, Signomial classification



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2013-025297).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest. Funding information is stated in the above acknowledgements.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Test and Package Automation Group, Giheung Hwaseong Complex, Samsung Electronics, Asan-si, Republic of Korea
  2. Department of Industrial and Management Engineering, Myongji University, Yongin, Republic of Korea
  3. Department of Industrial Engineering, Institute for Industrial Systems Innovation, Seoul National University, Seoul, Republic of Korea
  4. School of Industrial and Management Engineering, Hankuk University of Foreign Studies, Yongin-si, Republic of Korea
  5. Department of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea
