FeatureBand: A Feature Selection Method by Combining Early Stopping and Genetic Local Search

  • Huanran Xue
  • Jiawei Jiang
  • Yingxia Shao
  • Bin Cui
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11642)


Feature selection is an important problem in machine learning and data mining. In practice, wrapper methods are widely used for feature selection: they treat feature selection as a search problem and use a predictor as a black box. However, most wrapper methods are time-consuming because of the large search space. In this paper, we propose a novel wrapper method for feature selection, called FeatureBand. We use an early stopping strategy to terminate bad candidate feature subsets and avoid wasting training time, and a genetic local search to generate new subsets based on previous ones. These two techniques are combined in an iterative framework that gradually allocates more resources to more promising candidate feature subsets. Experimental results show that FeatureBand achieves a better trade-off between search time and search accuracy: it is 1.45× to 17.6× faster than state-of-the-art wrapper-based methods without accuracy loss.
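The paper's full algorithm is not reproduced here, but the combination the abstract describes — evaluate many candidate feature subsets cheaply, stop the bad ones early, breed new candidates from the survivors via genetic operators, and double the training budget each round — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `evaluate` is a toy stand-in for actually training the black-box predictor, and the feature count, budget schedule, and genetic operators are assumptions.

```python
import random

N_FEATURES = 20
USEFUL = set(range(5))  # hypothetical "informative" features for the toy evaluator

def evaluate(subset, budget):
    # Toy stand-in for training the black-box predictor under a limited
    # budget (e.g. a few epochs or a data sample): the score is the fraction
    # of useful features selected, with noise that shrinks as budget grows.
    hit = len(subset & USEFUL) / len(USEFUL)
    return hit + random.gauss(0, 1.0 / budget)

def crossover(a, b):
    # Uniform crossover: keep features both parents share; flip a coin
    # where the parents disagree.
    child = {f for f in range(N_FEATURES)
             if (f in a and f in b)
             or ((f in a) != (f in b) and random.random() < 0.5)}
    return child or {random.randrange(N_FEATURES)}

def mutate(subset, rate=0.1):
    # Flip each feature's membership with probability `rate`.
    out = set(subset)
    for f in range(N_FEATURES):
        if random.random() < rate:
            out ^= {f}
    return out or {random.randrange(N_FEATURES)}

def feature_band(n_candidates=16, rounds=4):
    pop = [set(random.sample(range(N_FEATURES), N_FEATURES // 2))
           for _ in range(n_candidates)]
    budget = 1
    for _ in range(rounds):
        # Early stopping: evaluate cheaply, keep only the top half.
        pop.sort(key=lambda s: evaluate(s, budget), reverse=True)
        survivors = pop[: max(2, len(pop) // 2)]
        # Genetic local search: breed new candidates from the survivors.
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(len(survivors))]
        pop = survivors + children
        budget *= 2  # promising subsets get more training resources
    return max(pop, key=lambda s: evaluate(s, budget))

random.seed(0)
best = feature_band()
```

Because low-budget evaluations are noisy, keeping the top half rather than a single best subset hedges against discarding a good subset on an unlucky early evaluation, which is the intuition behind allocating resources iteratively.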


Feature selection · Early stopping · Genetic local search



This work is supported by NSFC (No. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Huanran Xue (1)
  • Jiawei Jiang (2, 3)
  • Yingxia Shao (4)
  • Bin Cui (1, 2)
  1. Center for Data Science, National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing, China
  2. School of EECS and Key Laboratory of High Confidence Software Technologies (MOE), Peking University, Beijing, China
  3. Tencent Inc., Shenzhen, China
  4. Beijing Key Lab of Intelligent Telecommunications Software and Multimedia, BUPT, Beijing, China
