FeatureBand: A Feature Selection Method by Combining Early Stopping and Genetic Local Search
Feature selection is an important problem in machine learning and data mining. In practice, wrapper methods are widely used: they treat feature selection as a search problem, using a predictor as a black box to evaluate candidate feature subsets. However, most wrapper methods are time-consuming due to the large search space. In this paper, we propose a novel wrapper method, called FeatureBand, for feature selection. We use an early stopping strategy to terminate poor candidate feature subsets and avoid wasted training time. Furthermore, we use a genetic local search to generate new subsets based on previous ones. These two techniques are combined in an iterative framework that gradually allocates more resources to more promising candidate feature subsets. Experimental results show that FeatureBand achieves a better trade-off between search time and search accuracy: it is 1.45× to 17.6× faster than state-of-the-art wrapper-based methods without accuracy loss.
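The abstract describes an iterative framework that early-stops weak feature subsets and breeds new candidates from surviving ones. The following is a minimal, hypothetical sketch of that idea (not the authors' implementation): a successive-halving-style loop in which each round doubles the training budget, keeps the top half of subsets, and refills the population via uniform crossover and bit-flip mutation. The `evaluate` function, the `GOOD` feature set, and all parameter values are illustrative stand-ins.

```python
import random

N_FEATURES = 20
GOOD = set(range(5))  # hypothetical ground-truth useful features

def evaluate(mask, budget, rng):
    """Stand-in for training a black-box predictor on the selected features.
    A larger budget yields a less noisy estimate of the subset's quality."""
    signal = len({i for i, bit in enumerate(mask) if bit} & GOOD)
    noise = rng.gauss(0, 1.0 / budget)
    return signal + noise

def crossover_mutate(a, b, rng, p_mut=0.05):
    """Genetic local search: uniform crossover plus bit-flip mutation."""
    child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
    return [1 - g if rng.random() < p_mut else g for g in child]

def feature_band(n_candidates=32, rounds=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_candidates)]
    budget = 1
    for _ in range(rounds):
        # Rank candidates under the current (cheap) budget.
        scored = sorted(pop, key=lambda m: evaluate(m, budget, rng),
                        reverse=True)
        # Early stopping: discard the weaker half before spending more budget.
        survivors = scored[: max(2, len(scored) // 2)]
        # Refill the population with offspring of the survivors.
        children = [crossover_mutate(*rng.sample(survivors, 2), rng)
                    for _ in range(len(scored) - len(survivors))]
        pop = survivors + children
        budget *= 2  # promising subsets earn more training resources
    best = max(pop, key=lambda m: evaluate(m, budget, rng))
    return [i for i, bit in enumerate(best) if bit]

selected = feature_band()
```

The key trade-off this sketch illustrates is that cheap, noisy evaluations are used only to prune, while expensive, accurate evaluations are reserved for the few subsets that survive pruning.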
Keywords: Feature selection · Early stopping · Genetic local search
This work is supported by NSFC (No. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.