FeatureBand: A Feature Selection Method by Combining Early Stopping and Genetic Local Search
Feature selection is an important problem in machine learning and data mining. In practice, wrapper methods are widely used: they treat feature selection as a search problem, using a predictor as a black box to evaluate candidate feature subsets. However, most wrapper methods are time-consuming due to the large search space. In this paper, we propose a novel wrapper method, called FeatureBand, for feature selection. We use an early stopping strategy to terminate poor candidate feature subsets and avoid wasted training time. Furthermore, we use a genetic local search to generate new subsets based on previous ones. These two techniques are combined in an iterative framework that gradually allocates more resources to more promising candidate feature subsets. Experimental results show that FeatureBand achieves a better trade-off between search time and search accuracy: it is 1.45× to 17.6× faster than state-of-the-art wrapper-based methods without accuracy loss.
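The abstract describes an iterative framework that early-stops weak feature subsets and breeds new candidates from surviving ones. The following is a minimal, hypothetical sketch of that idea (not the authors' implementation): a successive-halving-style loop in which each round doubles the training budget, keeps the top half of subsets, and refills the population via uniform crossover and bit-flip mutation. The `evaluate` function, the `GOOD` feature set, and all parameter values are illustrative stand-ins.

```python
import random

N_FEATURES = 20
GOOD = set(range(5))  # hypothetical ground-truth useful features

def evaluate(mask, budget, rng):
    """Stand-in for training a black-box predictor on the selected features.
    A larger budget yields a less noisy estimate of the subset's quality."""
    signal = len({i for i, bit in enumerate(mask) if bit} & GOOD)
    noise = rng.gauss(0, 1.0 / budget)
    return signal + noise

def crossover_mutate(a, b, rng, p_mut=0.05):
    """Genetic local search: uniform crossover plus bit-flip mutation."""
    child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
    return [1 - g if rng.random() < p_mut else g for g in child]

def feature_band(n_candidates=32, rounds=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_candidates)]
    budget = 1
    for _ in range(rounds):
        # Rank candidates under the current (cheap) budget.
        scored = sorted(pop, key=lambda m: evaluate(m, budget, rng),
                        reverse=True)
        # Early stopping: discard the weaker half before spending more budget.
        survivors = scored[: max(2, len(scored) // 2)]
        # Refill the population with offspring of the survivors.
        children = [crossover_mutate(*rng.sample(survivors, 2), rng)
                    for _ in range(len(scored) - len(survivors))]
        pop = survivors + children
        budget *= 2  # promising subsets earn more training resources
    best = max(pop, key=lambda m: evaluate(m, budget, rng))
    return [i for i, bit in enumerate(best) if bit]

selected = feature_band()
```

The key trade-off this sketch illustrates is that cheap, noisy evaluations are used only to prune, while expensive, accurate evaluations are reserved for the few subsets that survive pruning.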
Keywords: Feature selection · Early stopping · Genetic local search
This work is supported by NSFC (No. 61832001, 61702015, 61702016, 61572039), the National Key Research and Development Program of China (No. 2018YFB1004403), and the PKU-Tencent Joint Research Lab.