Abstract
To remove irrelevant and redundant features from high-dimensional data while preserving classification accuracy, this paper proposes a supervised feature subset evaluation method based on multi-objective optimization. Four aspects are taken into account: sparsity of the feature space, classification accuracy, degree of information loss, and feature subset stability. An objective function is constructed for each, and the popular NSGA-II algorithm is then used to optimize the four objectives during the feature selection process. Finally, the feature subset is selected from the resulting feature weight vector according to the four evaluation criteria. The proposed method was tested on four standard data sets using two kinds of classifier. The experimental results show that it achieves higher classification accuracy than the compared methods even when only a small number of features is selected. Moreover, its degree of information loss is the lowest, demonstrating that the feature subsets it selects represent the original data sets best.
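The core of the approach, scoring each candidate feature subset on several objectives to be minimized and keeping the Pareto-optimal (non-dominated) candidates, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the four objective functions below are hypothetical stand-ins, and only the non-dominated sorting step of NSGA-II is shown.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Return the non-dominated candidates: the first front NSGA-II extracts."""
    return [p for p in population
            if not any(dominates(q[1], p[1]) for q in population if q is not p)]

def evaluate(mask, n_features):
    """Toy stand-ins for the four objectives of a binary feature mask."""
    k = sum(mask)
    sparsity = k / n_features             # fewer selected features is better
    error = 0.5 / max(k, 1)               # proxy: error shrinks as features grow
    info_loss = 1.0 - k / n_features      # proxy: dropping features loses info
    instability = random.random() * 0.1   # placeholder for subset stability
    return (sparsity, error, info_loss, instability)

random.seed(0)
n = 8
# Each candidate is a (mask, objectives) pair over n features.
population = [(mask, evaluate(mask, n))
              for mask in ([random.randint(0, 1) for _ in range(n)]
                           for _ in range(30))]
front = pareto_front(population)
print(len(front), "non-dominated feature subsets")
```

A full NSGA-II run would additionally rank the remaining fronts, apply crowding-distance selection, and evolve the masks with crossover and mutation; the final subset is then chosen from the front according to the four evaluation criteria.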
Acknowledgments
This work is supported by the National Natural Science Foundation of China (U1304602, 61473266, and 61305080).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Li, M., Shang, Z., Yue, C. (2017). A Feature Subset Evaluation Method Based on Multi-objective Optimization. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science(), vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_47
Print ISBN: 978-3-319-68758-2
Online ISBN: 978-3-319-68759-9