Improved Feature Selection Algorithm Based on SVM and Correlation
As a feature selection method, support vector machine recursive feature elimination (SVM-RFE) can remove irrelevant features but does not take redundant features into account. This paper shows why the method cannot remove redundant features and presents an improved technique. The correlation coefficient is introduced to measure redundancy within the subset selected by SVM-RFE, and features that have a high correlation coefficient with an important feature are removed. Experimental results show that the subsets selected by SVM-RFE do contain several strongly redundant features, with correlation coefficients as high as 0.99. The proposed method reduces the number of features while maintaining classification accuracy.
Keywords: Support Vector Machine; Feature Selection; Feature Subset; Feature Selection Method; Irrelevant Feature
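The redundancy-removal step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the features have already been ranked from most to least important (for example, by SVM-RFE), and it drops any feature whose absolute Pearson correlation with a higher-ranked kept feature reaches a cutoff. The names `pearson` and `drop_redundant`, the `threshold` value of 0.9, and the toy data are all hypothetical choices made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, ranked, threshold=0.9):
    """Keep features in rank order, skipping any that is strongly
    correlated with a feature that has already been kept.

    columns   -- dict mapping feature index to its list of values
    ranked    -- feature indices, most important first (assumed to
                 come from a ranking method such as SVM-RFE)
    threshold -- hypothetical cutoff on |correlation|
    """
    kept = []
    for idx in ranked:
        if all(abs(pearson(columns[idx], columns[k])) < threshold for k in kept):
            kept.append(idx)
    return kept


# Toy data: feature 1 is almost a scaled copy of feature 0.
columns = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0],
    1: [2.1, 4.0, 6.2, 8.1, 9.9],
    2: [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranked = [0, 1, 2]  # assumed importance order

selected = drop_redundant(columns, ranked, threshold=0.9)
print(selected)  # [0, 2]
```

In this toy example, feature 1 is discarded because its correlation with feature 0 exceeds 0.99, while feature 2 is kept because it is only weakly correlated with feature 0. Processing features in rank order means that, of two redundant features, the more important one survives.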