A Modified Kolmogorov-Smirnov Correlation Based Filter Algorithm for Feature Selection
Feature selection is the technique of selecting a subset of relevant features from which a classification model can be constructed for a particular task. It is a preprocessing step in machine learning that is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving results. In this paper, a modified Kolmogorov-Smirnov Correlation-Based Filter algorithm for feature selection is proposed. It is based on the Kolmogorov-Smirnov statistic and uses class label information when comparing feature pairs. Results obtained with this algorithm are compared with those of two other algorithms capable of removing irrelevant and redundant features: the Correlation Feature Selection (CFS) algorithm and the simple Kolmogorov-Smirnov Correlation-Based Filter (KS-CBF). Classification accuracy is measured on the reduced feature set produced by the proposed approach, using two standard classifiers: the decision-tree classifier and the K-NN classifier.
Keywords: Feature Selection; Information Gain; Feature Subset; Filter Model; Redundant Feature
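The abstract does not spell out how class labels enter the pairwise comparison, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It computes the standard two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions) and then applies it separately within each class. The helper names `min_max_scale` and `redundant_given_class`, the min-max rescaling, and the fixed `threshold` are all hypothetical choices made for illustration; a real filter would more likely use a significance test than a fixed cut-off.

```python
from bisect import bisect_right
from collections import defaultdict


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed sample points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def min_max_scale(values):
    """Rescale a feature to [0, 1] so two features are comparable (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def redundant_given_class(f1, f2, labels, threshold=0.3):
    """Hypothetical check: treat f1 and f2 as redundant only if their
    distributions are close (KS statistic <= threshold) within every class."""
    f1, f2 = min_max_scale(f1), min_max_scale(f2)
    by_class = defaultdict(lambda: ([], []))
    for x1, x2, y in zip(f1, f2, labels):
        by_class[y][0].append(x1)
        by_class[y][1].append(x2)
    return all(ks_statistic(g1, g2) <= threshold for g1, g2 in by_class.values())
```

As a small illustration of why the class labels can matter: with `labels = [0, 0, 0, 1, 1, 1]`, the features `[1, 2, 3, 8, 9, 10]` and `[10, 9, 8, 3, 2, 1]` contain exactly the same values, so a comparison that ignores the labels gives a KS statistic of 0, yet within each class their values are completely separated and the class-wise check above does not flag them as redundant.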