A novel feature selection method based on normalized mutual information
In this paper, a novel feature selection method based on a normalization of the well-known mutual information measure is presented. Our method builds on an existing approach, max-relevance and min-redundancy (mRMR). We propose to normalize the mutual information used in the criterion so that neither the relevance term nor the redundancy term can dominate the selection. We employ several commonly used recognition models, including Support Vector Machine (SVM), k-Nearest-Neighbor (kNN), and Linear Discriminant Analysis (LDA), to compare our algorithm with the original mRMR and a recently improved variant, the Normalized Mutual Information Feature Selection (NMIFS) algorithm. To avoid dataset-specific conclusions, we conduct our classification experiments on a variety of datasets from the UCI machine learning repository. The results confirm that our feature selection method is more robust than the others with regard to classification accuracy.
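To make the selection criterion concrete, the following sketch shows a greedy mRMR-style ranking in which every mutual information term is normalized before use. This is an illustrative implementation under assumed choices, not the paper's exact formulation: here the normalizer is the geometric mean of the two marginal entropies, and features are assumed to be discrete.

```python
# Sketch of greedy max-relevance, min-redundancy selection using
# normalized mutual information (NMI). Hypothetical formulation:
# NMI(X, Y) = I(X; Y) / sqrt(H(X) * H(Y)), discrete variables only.
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (nats) of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """Mutual information (nats) estimated from joint sample counts."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def normalized_mi(xs, ys):
    """MI scaled into [0, 1] so relevance and redundancy are comparable."""
    denom = math.sqrt(entropy(xs) * entropy(ys))
    return mutual_info(xs, ys) / denom if denom > 0 else 0.0

def select_features(features, labels, k):
    """Greedily rank up to k features: maximize NMI with the labels
    (relevance) minus the mean NMI with already-selected features
    (redundancy)."""
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = normalized_mi(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(normalized_mi(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because both terms of the score lie in [0, 1], a feature with a large raw entropy cannot inflate the redundancy penalty and crowd out genuinely relevant features, which is the imbalance the normalization is meant to remove.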
Keywords: Feature selection · Mutual information · Minimal redundancy · Maximal relevance