Abstract
Feature selection is a major component of pattern classification systems. In earlier work, Ding and Peng recognized the importance of feature selection and proposed a minimum redundancy feature selection method that sequentially selects features while minimizing redundancy in microarray gene expression data. However, because that method relies mainly on mutual information to measure pairwise dependency between random variables, its results cannot be optimal without evaluating the feature subset globally. Therefore, building on the minimum redundancy-maximum relevance framework, this paper introduces entropy to evaluate the feature subset as a whole and proposes a new subset evaluation criterion, differential correlation information entropy. Different bivariate correlation metrics can be plugged into the criterion, and feature selection is then carried out by sequential forward search. Two different classification models are applied to eleven standard data sets from the UCI machine learning repository to compare our method with competing algorithms such as mRMR, reliefF, and the feature selection method with joint maximal information entropy. The experimental results show that feature selection based on the proposed method clearly outperforms the other models.
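To make the subset-level evaluation concrete, the sketch below scores a candidate feature subset by the information entropy of the eigenvalue spectrum of its pairwise-correlation matrix and grows the subset greedily by sequential forward search. This is a minimal illustration under our own assumptions: the names correlation_information_entropy and forward_search, the choice of Pearson correlation as the bivariate metric, and the additive relevance-plus-entropy score are illustrative stand-ins, not the exact differential correlation information entropy criterion defined in the paper.

```python
# Hypothetical sketch of correlation-information-entropy scoring with
# sequential forward search; names and the scoring rule are illustrative.
import numpy as np

def correlation_information_entropy(X):
    """Normalized entropy of the eigenvalue spectrum of the absolute
    correlation matrix of X (shape: n_samples x k, k >= 2).

    Returns a value in [0, 1]; values near 1 mean the features are nearly
    uncorrelated (low redundancy), values near 0 mean high redundancy.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))  # k x k pairwise |correlation|
    eigvals = np.linalg.eigvalsh(R)           # real spectrum, sums to k
    k = R.shape[0]
    p = np.clip(eigvals, 1e-12, None) / k     # normalize spectrum to a distribution
    return float(-np.sum(p * np.log(p)) / np.log(k))

def forward_search(X, y, n_select):
    """Greedy sequential forward search.

    Each candidate is scored by its relevance to the class (|correlation|
    with y) plus the entropy of the enlarged subset's correlation matrix,
    a stand-in for the paper's differential criterion.
    """
    n_features = X.shape[1]
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    selected = []
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            if not selected:
                score = relevance[j]          # first pick: relevance only
            else:
                subset = X[:, selected + [j]]
                score = relevance[j] + correlation_information_entropy(subset)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=200)  # redundant copy of feature 0
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    print(forward_search(X, y, 3))                   # tends to avoid feature 5
```

Because the eigenvalues of a k-feature correlation matrix sum to k, dividing by k yields a distribution whose entropy approaches 1 for mutually uncorrelated features and 0 when one feature duplicates another, which is why the redundant copy above tends to be skipped; any bivariate metric, such as the maximal information coefficient, could replace Pearson correlation in this sketch.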
References
Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the 2003 IEEE bioinformatics conference (CSB 2003), pp 523–528. https://doi.org/10.1109/CSB.2003.1227396
Soltani M, Shammakhi MH, Khorram S, Sheikhzadeh H (2016) Combined mRMR filter and sparse Bayesian classifier for analysis of gene expression data. In: Proceedings of the 2nd international conference of signal processing and intelligent systems (ICSPIS 2016). https://doi.org/10.1109/ICSPIS.2016.7869891
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238. https://doi.org/10.1109/TPAMI.2005.159
Gu X, Guo J, Xiao L, Ming T, Li C (2019) A feature selection algorithm based on equal interval division and minimal-redundancy-maximal-relevance. Neural Process Lett. https://doi.org/10.1007/s11063-019-10144-3
Zheng K, Wang X (2018) Feature selection method with joint maximal information entropy between features and class. Pattern Recogn 77:20–29. https://doi.org/10.1016/j.patcog.2017.12.008
Zheng K, Wang X, Wu B, Wu T (2020) Feature subset selection combining maximal information entropy and maximal information coefficient. Appl Intell 50(2):487–501. https://doi.org/10.1007/s10489-019-01537-x
Breiman L (2001) Statistical modeling: The two cultures. Stat Sci 16(3):199–215. https://doi.org/10.1214/ss/1009213726
Chen G, Chen J (2015) A novel wrapper method for feature selection and its applications. Neurocomputing 159:219–226. https://doi.org/10.1016/j.neucom.2015.01.070
Maldonado S, Weber R (2009) A wrapper method for feature selection using support vector machines. Inf Sci 179(13):2208–2217. https://doi.org/10.1016/j.ins.2009.02.014
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324. https://doi.org/10.1016/s0004-3702(97)00043-x
Guyon I, Elisseeff A (2006) An introduction to feature extraction. In: Studies in fuzziness and soft computing, vol 207, pp 1–25
Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245–271
Caropreso MF, Matwin S, Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. IGI Global, pp 78–102
Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and Naive Bayes. In: Proceedings of the international conference on machine learning (ICML), pp 258–267
Battiti R (1994) Using mutual information for selecting features in supervised neural-net learning. IEEE Trans Neural Netw 5(4):537–550. https://doi.org/10.1109/72.298224
Yang HH, Moody J (2000) Data visualization and feature selection: new algorithms for non-Gaussian data. In: Advances in neural information processing systems, pp 687–693
Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Res 5:1531–1555
Jakulin A (2005) Machine learning based on attribute interactions. PhD thesis
Meyer PE, Bontempi G (2006) On the use of variable complementarity for feature selection in cancer classification. In: Lecture notes in computer science, vol 3907. Springer, Berlin, pp 91–102
Cadenas JM, Garrido MC, Martinez R (2013) Feature subset selection Filter–Wrapper based on low quality data. Expert Syst Appl 40(16):6241–6252. https://doi.org/10.1016/j.eswa.2013.05.051
Liu Y, Zheng YF (2006) FS_SFS: a novel feature selection method for support vector machines. Pattern Recogn 39(7):1333–1345. https://doi.org/10.1016/j.patcog.2005.10.006
Chyzhyk D, Savio A, Grana M (2014) Evolutionary ELM wrapper feature selection for Alzheimer’s disease CAD on anatomical brain MRI. Neurocomputing 128:73–80. https://doi.org/10.1016/j.neucom.2013.01.065
Erguzel T, Tas C, Cebi M (2015) A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2015.06.021
Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334(6062):1518–1524. https://doi.org/10.1126/science.1205438
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Roffo G, Melzi S, Castellani U, Vinciarelli A (2017) Infinite latent feature selection: a probabilistic latent graph-based ranking approach. In: Proceedings of the IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2017.156
Roffo G, Melzi S, Cristani M (2015) Infinite feature selection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4202–4210. https://doi.org/10.1109/ICCV.2015.478
Roffo G, Melzi S (2017) Ranking to learn: feature ranking and selection via eigenvector centrality. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 10312 LNCS, pp 19–35. https://doi.org/10.1007/978-3-319-61461-8_2
Kang S, Ko Y, Seo J (2013) Hierarchical speech-act classification for discourse analysis. Pattern Recognit Lett 34(10):1119–1124. https://doi.org/10.1016/j.patrec.2013.03.008
Roffo G (2017) Computer Vision and Pattern Recognition. arXiv
Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the tenth national conference on artificial intelligence, pp 129–134
Liu C, Wang W, Zhao Q, Shen X, Konan M (2017) A new feature selection method based on a validity index of feature subset. Pattern Recogn Lett 92:1–8. https://doi.org/10.1016/j.patrec.2017.03.018
Acknowledgements
This work was financially supported by the National Key R&D Program of China (Grant No. 2017YFB0802803), the Beijing Natural Science Foundation (4202002) and National College Students Innovation and Entrepreneurship Training Program in BJUT (GJDC-2020-01-09).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Wang, X., Yan, Y. & Ma, X. Feature Selection Method Based on Differential Correlation Information Entropy. Neural Process Lett 52, 1339–1358 (2020). https://doi.org/10.1007/s11063-020-10307-7