Feature Selection Method Based on Differential Correlation Information Entropy

Neural Processing Letters

Abstract

Feature selection is a key component of pattern classification systems. Ding and Peng proposed a minimum redundancy feature selection method that reduces redundancy among sequentially selected features in microarray gene expression data. However, because that method relies mainly on mutual information to measure the dependency between random variables, it does not evaluate the selected features globally, so its results cannot be optimal. Therefore, building on the minimum redundancy-maximum correlation framework, this paper introduces entropy to measure feature selection globally and proposes a new feature subset evaluation function, differential correlation information entropy. Different bivariate correlation metrics can be used within this function, and feature selection is then carried out by sequential forward search. Using two classification models on eleven standard data sets from the UCI Machine Learning Repository, the proposed method is compared with existing algorithms such as mRMR, ReliefF, and the feature selection method with joint maximal information entropy. The experimental results show that the proposed method clearly outperforms the compared methods.
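The general pattern the abstract describes (a subset score built from a bivariate correlation metric, optimized by greedy forward search) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the differential correlation information entropy function, so the sketch substitutes a plain mRMR-style score (relevance minus mean redundancy) and uses absolute Pearson correlation as the bivariate metric. The names `abs_pearson` and `forward_select` and the synthetic data are hypothetical.

```python
import numpy as np


def abs_pearson(a, b):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float(abs((a * b).sum()) / denom)


def forward_select(X, y, k, metric=abs_pearson):
    """Greedy forward search with an mRMR-style score.

    Assumption: score = relevance(feature, y) - mean redundancy with the
    features already selected. This stands in for the paper's evaluation
    function, which is not given in the abstract.
    """
    n_features = X.shape[1]
    relevance = [metric(X[:, j], y) for j in range(n_features)]
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in remaining:
            if selected:
                redundancy = np.mean([metric(X[:, j], X[:, s]) for s in selected])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


# Synthetic data (hypothetical): f1 nearly duplicates f0, f2 is informative
# and independent of f0, f3 is pure noise.
rng = np.random.default_rng(0)
n = 1000
f0 = rng.normal(size=n)
f1 = f0 + 0.01 * rng.normal(size=n)
f2 = rng.normal(size=n)
f3 = rng.normal(size=n)
X = np.column_stack([f0, f1, f2, f3])
y = 2.0 * f0 + f2

selected = forward_select(X, y, k=2)
print(selected)
```

On data like this, the redundancy term should steer the second pick away from the near-duplicate column and toward the independent informative one. The `metric` argument is the point where a different bivariate correlation measure could be substituted.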


References

  1. Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene expression data. In: Proceedings of the 2003 IEEE bioinformatics conference, CSB 2003, pp 523–528. https://doi.org/10.1109/CSB.2003.1227396

  2. Soltani M, Shammakhi MH, Khorram S, Sheikhzadeh H (2016) Combined mRMR filter and sparse Bayesian classifier for analysis of gene expression data. In: Proceedings of the 2016 2nd international conference of signal processing and intelligent systems, ICSPIS 2016. https://doi.org/10.1109/ICSPIS.2016.7869891

  3. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226. https://doi.org/10.1109/TPAMI.2005.159

  4. Gu X, Guo J, Xiao L, Ming T, Li C (2019) A feature selection algorithm based on equal interval division and minimal-redundancy-maximal-relevance. Neural Process Lett. https://doi.org/10.1007/s11063-019-10144-3

  5. Zheng K, Wang X (2018) Feature selection method with joint maximal information entropy between features and class. Pattern Recogn 77:20–29. https://doi.org/10.1016/j.patcog.2017.12.008

  6. Zheng K, Wang X, Wu B, Wu T (2020) Feature subset selection combining maximal information entropy and maximal information coefficient. Appl Intell 50(2):487–501. https://doi.org/10.1007/s10489-019-01537-x

  7. Breiman L (2001) Statistical modeling: The two cultures. Stat Sci 16(3):199–215. https://doi.org/10.1214/ss/1009213726

  8. Chen G, Chen J (2015) A novel wrapper method for feature selection and its applications. Neurocomputing 159:219–226. https://doi.org/10.1016/j.neucom.2015.01.070

  9. Maldonado S, Weber R (2009) A wrapper method for feature selection using support vector machines. Inf Sci 179(13):2208–2217. https://doi.org/10.1016/j.ins.2009.02.014

  10. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324. https://doi.org/10.1016/s0004-3702(97)00043-x

  11. Guyon I, Elisseeff A (2006) An introduction to feature extraction. In: Studies in fuzziness and soft computing, vol 207, pp 1–25

  12. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245

  13. Caropreso MF, Matwin S, Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization (IGI Global), pp 78–102

  14. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and Naive Bayes. In: international conference on machine learning, pp 258–267

  15. Battiti R (1994) Using mutual information for selecting features in supervised neural-net learning. IEEE Trans Neural Netw 5(4):537–550. https://doi.org/10.1109/72.298224

  16. Yang HH, Moody J (2000) Data visualization and feature selection: new algorithms for non-Gaussian data. In: Advances in neural information processing systems, pp 687–693

  17. Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Res 5:1531–1555

  18. Jakulin A (2005) Machine learning based on attribute interactions, Ph.D. thesis

  19. Meyer PE, Bontempi G (2006) On the use of variable complementarity for feature selection in cancer classification. In: Lecture notes in computer science, vol 3907. Springer, Berlin, pp 91–102

  20. Cadenas JM, Garrido MC, Martinez R (2013) Feature subset selection Filter–Wrapper based on low quality data. Expert Syst Appl 40(16):6241–6252. https://doi.org/10.1016/j.eswa.2013.05.051

  21. Liu Y, Zheng YF (2006) \(\mathrm{FS}_{\mathrm{SFS}}\): a novel feature selection method for support vector machines. Pattern Recogn 39(7):1333–1345. https://doi.org/10.1016/j.patcog.2005.10.006

  22. Chyzhyk D, Savio A, Grana M (2014) Evolutionary ELM wrapper feature selection for Alzheimer’s disease CAD on anatomical brain MRI. Neurocomputing 128:73–80. https://doi.org/10.1016/j.neucom.2013.01.065

  23. Erguzel T, Tas C, Cebi M (2015) A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2015.06.021

  24. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC (2011) Detecting novel associations in large data sets. Science 334(6062):1518–1524. https://doi.org/10.1126/science.1205438

  25. Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  26. Roffo G, Melzi S, Castellani U, Vinciarelli A (2017) Infinite latent feature selection: a probabilistic latent graph-based ranking approach. In: IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2017.156

  27. Roffo G, Melzi S, Cristani M (2015) Infinite feature selection. In: IEEE international conference on computer vision (ICCV), pp 4202–4210. https://doi.org/10.1109/ICCV.2015.478

  28. Roffo G, Melzi S (2017) Ranking to learn: feature ranking and selection via eigenvector centrality. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 10312 LNCS, pp 19–35. https://doi.org/10.1007/978-3-319-61461-8_2

  29. Kang S, Ko Y, Seo J (2013) Hierarchical speech-act classification for discourse analysis. Pattern Recognit Lett 34(10):1119–1124. https://doi.org/10.1016/j.patrec.2013.03.008

  30. Roffo G (2017) Computer Vision and Pattern Recognition. arXiv

  31. Kira K, Rendell LA (1992) Feature selection problem: traditional methods and a new algorithm. In: Proceedings of the tenth national conference on artificial intelligence, pp 129–134

  32. Liu C, Wang W, Zhao Q, Shen X, Konan M (2017) A new feature selection method based on a validity index of feature subset. Pattern Recogn Lett 92:1–8. https://doi.org/10.1016/j.patrec.2017.03.018

Acknowledgements

This work was financially supported by the National Key R&D Program of China (Grant No. 2017YFB0802803), the Beijing Natural Science Foundation (4202002), and the National College Students Innovation and Entrepreneurship Training Program at BJUT (GJDC-2020-01-09).

Author information

Corresponding author

Correspondence to Yixuan Yan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Yan, Y. & Ma, X. Feature Selection Method Based on Differential Correlation Information Entropy. Neural Process Lett 52, 1339–1358 (2020). https://doi.org/10.1007/s11063-020-10307-7

  • DOI: https://doi.org/10.1007/s11063-020-10307-7
