Feature subset selection combining maximal information entropy and maximal information coefficient

  • Kangfeng Zheng
  • Xiujuan Wang
  • Bin Wu
  • Tong Wu


Feature subset selection is an effective step for reducing the dimensionality of data and has remained an active research field for decades. To develop a feature subset selection algorithm that is both highly accurate and fast in its search, this paper proposes a filter method combining maximal information entropy (MIE) and the maximal information coefficient (MIC). First, a new metric, mMIE-mMIC, is defined to minimize the MIE among features while maximizing the MIC between the features and the class label; the mMIE-mMIC algorithm uses this metric to evaluate whether a candidate subset is suitable for classification. Second, two search strategies are adopted to find a good solution in the candidate-subset space: the binary particle swarm optimization algorithm (BPSO) and sequential forward selection (SFS). Finally, classification is performed on UCI datasets to validate the performance of our work against 9 existing methods. Experimental results show that in most cases the proposed method performs as well as or better than the other 9 methods in terms of classification accuracy and F1-score.
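The filter criterion described above can be illustrated with a greedy SFS loop. The sketch below is only a simplified stand-in, not the paper's exact metric: plain discrete mutual information replaces both the MIC relevance term and the MIE redundancy term (real MIC/MIE estimation would use the MINE algorithm), and `mutual_info`, the scoring rule, and the toy features are illustrative assumptions.

```python
# Sketch of a filter-style sequential forward selection (SFS) in the spirit
# of the mMIE-mMIC criterion: greedily add the feature that maximizes
# relevance to the label minus redundancy with already-selected features.
# Discrete mutual information is used as a stand-in for MIC/MIE here.
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete sample."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete variables."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def sfs_select(features, labels, k):
    """Greedily pick k feature indices (features are given as rows)."""
    selected, remaining = [], list(range(len(features)))
    while len(selected) < k and remaining:
        def score(j):
            rel = mutual_info(features[j], labels)  # relevance (MIC stand-in)
            red = (sum(mutual_info(features[j], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0  # redundancy (MIE stand-in)
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: f0 is informative, f1 duplicates f0 (redundant), f2 is noise.
labels = [0, 0, 0, 1, 1, 1, 1, 0]
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 0, 1, 1, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(sfs_select([f0, f1, f2], labels, 2))  # → [0, 2]
```

Note how the redundancy penalty makes the duplicate feature f1 lose to the independent f2 in the second step, even though f1 is individually more relevant.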


Keywords: Feature subset selection · BPSO · MIC · SFS



Xiujuan Wang is the corresponding author. This work was supported by the National Key R&D Program of China [No. 2017YFB0802703] and the National Natural Science Foundation of China [No. 61602052].
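Of the two search strategies named in the abstract, BPSO follows the discrete particle swarm of Kennedy and Eberhart, in which a sigmoid of each velocity component gives the probability that the corresponding bit is 1. The skeleton below is a generic sketch under assumed parameters (`w`, `c1`, `c2`, swarm size are illustrative defaults), and its toy fitness, which rewards matching a hypothetical ideal mask, merely stands in for the paper's mMIE-mMIC score of the subset encoded by the bit vector.

```python
# Minimal binary PSO (BPSO) skeleton for searching the candidate-subset
# space. Each particle is a bit vector (1 = feature selected); velocities
# are real-valued, and sigmoid(velocity) is the probability of a 1-bit.
import random
from math import exp

def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))

def bpso(fitness, n_bits, n_particles=20, n_iters=50,
         w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest_pos, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Sigmoid of velocity gives the probability the bit is 1
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest_pos, gbest_fit = pos[i][:], f
    return gbest_pos, gbest_fit

# Toy fitness: number of bits agreeing with a hypothetical ideal mask.
ideal = [1, 0, 1, 1, 0, 0, 1, 0]
mask, fit = bpso(lambda bits: sum(b == t for b, t in zip(bits, ideal)), n_bits=8)
print(mask, fit)
```

In the paper's setting, `fitness` would evaluate the mMIE-mMIC score of the feature subset encoded by the bit vector, so the swarm converges toward subsets with high label relevance and low inter-feature redundancy.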



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
  2. Faculty of Information Technology, Beijing University of Technology, Beijing, China
