Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9597)


Feature selection is a pre-processing step in classification, which selects a small set of important features to improve classification performance and efficiency. Mutual information is popular in feature selection because it can detect non-linear relationships between features. However, existing mutual information approaches consider only two-way interactions between features. In addition, in most methods, mutual information is calculated by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function using nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and with two traditional feature selection approaches. The experimental results show that the mutual information estimation successfully guides PSO to search for a small number of features while maintaining or improving classification performance over using all features and over the traditional feature selection methods. In addition, PSOMIE provides strong consistency between training and test results, which may be used to avoid overfitting.
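The overall scheme the abstract describes, a binary PSO searching over feature subsets with a filter fitness function, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_fitness` is a hypothetical stand-in for the NNE-based mutual information score, and all parameter values (`w`, `c1`, `c2`, swarm size) are assumptions.

```python
# Minimal sketch of binary PSO for filter feature selection.
# The fitness function is a hypothetical placeholder for the paper's
# NNE mutual information estimate; names/parameters are assumptions.
import math
import random

def binary_pso_feature_selection(fitness, n_features, n_particles=10,
                                 n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (best_mask, best_fit) found by a standard binary PSO."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

    # Each particle is a bit mask over features; velocities are real-valued.
    pos = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                # Velocity update pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sample the bit with probability sigmoid(velocity).
                pos[i][d] = rng.random() < sigmoid(vel[i][d])
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness: rewards two "relevant" features and slightly penalizes
# the rest; in PSOMIE this role would be played by the NNE-based MI score.
RELEVANT = {0, 2}
def toy_fitness(mask):
    selected = {d for d, bit in enumerate(mask) if bit}
    return len(selected & RELEVANT) - 0.2 * len(selected - RELEVANT)

mask, fit = binary_pso_feature_selection(toy_fitness, n_features=4)
```

Because the fitness is a filter measure, it is evaluated without training a classifier, which is what makes this a filter rather than a wrapper approach.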


Keywords: Feature selection · Mutual information estimation · Particle swarm optimization



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
