Data Preprocessing in Data Mining, pp. 163–193

Part of the Intelligent Systems Reference Library book series (ISRL, volume 72)

Feature Selection

  • Salvador García
  • Julián Luengo
  • Francisco Herrera
Chapter

Abstract

In this chapter, one of the most commonly used techniques for dimensionality and data reduction is described. The feature selection problem is discussed and its main aspects and methods are analyzed. The chapter starts with the topic's theoretical background (Sect. 7.1), then presents the major perspectives (Sect. 7.2) and the main aspects, including applications and the evaluation of feature selection methods (Sect. 7.3). From this point on, the successive sections make a tour from the classical approaches to the most advanced proposals (Sect. 7.4), focusing on hybridizations, improved optimization models, and derivative methods related to feature selection. Section 7.5 provides a summary of related and advanced topics, such as feature construction and feature extraction. An enumeration of some comparative experimental studies conducted in the specialized literature is included in Sect. 7.6.
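To make the feature selection problem concrete, the sketch below implustrates the filter perspective the chapter discusses: each feature is ranked by a simple univariate relevance score and only the top-scoring ones are kept. The score (difference of class-conditional means over the pooled spread) and the toy dataset are illustrative assumptions, not methods taken from the chapter.

```python
# Minimal sketch of filter-based feature selection (illustrative, not the
# chapter's own algorithm): score each feature independently, keep the top-k.
from statistics import mean, pstdev

def feature_scores(X, y):
    """Score column j by |mean over class 1 - mean over class 0| / pooled std."""
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev([row[j] for row in X]) or 1.0  # guard against zero spread
        scores.append(abs(mean(col1) - mean(col0)) / spread)
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = feature_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Toy data: feature 0 separates the two classes, feature 1 is mostly noise.
X = [[0.1, 5.0], [0.2, 4.9], [0.9, 5.1], [1.0, 5.0]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, k=1))  # → [0]
```

Wrapper methods, by contrast, would score candidate subsets by training a classifier on each subset rather than scoring features in isolation.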


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Salvador García, Department of Computer Science, University of Jaén, Jaén, Spain
  • Julián Luengo, Department of Civil Engineering, University of Burgos, Burgos, Spain
  • Francisco Herrera, Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
