Knowledge and Information Systems, Volume 34, Issue 3, pp 483–519

A review of feature selection methods on synthetic data

  • Verónica Bolón-Canedo (Email author)
  • Noelia Sánchez-Maroño
  • Amparo Alonso-Betanzos
Regular Paper

Abstract

With the advent of high dimensionality, adequate identification of the relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and it is necessary to check their effectiveness in different situations. The assessment of relevant features is difficult in real datasets, however, so an interesting option is to use artificial data, where the relevant features are known in advance. In this paper, several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets and tested with four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.
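
The evaluation idea in the abstract, scoring a selector against a ground truth that only synthetic data provides, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the labelling rule, the function names (make_synthetic, correlation_filter, recovery_rate), and the simple correlation ranking, which is not claimed to be one of the seven filters studied in the paper.

```python
import numpy as np

def make_synthetic(n_samples=500, n_relevant=3, n_irrelevant=20, seed=0):
    """Binary dataset whose label depends only on the first n_relevant columns."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_relevant + n_irrelevant))
    # Hypothetical labelling rule: sign of the sum of the relevant features.
    y = (X[:, :n_relevant].sum(axis=1) > 0).astype(int)
    return X, y, set(range(n_relevant))

def correlation_filter(X, y):
    """Rank features by absolute Pearson correlation with the label, best first."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-scores)

def recovery_rate(ranking, relevant):
    """Fraction of truly relevant features found in the top-|relevant| positions."""
    top = set(ranking[:len(relevant)].tolist())
    return len(top & relevant) / len(relevant)

X, y, relevant = make_synthetic()
ranking = correlation_filter(X, y)
rate = recovery_rate(ranking, relevant)
print(f"recovered {rate:.0%} of the relevant features")
```

Raising n_irrelevant or lowering n_samples in this sketch mimics two of the stress conditions listed in the abstract: a growing number of irrelevant features and a small ratio between samples and features.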

Keywords

Feature selection, Filters, Embedded methods, Wrappers, Synthetic datasets

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Verónica Bolón-Canedo (1), Email author
  • Noelia Sánchez-Maroño (1)
  • Amparo Alonso-Betanzos (1)
  1. Department of Computer Science, University of A Coruña, A Coruña, Spain
