
Comparative analysis on margin based feature selection algorithms

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Feature evaluation and selection is an important preprocessing step in classification and regression learning. As large quantities of irrelevant information are gathered, selecting the most informative features helps users understand the task and enhances the performance of the learned models. In recent years, the margin has been widely accepted as a measure of feature quality, and a collection of feature selection algorithms has been developed using margin-based loss functions and various search strategies. However, no comparative study of the effectiveness of these algorithms has been conducted. In this work, we compare 14 margin-based feature selection algorithms in terms of reduction capability, classification performance on the reduced data, and robustness, considering four margin-based loss functions and three search strategies. Moreover, we also compare these techniques with two well-known margin-based feature selection algorithms, ReliefF and Simba. The derived conclusions offer guidelines for selecting features in practical applications.
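For intuition about the margin criterion studied here, the sketch below gives a minimal NumPy rendering of the hypothesis margin used by Relief-family algorithms such as ReliefF and Simba [22, 65]: a sample's margin is its distance to the nearest sample of a different class (nearest miss) minus its distance to the nearest sample of the same class (nearest hit), accumulated per feature. This is an illustrative sketch only, not the implementation evaluated in the paper; the function name, the choice of L1 distance, and the synthetic data are assumptions made for demonstration.

```python
# Minimal sketch (not the paper's implementation) of Relief-style
# margin-based feature weighting. Assumptions: L1 distance, features
# on comparable scales, and the hypothetical name `margin_feature_weights`.
import numpy as np

def margin_feature_weights(X, y):
    """Weight each feature by its average contribution to the
    hypothesis margin: d(x, nearest miss) - d(x, nearest hit)."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i])      # per-feature |x_j - x_i|
        dist = diff.sum(axis=1)      # L1 distance to every sample
        dist[i] = np.inf             # never match a sample to itself
        same = y == y[i]
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest hit
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest miss
        # Features that separate the classes earn positive credit;
        # features that separate same-class samples are penalized.
        w += diff[miss] - diff[hit]
    return w / n

# Synthetic check: feature 0 tracks the label, feature 1 is pure noise,
# so feature 0 should receive the clearly larger weight.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([y + 0.1 * rng.standard_normal(40),
                     rng.standard_normal(40)])
print(margin_feature_weights(X, y))
```

Margin-based selectors of this family differ mainly in how such per-sample margins are fed into a loss function and which search strategy updates the feature weights, which is exactly the design space the comparison in this paper covers.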


References

  1. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1:131–156

  2. Dash M, Choi K, Scheuermann P, Liu H (2002) Feature selection for clustering—a filter solution. In: Proceedings of the second international conference on data mining, pp 115–122

  3. Tong DL, Mintram R (2010) Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int J Mach Learn Cybern 1(1–4):75–87

  4. Boehm O, Hardoon DR, Manevitz LM (2011) Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2(3):125–134

  5. Sharma A, Imoto S, Miyano S, Sharma V (2012) Null space based feature selection method for gene expression data. Int J Mach Learn Cybern 3(4):269–276

  6. Pal M (2009) Margin-based feature selection for hyperspectral data. Int J Appl Earth Observ Geoinf 11:212–220

  7. Liu CL, Jaeger S, Nakagawa M (2004) Online recognition of Chinese characters: the state of the art. IEEE Trans Pattern Anal Mach Intell 26:198–213

  8. Verron S, Tiplica T, Kobi A (2008) Fault detection and identification with a new feature selection based on mutual information. J Process Control 5:479–490

  9. Oh IS, Lee JS, Moon BR (2004) Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell 11:1424–1437

  10. Hall MA (1999) Correlation-based feature subset selection for machine learning. PhD thesis, University of Waikato, Hamilton, pp 7–45

  11. Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 11:155–176

  12. Wang X, Dong C (2009) Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy. IEEE Trans Fuzzy Syst 17(3):556–567

  13. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238

  14. Huang D, Chow TWS (2005) Effective feature selection scheme using mutual information. Neurocomputing 63:325–343

  15. Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data with application to face recognition. Pattern Recognit 34:2067–2070

  16. Malina W (1981) On an extended Fisher criterion for feature selection. IEEE Trans Pattern Anal Mach Intell 5:611–614

  17. Hu Q, Yu D, Pedrycz W, Chen D (2011) Kernelized fuzzy rough sets and their applications. IEEE Trans Knowl Data Eng 11:1649–1667

  18. He Q, Xie Z, Hu Q, Wu C (2011) Boundary instance selection based on neighborhood model. Neurocomputing 10:1585–1594

  19. Wang X, Dong L, Yan J (2012) Maximum ambiguity based sample selection in fuzzy decision tree induction. IEEE Trans Knowl Data Eng 24(8):1491–1505

  20. Wang X, Zhai J, Lu S (2008) Induction of multiple fuzzy decision trees based on rough set technique. Inf Sci 178(16):3188–3202

  21. Rakotomamonjy A (2003) Variable selection using SVM-based criteria. J Mach Learn Res 3:1357–1370

  22. Gilad-Bachrach R, Navot A, Tishby N (2004) Margin based feature selection–theory and algorithms. In: Proceedings of the 21st international conference on machine learning, pp 40–46

  23. Sun Y (2007) Iterative RELIEF for feature weighting: algorithms, theories, and applications. IEEE Trans Pattern Anal Mach Intell 6:1–17

  24. Vapnik VN (1998) Statistical learning theory. Wiley, New York

  25. Bartlett PL, Shawe-Taylor J (1999) Generalization performance of support vector machines and other pattern classifiers. In: Advances in Kernel methods: support vector learning. MIT Press, Cambridge, pp 43–54

  26. Li Y, Lu B (2009) Feature selection based on loss-margin of nearest neighbor classification. Pattern Recognit 42:1914–1921

  27. Chen B, Liu H, Chai J, Bao Z (2009) Large margin feature weighting method via linear programming. IEEE Trans Knowl Data Eng 10:1475–1486

  28. Hu Q, Zhu P, Yang Y, Yu D (2011) Large-margin nearest neighbor classifiers via sample weight learning. Neurocomputing 7:656–660

  29. Garg A, Roth D (2003) Margin distribution and learning algorithms. In: Proceedings of the twentieth international conference on machine learning, pp 210–217

  30. Nguyen X, Wainwright MJ, Jordan MI (2009) On surrogate loss functions and f-divergences. Ann Stat 2:876–904

  31. Rudin C, Schapire RE, Daubechies I (2007) Analysis of boosting algorithms using the smooth margin function. Ann Stat 6:2723–2768

  32. Freund Y (2001) An adaptive version of the boost by majority algorithm. Mach Learn 43:293–318

  33. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407

  34. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167

  35. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning, pp 1–15

  36. Warmuth MK, Glocer K, Rätsch G (2007) Boosting algorithms for maximizing the soft margin. In: Advances in neural information processing systems. MIT Press, pp 340–346

  37. Vezhnevets A, Barinova O (2007) Avoiding boosting overfitting by removing confusing samples. In: Proceedings of the 18th European conference on machine learning, pp 430–441

  38. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407

  39. Romero E, Marquez L, Carreras X (2004) Margin maximization with feed-forward neural networks: a comparative study with SVM and AdaBoost. Neurocomputing 57:313–344

  40. Kanamori T, Takenouchi P, Eguchi P, Murata N (2007) Robust loss functions for boosting. NeuralComputation 8:2183–2244

  41. Allen GI (2012) Automatic feature extraction via weighted kernels and regularization. J Comput Graph Stat (to appear)

  42. Park C, Koo JY, Kim PT, Lee JW (2008) Stepwise feature selection using generalized logistic loss. Comput Stat Data Anal 52(7):3709–3718

  43. Weinberger KQ, Blitzer J, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244

  44. Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the 21st international conference on machine learning

  45. Zhang T (2009) On the consistency of feature selection using greedy least squares regression. J Mach Learn Res 10:555–568

  46. Farahat AK, Ghodsi A, Kamel MS (2011) An efficient greedy method for unsupervised feature selection. In: IEEE 11th international conference on data mining

  47. Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V (2000) Feature selection for SVMs. In: Advances in neural information processing systems

  48. Liu X, Yu T (2007) Gradient feature selection for online boosting. In: IEEE 11th international conference on computer vision, pp 1–8

  49. Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29:103–130

  50. Zhang T, Oles FJ (2001) Text categorization based on regularized linear classification methods. Inf Retriev 4:5–31

  51. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167

  52. Lee YJ, Mangasarian OL (2001) SSVM: a smooth support vector machine for classification. Comput Optim Appl 1:320–344

  53. Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2002) Margin analysis of the LVQ algorithm. In: Proceedings of the 17th conference on neural information processing systems, pp 462–469

  54. Wright J, Yang AY, Ganesh A, Shankar Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 2:210–227

  55. Zhang L, Zhou WD (2011) Sparse ensembles using weighted combination methods based on linear programming. Pattern Recognit 1:97–106

  56. Tan M, Wang L, Tsang IW (2010) Learning sparse SVM for feature selection on very high dimensional datasets. In: Proceedings of the 27th international conference on machine learning, pp 301–308

  57. Perkins S, Lacker K, Theiler J (2003) Grafting: fast incremental feature selection by gradient descent in function space. J Mach Learn Res 3:1333–1356

  58. Carpenter B (2008) Lazy sparse stochastic gradient descent for regularized multinomial logistic regression. Technical report, pp 1–12

  59. Tsuruoka Y, Tsujii J, Ananiadou S (2009) Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty. In: Proceedings of the 47th annual meeting of the ACL and the 4th IJCNLP of the AFNLP, pp 477–485

  60. Merz CJ, Murphy PM (1996) UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRRepository.html

  61. Perou CM, Sørlie T, Eisen MB et al (2000) Molecular portraits of human breast tumours. Nature 406:747–752

  62. Alizadeh A et al (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 405:503–511

  63. Beer DG, Kardia SLR, Huang CC et al (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8:816–824

  64. Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673–679

  65. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of the European conference on machine learning, pp 171–182

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61222210 and 61175027.

Author information

Corresponding author

Correspondence to Qinghua Hu.

About this article

Cite this article

Wei, P., Ma, P., Hu, Q. et al. Comparative analysis on margin based feature selection algorithms. Int. J. Mach. Learn. & Cyber. 5, 339–367 (2014). https://doi.org/10.1007/s13042-013-0164-6
