
Feature selection based on correlation deflation

  • Si-Bao Chen
  • Chris H. Q. Ding
  • Zhi-Li Zhou
  • Bin Luo
Original Article

Abstract

Feature selection is important in many machine learning and data mining applications. In this paper, a simple and effective correlation-deflation-based feature selection method is proposed. The objective of residual minimization constrained by the \(L_{2,0}\)-norm is proved to be equivalent to maximizing the sum of squared correlations between class labels and features. The whole correlation-deflation-based feature selection procedure then reduces to selecting features one by one while deflating correlations. Experiments on several public benchmark data sets show that the proposed method achieves better residual reduction and classification performance than many state-of-the-art feature selection methods.
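The claimed equivalence is easy to see in the one-feature case. As a hedged sketch (assuming a centered label vector \(y\) and a standardized feature \(x_j\); this is an illustration, not the paper's own derivation), the least-squares residual of regressing \(y\) on a single feature is

\[
\min_{w}\ \|y - x_j w\|_2^2 \;=\; \|y\|_2^2\left(1 - \rho(x_j, y)^2\right),
\]

so minimizing the residual over the choice of feature is the same as maximizing the squared correlation \(\rho(x_j, y)^2\), and summing over a set of selected features yields the sum-of-squared-correlations objective.

The one-by-one deflation procedure also lends itself to a short greedy routine. The following NumPy sketch is one plausible reading of that procedure, assuming a single response vector \(y\) and deflation by projecting each selected column out of the data; the function name and the tolerance constant are illustrative, not taken from the paper.

```python
import numpy as np

def correlation_deflation_select(X, y, k):
    """Greedy feature selection by correlation deflation (illustrative sketch).

    X : (n_samples, n_features) data matrix
    y : (n_samples,) label / response vector
    k : number of features to select
    """
    eps = 1e-12
    # Standardize features and center the response so that inner
    # products behave like correlations.
    Xr = (X - X.mean(axis=0)) / (X.std(axis=0) + eps)
    r = y - y.mean()

    selected = []
    for _ in range(k):
        # Squared correlation of each (deflated) feature with the residual.
        norms = np.linalg.norm(Xr, axis=0) + eps
        corr2 = (Xr.T @ r) ** 2 / (norms ** 2 * (r @ r) + eps)
        corr2[selected] = -np.inf  # do not reselect a feature
        j = int(np.argmax(corr2))
        selected.append(j)

        # Deflate: project the chosen feature out of the residual and out
        # of every remaining feature, so later correlations only measure
        # information not yet explained by the current selection.
        u = Xr[:, j] / norms[j]
        r = r - u * (u @ r)
        Xr = Xr - np.outer(u, u @ Xr)
    return selected

# Example usage on synthetic data: select 5 of 50 features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = X[:, 3] - 2.0 * X[:, 17] + 0.1 * rng.standard_normal(200)
    print(correlation_deflation_select(X, y, 5))
```

The deflation step plays the same role as orthogonalization in forward selection: it prevents a highly correlated duplicate of an already-selected feature from being chosen again.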

Keywords

Feature selection · Correlation deflation · Residual reduction · Pattern classification


Acknowledgements

The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions, which helped improve the paper. This work was supported in part by the Key Project of the Chinese National Programs for Fundamental Research and Development (973 Program) under Grant 2015CB351705, in part by the National Natural Science Foundation of China under Grants 61202228, 61472002, 61572030 and 61671018, and in part by the Collegiate Natural Science Fund of Anhui Province under Grant KJ2017A014.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© The Natural Computing Applications Forum 2018

Authors and Affiliations

  1. MOE Key Lab of Signal Processing and Intelligent Computing, School of Computer Science and Technology, Anhui University, Hefei, China
  2. Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, USA
  3. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
