Dynamic graph learning for spectral feature selection

  • Wei Zheng
  • Xiaofeng Zhu
  • Yonghua Zhu
  • Rongyao Hu
  • Cong Lei


Previous spectral feature selection methods generate the similarity graph from the original feature space, ignoring both the negative effect of noisy and redundant features and the association between graph matrix learning and feature selection, and thus easily produce suboptimal results. To address these issues, this paper couples graph learning and feature selection in a unified framework to achieve optimal selection performance. More specifically, we use a least-squares loss function with an ℓ2,1-norm regularizer to remove the effect of noisy and redundant features, and exploit the resulting local correlations among features to dynamically learn a graph matrix from a low-dimensional representation of the original data. Experimental results on real data sets show that our method outperforms state-of-the-art feature selection methods on classification tasks.
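The core ingredient described above, a least-squares loss with an ℓ2,1-norm regularizer whose row-sparse solution ranks features, can be illustrated with a minimal sketch. This is not the authors' full algorithm (which additionally learns the graph matrix dynamically); it is an iteratively reweighted least-squares (IRLS) solver, a standard approach for ℓ2,1-regularized objectives, under assumed toy data. All names (`l21_feature_selection`, `lam`, etc.) are illustrative, not from the paper.

```python
import numpy as np

def l21_feature_selection(X, Y, lam=1.0, n_iter=30):
    """IRLS sketch for: min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}.

    X: (n, d) data matrix; Y: (n, c) targets or pseudo-labels.
    Returns W (d, c); feature importance is the l2-norm of each row of W.
    """
    n, d = X.shape
    # Ridge initialization so the first reweighting step is well defined.
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        # Diagonal reweighting: d_ii = 1 / (2 ||w_i||_2), eps for stability.
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + 1e-8))
        # Closed-form update: W = (X^T X + lam * D)^(-1) X^T Y
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return W

# Toy data: only the first two features generate the targets.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = X[:, :2] @ rng.standard_normal((2, 3))
W = l21_feature_selection(X, Y, lam=0.5)
scores = np.linalg.norm(W, axis=1)       # per-feature importance
top = np.argsort(scores)[::-1][:2]       # indices of the two top-ranked features
```

Because the ℓ2,1-norm penalizes whole rows of W, irrelevant features are driven toward zero-norm rows, so ranking features by row norm performs the selection; this is the property the framework above relies on when feeding the selected low-dimensional representation into graph learning.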


Keywords: Graph learning · Optimization · Spectral feature selection



This work was supported in part by the China Key Research Program (Grant No: 2016YFB1000905), the China 1000-Plan National Distinguished Professorship, the National Natural Science Foundation of China (Grants No: 61573270 and 61672177), the Guangxi Natural Science Foundation (Grant No: 2015GXNSFCB139011), the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents, the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing, the Guangxi Bagui Teams for Innovation and Research, the Research Fund of Guangxi Key Lab of MIMS (16-A-01-01 and 16-A-01-02), and the Innovation Project of Guangxi Graduate Education under grants XYCSZ2017064, XYCSZ2017067 and YCSW2017065.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Wei Zheng¹
  • Xiaofeng Zhu¹
  • Yonghua Zhu²
  • Rongyao Hu¹
  • Cong Lei¹
  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, People's Republic of China
  2. School of Computer, Electronics and Information, Guangxi University, Nanning, People's Republic of China
