Unsupervised feature selection based on joint spectral learning and general sparse regression

  • Tao Chen
  • Yanrong Guo
  • Shijie Hao
Multi-Source Data Understanding (MSDU)


Unsupervised feature selection is an important machine learning task, since manually annotated data are expensive to obtain and therefore very limited. However, due to the presence of noise and outliers in different data samples, feature selection without the discriminant information embedded in annotated data is quite challenging. To address these limitations, we investigate embedding spectral learning into a general sparse regression framework for unsupervised feature selection. The proposed general spectral sparse regression (GSSR) method jointly handles outlier features by learning joint sparsity and noisy features by preserving the local structures of the data. Specifically, GSSR proceeds in two stages. First, classic sparse dictionary learning is used to build the bases of the original data. Then, the original data are projected into the basis space by learning a new representation via GSSR. In GSSR, the robust \(\ell _{2,r}\)-norm \((0<r\le 2)\) loss and the \(\ell _{2,p}\)-norm \((0<p\le 1)\) are adopted as the reconstruction term and the sparse regularization term, respectively, in place of the traditional least-square loss and Frobenius norm. Furthermore, the local topological structure of the new representation is preserved through spectral learning based on a Laplacian term. The overall objective function of GSSR is optimized and proved to converge. Experimental results on several public datasets demonstrate the validity of our algorithm, which outperforms state-of-the-art feature selection methods in terms of classification performance.
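The objective sketched in the abstract can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the 0/1 kNN graph construction, and the trade-off weights `alpha` and `beta` are all illustrative choices, and the dictionary-learning stage is omitted (the regression is written directly against the data matrix `X`).

```python
import numpy as np

def l2p_norm(M, p):
    # sum_i ||m_i||_2^p over the rows of M: the l_{2,r} loss (p=r) or
    # the l_{2,p} row-sparsity regularizer, depending on where it is used
    row_norms = np.linalg.norm(M, axis=1)
    return float(np.sum(row_norms ** p))

def knn_laplacian(X, k=3):
    # unnormalized graph Laplacian L = D - S from a symmetric 0/1 kNN graph
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]   # k nearest neighbors, self excluded
        S[i, idx] = 1.0
    S = np.maximum(S, S.T)                 # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S

def gssr_objective(X, W, Y, L, r=1.0, p=0.5, alpha=0.1, beta=0.1):
    # l_{2,r} reconstruction + l_{2,p} row sparsity + Laplacian smoothness
    recon = l2p_norm(X @ W - Y, r)
    sparsity = alpha * l2p_norm(W, p)
    smooth = beta * np.trace(W.T @ X.T @ L @ X @ W)
    return recon + sparsity + smooth

def select_features(W, k):
    # rank original features by the l2 norm of their row of W;
    # rows driven to zero by the l_{2,p} term mark discarded features
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]
```

Row-wise norms are what make the regularizer a *feature* selector rather than an element-wise sparsifier: an entire row of `W` (one original feature across all projection dimensions) is shrunk toward zero together.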


Keywords: Spectral selection · General sparse regression · Unsupervised feature selection



The research was supported by the National Key R&D Program of China under Grant No. 2017YFC0820604, the Anhui Provincial Natural Science Foundation under Grant No. 1808085QF188, and the National Natural Science Foundation of China under Grant Nos. 61702156, 61772171 and 61876056.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer and Information, Hefei University of Technology, Hefei, China
