Nonlinear sparse feature selection algorithm via low matrix rank constraint

Abstract

High-dimensional data often exhibit nonlinearity, low rank, and feature redundancy, all of which hinder further analysis. We therefore propose a low-rank unsupervised feature selection algorithm based on kernel functions. First, each feature is projected into a high-dimensional kernel space by a kernel function, which resolves the linear inseparability of the data in the original low-dimensional space. A self-representation term is then built into the loss, and the coefficient matrix is constrained to be both low-rank and sparse. Finally, a sparse regularizer on the coefficient vectors of the kernel matrices carries out the feature selection. In this algorithm, the kernel matrices handle linear inseparability, the low-rank constraint captures the global structure of the data, and the self-representation form determines the importance of each feature. Experiments show that, compared with other algorithms, classification after feature selection with the proposed algorithm achieves good results.
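
The full derivation and optimization details are not included in this preview, so the sketch below is only a rough Python illustration of the pipeline the abstract describes, under our own assumptions: one RBF kernel matrix per feature, a self-representation fit on the stacked kernels, a group (l2,1-style) proximal step for feature-level sparsity, and singular-value thresholding as a stand-in for the low-rank constraint. The function names and parameters (rbf_feature_kernels, rank_features, gamma, lam, tau) are illustrative, not the authors'.

    import numpy as np

    def rbf_feature_kernels(X, gamma=1.0):
        # One n x n RBF kernel matrix per feature column of X (n samples, d features).
        n, d = X.shape
        Ks = np.empty((d, n, n))
        for j in range(d):
            diff = X[:, j:j + 1] - X[:, j:j + 1].T   # pairwise differences within feature j
            Ks[j] = np.exp(-gamma * diff ** 2)
        return Ks

    def rank_features(X, lam=0.1, tau=0.1, gamma=1.0, n_iter=300):
        # Self-representation X ~ Phi @ W with Phi = [K_1 ... K_d], an n x (d*n) design.
        n, d = X.shape
        Phi = np.hstack(list(rbf_feature_kernels(X, gamma)))
        W = np.zeros((d * n, d))
        step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
        for _ in range(n_iter):
            W = W - step * (Phi.T @ (Phi @ W - X))   # gradient step on 0.5*||X - Phi W||_F^2
            for j in range(d):                       # group soft-threshold: l2,1-style
                blk = W[j * n:(j + 1) * n]           # sparsity over whole feature blocks
                shrink = max(0.0, 1.0 - step * lam / (np.linalg.norm(blk) + 1e-12))
                W[j * n:(j + 1) * n] = shrink * blk
            # Singular-value thresholding: a heuristic proxy for the low-rank constraint.
            U, S, Vt = np.linalg.svd(W, full_matrices=False)
            W = (U * np.maximum(S - step * tau, 0.0)) @ Vt
        scores = [np.linalg.norm(W[j * n:(j + 1) * n]) for j in range(d)]
        return np.argsort(scores)[::-1]              # most important feature first

    # Toy usage: rank 10 features of 50 random samples, keep the top 3.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    print(rank_features(X)[:3])

Note that alternating the two proximal steps is only a simple heuristic for the combined low-rank-plus-sparse penalty; the paper's actual objective and solver may differ.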

Notes

  1. http://archive.ics.uci.edu/ml/.

Acknowledgements

This work is partially supported by the China Key Research Program (Grant No: 2016YFB1000905); the Key Program of the National Natural Science Foundation of China (Grant No: 61836016); the Natural Science Foundation of China (Grant Nos: 61876046, 61573270, 81701780 and 61672177); the Project of Guangxi Science and Technology (GuiKeAD17195062); the Guangxi Natural Science Foundation (Grant Nos: 2015GXNSFCB139011, 2017GXNSFBA198221); the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing; the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents; and the Research Fund of Guangxi Key Lab of Multisource Information Mining & Security.

Author information

Corresponding author

Correspondence to Yangding Li.

About this article

Cite this article

Zhang, L., Li, Y., Zhang, J. et al. Nonlinear sparse feature selection algorithm via low matrix rank constraint. Multimed Tools Appl 78, 33319–33337 (2019). https://doi.org/10.1007/s11042-018-6909-1

Keywords

  • Feature selection
  • Kernel function
  • Subspace learning
  • Low rank representation
  • Sparse processing