
Multi-class feature selection via Sparse Softmax with a discriminative regularization

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

Feature selection plays a critical role in many machine learning applications, as it effectively mitigates the challenges posed by “the curse of dimensionality” and enhances the generalization capability of trained models. However, existing approaches to multi-class feature selection (MFS) often combine sparse regularization with a simple classification model, such as least squares regression, which can result in suboptimal performance. To address this limitation, this paper introduces a novel MFS method called Sparse Softmax Feature Selection (\(S^2FS\)). \(S^2FS\) combines \(l_{2,0}\)-norm regularization with the Softmax model to perform feature selection. By employing the \(l_{2,0}\)-norm, \(S^2FS\) produces a more precise sparse solution for the feature selection matrix. Additionally, the Softmax model improves the interpretability of the model’s outputs, thereby enhancing classification performance. To further promote discriminative feature selection, a discriminative regularization derived from linear discriminant analysis (LDA) is incorporated into the learning model. Furthermore, an efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve the objective function of \(S^2FS\). Extensive experiments on various datasets demonstrate that \(S^2FS\) achieves higher classification accuracy than several contemporary MFS methods.
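To make the abstract’s key ingredients concrete: the \(l_{2,0}\)-norm of a weight matrix \(W \in \mathbb{R}^{d \times c}\) counts its nonzero rows, \(\|W\|_{2,0} = |\{i : \|w_i\|_2 > 0\}|\) where \(w_i\) is the \(i\)-th row, so bounding it by \(k\) selects exactly \(k\) features jointly across all \(c\) classes. The sketch below is not the paper’s \(S^2FS\) algorithm: it omits the LDA-based discriminative regularizer and substitutes projected gradient descent with row-wise hard thresholding for the paper’s ADMM solver. It is only a minimal illustration of coupling a Softmax classifier with an \(l_{2,0}\) row-sparsity constraint, and the names `sparse_softmax_fs` and `hard_threshold_rows` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hard_threshold_rows(W, k):
    """Project W onto {W : ||W||_{2,0} <= k}: keep the k rows with the
    largest l2 norm and zero out the rest."""
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-k:]          # indices of the k largest row norms
    W_proj = np.zeros_like(W)
    W_proj[keep] = W[keep]
    return W_proj, keep

def sparse_softmax_fs(X, y, k, n_classes, lr=0.1, n_iter=500, seed=0):
    """Toy sparse-Softmax feature selection: projected gradient descent on
    the mean softmax cross-entropy under an l_{2,0} row-sparsity constraint."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]               # one-hot labels, shape (n, c)
    for _ in range(n_iter):
        P = softmax(X @ W + b)             # class probabilities, shape (n, c)
        G = X.T @ (P - Y) / n              # gradient of cross-entropy w.r.t. W
        W -= lr * G
        b -= lr * (P - Y).mean(axis=0)
        W, keep = hard_threshold_rows(W, k)  # enforce ||W||_{2,0} <= k each step
    return W, b, np.sort(keep)

# Usage: labels depend on 3 of 10 features; the selected rows should
# ideally concentrate on those coordinates.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, [0, 3, 7]].argmax(axis=1).astype(int)
W, b, selected = sparse_softmax_fs(X, y, k=3, n_classes=3)
print("selected features:", selected)      # ideally recovers {0, 3, 7}
```

Because the labels in this toy setup depend only on three of the ten input features, the hard-thresholding step should concentrate the nonzero rows of \(W\) on those coordinates; the paper’s ADMM-based solver addresses the same constraint within the full \(S^2FS\) objective, including the discriminative regularizer omitted here.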




Data availability

All data included in this study are available from the first author and can also be found in the manuscript.

Code availability

All code included in this study is available from the first author upon reasonable request.


Acknowledgements

This work was supported by grants from the Natural Science Foundation of Xiamen (No. 3502Z20227045), the National Natural Science Foundation of China (No. 61873067), and the Scientific Research Funds of Huaqiao University (No. 20211XD066). This article was also partly supported by the Natural Science Foundation of Fujian Province (No. 2022J01317) and the Fundamental Research Funds for the Central Universities of Huaqiao University under Grant ZQN-1115.

Funding

The first author is supported by the Natural Science Foundation of Xiamen (No. 3502Z20227045) and the Scientific Research Funds of Huaqiao University (No. 21BS121).

Author information


Corresponding authors

Correspondence to Zhenzhen Sun or Yuanlong Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, Z., Chen, Z., Liu, J. et al. Multi-class feature selection via Sparse Softmax with a discriminative regularization. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02185-5

