Abstract
Feature selection plays a critical role in many machine learning applications, as it effectively addresses the challenges posed by "the curse of dimensionality" and enhances the generalization capability of trained models. However, existing approaches for multi-class feature selection (MFS) often combine sparse regularization with a simple classification model, such as least squares regression, which can result in suboptimal performance. To address this limitation, this paper introduces a novel MFS method called Sparse Softmax Feature Selection (\(S^2FS\)). \(S^2FS\) combines an \(l_{2,0}\)-norm regularization with the Softmax model to perform feature selection. By utilizing the \(l_{2,0}\)-norm, \(S^2FS\) produces a more precise sparsity solution for the feature selection matrix, while the Softmax model improves the interpretability of the model's outputs and thereby enhances classification performance. To further promote discriminative feature selection, a discriminative regularization, derived from linear discriminant analysis (LDA), is incorporated into the learning model. Furthermore, an efficient optimization algorithm, based on the alternating direction method of multipliers (ADMM), is designed to solve the objective function of \(S^2FS\). Extensive experiments conducted on various datasets demonstrate that \(S^2FS\) achieves higher classification accuracy than several contemporary MFS methods.
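The row-sparsity idea behind the \(l_{2,0}\)-norm can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' \(S^2FS\) implementation: the \(l_{2,0}\)-norm of a feature-selection matrix \(W\) counts its nonzero rows, so penalizing or constraining it zeroes out entire rows, and each surviving row corresponds to a selected feature; the Softmax then maps class scores to probabilities. All function names below are hypothetical.

```python
import numpy as np

def l20_norm(W, tol=1e-12):
    """l_{2,0}-"norm": the number of rows of W with nonzero l2 norm.
    Constraining it forces whole rows (i.e., features) to zero."""
    row_norms = np.linalg.norm(W, axis=1)
    return int(np.sum(row_norms > tol))

def softmax(Z):
    """Row-wise Softmax: turns class scores into class probabilities."""
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def select_top_k_features(W, k):
    """Keep the k features whose rows of W have the largest l2 norms."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.sort(np.argsort(row_norms)[::-1][:k])

# Toy example: 3 features, 2 classes; feature 1 has a zero row, so it
# contributes nothing to the class scores and is not selected.
W = np.array([[2.0, -1.0],
              [0.0,  0.0],
              [0.5,  1.5]])
print(l20_norm(W))                 # 2 nonzero rows
print(select_top_k_features(W, 2)) # indices of the 2 retained features
```

In the full method, the row-sparse \(W\) is learned jointly with the Softmax classifier under the \(l_{2,0}\)-norm constraint via ADMM, rather than obtained by post-hoc thresholding as in this toy sketch.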
Data availability
All data included in this study is available from the first author and can also be found in the manuscript.
Code availability
All code included in this study is available from the first author upon reasonable request.
Acknowledgements
This work was supported by grants from the Natural Science Foundation of Xiamen (No. 3502Z20227045), the National Natural Science Foundation of China (No. 61873067), and the Scientific Research Funds of Huaqiao University (No. 20211XD066). It was also partly supported by the Natural Science Foundation of Fujian Province (No. 2022J01317) and the Fundamental Research Funds for the Central Universities of Huaqiao University under Grant ZQN-1115.
Funding
The first author is supported by the Natural Science Foundation of Xiamen (No. 3502Z20227045) and the Scientific Research Funds of Huaqiao University (No. 21BS121).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Z., Chen, Z., Liu, J. et al. Multi-class feature selection via Sparse Softmax with a discriminative regularization. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02185-5