
Multi-class feature selection via Sparse Softmax with a discriminative regularization

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

Feature selection plays a critical role in many machine learning applications, as it effectively mitigates the challenges posed by “the curse of dimensionality” and enhances the generalization capability of trained models. However, existing approaches to multi-class feature selection (MFS) often combine sparse regularization with a simple classification model, such as least squares regression, which can result in suboptimal performance. To address this limitation, this paper introduces a novel MFS method called Sparse Softmax Feature Selection (\(S^2FS\)). \(S^2FS\) combines \(l_{2,0}\)-norm regularization with the Softmax model to perform feature selection. By employing the \(l_{2,0}\)-norm, \(S^2FS\) produces a more precise sparse solution for the feature selection matrix. Additionally, the Softmax model improves the interpretability of the model’s outputs, thereby enhancing classification performance. To further promote discriminative feature selection, a discriminative regularization derived from linear discriminant analysis (LDA) is incorporated into the learning model. Furthermore, an efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve the objective function of \(S^2FS\). Extensive experiments on various datasets demonstrate that \(S^2FS\) achieves higher classification accuracy than several contemporary MFS methods.
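To make the abstract’s key ingredients concrete: the \(l_{2,0}\)-norm of a weight matrix \(W \in \mathbb{R}^{d \times c}\) counts its nonzero rows, \(\|W\|_{2,0} = |\{i : \|w_i\|_2 > 0\}|\) where \(w_i\) is the \(i\)-th row, so bounding it by \(k\) selects exactly \(k\) features jointly across all \(c\) classes. The sketch below is not the paper’s \(S^2FS\) algorithm: it omits the LDA-based discriminative regularizer and substitutes projected gradient descent with row-wise hard thresholding for the paper’s ADMM solver. It is only a minimal illustration of coupling a Softmax classifier with an \(l_{2,0}\) row-sparsity constraint, and the names `sparse_softmax_fs` and `hard_threshold_rows` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hard_threshold_rows(W, k):
    """Project W onto {W : ||W||_{2,0} <= k}: keep the k rows with the
    largest l2 norm and zero out the rest."""
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-k:]          # indices of the k largest row norms
    W_proj = np.zeros_like(W)
    W_proj[keep] = W[keep]
    return W_proj, keep

def sparse_softmax_fs(X, y, k, n_classes, lr=0.1, n_iter=500, seed=0):
    """Toy sparse-Softmax feature selection: projected gradient descent on
    the mean softmax cross-entropy under an l_{2,0} row-sparsity constraint."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]               # one-hot labels, shape (n, c)
    for _ in range(n_iter):
        P = softmax(X @ W + b)             # class probabilities, shape (n, c)
        G = X.T @ (P - Y) / n              # gradient of cross-entropy w.r.t. W
        W -= lr * G
        b -= lr * (P - Y).mean(axis=0)
        W, keep = hard_threshold_rows(W, k)  # enforce ||W||_{2,0} <= k each step
    return W, b, np.sort(keep)

# Usage: labels depend on 3 of 10 features; the selected rows should
# ideally concentrate on those coordinates.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, [0, 3, 7]].argmax(axis=1).astype(int)
W, b, selected = sparse_softmax_fs(X, y, k=3, n_classes=3)
print("selected features:", selected)      # ideally recovers {0, 3, 7}
```

Because the labels in this toy setup depend only on three of the ten input features, the hard-thresholding step should concentrate the nonzero rows of \(W\) on those coordinates; the paper’s ADMM-based solver addresses the same constraint within the full \(S^2FS\) objective, including the discriminative regularizer omitted here.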




Data availability

All data included in this study are available from the first author and can also be found in the manuscript.

Code availability

All code included in this study is available from the first author upon reasonable request.


Acknowledgements

This work was supported by grants from the Natural Science Foundation of Xiamen (No. 3502Z20227045), the National Natural Science Foundation of China (No. 61873067), and the Scientific Research Funds of Huaqiao University (No. 20211XD066). This article was also partly supported by the Natural Science Foundation of Fujian Province (No. 2022J01317) and the Fundamental Research Funds for the Central Universities of Huaqiao University under Grant ZQN-1115.

Funding

The first author is supported by the Natural Science Foundation of Xiamen (No. 3502Z20227045) and the Scientific Research Funds of Huaqiao University (No. 21BS121).

Author information


Corresponding authors

Correspondence to Zhenzhen Sun or Yuanlong Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, Z., Chen, Z., Liu, J. et al. Multi-class feature selection via Sparse Softmax with a discriminative regularization. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02185-5

