Abstract
Multi-label feature selection improves the performance and efficiency of subsequent learning tasks by selecting the important features in multi-label data. To handle multiple labels, many approaches group the labels and fuse information within each group, but they treat all label groups equally and thus ignore the differing importance of the groups. To model the relationship between features and labels, many multi-label feature selection methods use manifold learning to fit features to labels linearly and efficiently, but they do not fit the spatial distribution of the feature space to that of the label space. Motivated by these two limitations, this paper integrates label distribution learning and spectral clustering to evaluate the significance of each label group and construct an enhanced label space, which is then aligned with the feature space through manifold distribution consistency for multi-label feature selection. First, we propose a hypothetical model of the relationship among labels, in which subordinate labels cluster around a central core label. On this basis, we employ spectral clustering integrated with density peaks to generate distinct label clusters, and then combine the clusters with label distribution learning to assess the significance of each cluster. Next, we design a manifold distribution consistency evaluation that quantifies the structural disparity between the feature space and the label space enhanced by the spectral clustering-based label enhancement strategy, so as to obtain a low-dimensional feature space and the optimal feature subset. Finally, experimental results on several datasets from diverse domains show that the proposed multi-label feature selection algorithm outperforms five other algorithms.
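The idea of subordinate labels gathering around a core label can be illustrated with a small sketch. It is not the authors' method: it uses plain density-peaks clustering (in the style of Rodriguez and Laio) on the columns of a binary label matrix, without the spectral clustering step or the label distribution weighting. The function names, the Jaccard distance, the Gaussian density kernel, and the cutoff `d_c` are all assumptions made for illustration.

```python
from math import exp


def jaccard_distance(a, b):
    """Distance between two binary label columns: 1 - |a AND b| / |a OR b|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - inter / union if union else 0.0


def density_peak_label_clusters(Y, n_clusters, d_c=0.5):
    """Group the label columns of Y (rows = instances) around core labels.

    Returns (cores, cluster): the indices of the core labels and, for each
    label, the index of the cluster it belongs to.
    """
    q = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(q)]
    dist = [[jaccard_distance(cols[i], cols[j]) for j in range(q)]
            for i in range(q)]

    # Local density of each label (Gaussian kernel; d_c is an assumed cutoff).
    rho = [sum(exp(-(dist[i][j] / d_c) ** 2) for j in range(q) if j != i)
           for i in range(q)]

    # Labels in order of decreasing density; rank[i] is the position of i.
    order = sorted(range(q), key=lambda i: -rho[i])
    rank = {i: r for r, i in enumerate(order)}

    # delta[i]: distance to the nearest label of higher density.
    # parent[i]: that nearest higher-density label.
    delta = [0.0] * q
    parent = [-1] * q
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # densest label: use its largest distance
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Core labels have both high density and high separation (rho * delta).
    # Ties are broken by density rank so the densest label is always a core.
    cores = sorted(range(q), key=lambda i: (-rho[i] * delta[i], rank[i]))
    cores = cores[:n_clusters]

    cluster = [-1] * q
    for c, i in enumerate(cores):
        cluster[i] = c
    # Each remaining label joins the cluster of its higher-density parent.
    for i in order:
        if cluster[i] == -1:
            cluster[i] = cluster[parent[i]]
    return cores, cluster


# Toy label matrix: labels 0-2 tend to co-occur, as do labels 3-4.
Y = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
cores, cluster = density_peak_label_clusters(Y, n_clusters=2)
print(cores, cluster)
```

In this toy matrix, label 0 co-occurs with labels 1 and 2, so it acts as the core of the first group, while labels 3 and 4 form a second group. A fuller treatment would replace the direct density-peaks assignment with a spectral embedding of the label affinity matrix and then weight each cluster, as the abstract describes.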
Data availability
Data is provided within the manuscript.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 62266018 and 61966016), the Natural Science Foundation of Jiangxi Province (No. 20232BAB202052), and the Jiangxi Postgraduate Innovation Fund Project (YC2022-s547).
Author information
Contributions
Wenhao Shu: Conceptualization, Formal analysis; Dongtao Cao: Data curation, Software, Writing; Wenbin Qian: Visualization, Review.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shu, W., Cao, D. & Qian, W. Multi-label feature selection via spectral clustering-based label enhancement and manifold distribution consistency. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02181-9
DOI: https://doi.org/10.1007/s13042-024-02181-9