Abstract
Feature selection, semi-supervised learning and multi-label classification are distinct challenges for the machine learning and data mining communities. While previous works have addressed each of these problems separately, in this paper we show how they can be addressed together. We propose a unified framework for semi-supervised multi-label feature selection based on the Laplacian score. In particular, we show how to constrain this score function when data are partially labeled and each instance is associated with a set of labels. We transform the labeled part of the data into soft constraints and show how to integrate them into a measure of feature relevance according to the available labels. Experiments on benchmark data sets validate the proposed approach and compare it with other state-of-the-art feature selection methods in a multi-label context.
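For readers unfamiliar with the score the framework builds on, the following is a minimal sketch of the standard unsupervised Laplacian score only. It is not the soft-constrained, multi-label variant proposed in the paper, whose exact formulation is not given in this abstract. The function name `laplacian_score`, the neighbourhood size `k`, the heat-kernel width `t` and the toy data are all assumptions made for illustration.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Standard unsupervised Laplacian score per feature (lower is better).

    Illustrative sketch: k (neighbours) and t (kernel width) are assumed defaults.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between instances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Indices of the k nearest neighbours of each instance, excluding itself.
    d2_search = d2.copy()
    np.fill_diagonal(d2_search, np.inf)
    nn = np.argsort(d2_search, axis=1)[:, :k]

    # Symmetric k-NN adjacency: i and j are linked if either is a neighbour of the other.
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj |= adj.T

    # Heat-kernel weights on the graph edges, degrees, and graph Laplacian.
    S = np.where(adj, np.exp(-d2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S

    # Remove the degree-weighted mean from every feature column.
    F = X - (deg @ X) / deg.sum()

    # Score of feature r: (f_r^T L f_r) / (f_r^T D f_r), computed for all columns at once.
    num = np.einsum("ij,ij->j", F, L @ F)
    den = np.einsum("ij,ij->j", F, deg[:, None] * F)
    # Features with zero weighted variance get an infinite (worst) score.
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), np.inf)

# Toy data (hypothetical): feature 0 separates two clusters, feature 1 is uniform noise.
rng = np.random.default_rng(0)
cluster = np.repeat([0.0, 10.0], 20)
X_demo = np.column_stack([cluster + 0.1 * rng.standard_normal(40),
                          rng.uniform(0.0, 1.0, 40)])
scores = laplacian_score(X_demo, k=5, t=1.0)
```

In this toy setting the cluster-separating feature should receive the lower score, since it varies little between neighbouring instances relative to its overall variance. A constrained variant would additionally use the labeled instances to adjust this ranking.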
Acknowledgments
We thank the anonymous reviewers for their very useful comments and suggestions. This work was carried out at LIRIS (Lab. CNRS 5205), Lyon1 University, and was supported by an Algerian Research Scholarship (PNE: Programme National Exceptionnel).
Cite this article
Alalga, A., Benabdeslem, K. & Taleb, N. Soft-constrained Laplacian score for semi-supervised multi-label feature selection. Knowl Inf Syst 47, 75–98 (2016). https://doi.org/10.1007/s10115-015-0841-8
DOI: https://doi.org/10.1007/s10115-015-0841-8