Knowledge and Information Systems, Volume 47, Issue 1, pp 75–98

Soft-constrained Laplacian score for semi-supervised multi-label feature selection

  • Abdelouahid Alalga
  • Khalid Benabdeslem
  • Nora Taleb
Regular Paper

Abstract

Feature selection, semi-supervised learning and multi-label classification are distinct challenges for the machine learning and data mining communities. While previous work has addressed each of these problems separately, in this paper we show how they can be addressed together. We propose a unified framework for semi-supervised multi-label feature selection based on the Laplacian score. In particular, we show how to constrain this score function when data are partially labeled and each instance is associated with a set of labels. We transform the labeled part of the data into soft constraints and show how to integrate them into a measure of feature relevance, according to the available labels. Experiments on benchmark data sets validate the proposed approach and compare it with other state-of-the-art feature selection methods in a multi-label context.
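For readers unfamiliar with the Laplacian score on which the framework builds, the following is a minimal sketch of the standard unsupervised score of He, Cai and Niyogi (NIPS 2005). It is not the soft-constrained variant proposed in the paper, whose exact formulation is not given in the abstract. The function name `laplacian_score`, the parameters `n_neighbors` and `t`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Unsupervised Laplacian score of each column of X (lower = more relevant).

    Illustrative sketch only: k-nearest-neighbour graph with heat-kernel
    weights exp(-||xi - xj||^2 / t); hypothetical parameter names.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < n_neighbors < n:
        raise ValueError("n_neighbors must be between 1 and n_samples - 1")

    # Pairwise squared Euclidean distances between instances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Symmetric k-nearest-neighbour adjacency, excluding self-loops.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :n_neighbors]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = True
    adj |= adj.T

    # Heat-kernel weights S, degree vector d, graph Laplacian L = D - S.
    S = np.where(adj, np.exp(-d2 / t), 0.0)
    d = S.sum(axis=1)
    L = np.diag(d) - S

    # Remove the degree-weighted mean of every feature.
    F = X - (d @ X) / d.sum()

    # Score_r = (f_r^T L f_r) / (f_r^T D f_r), computed for all features at once.
    num = np.sum(F * (L @ F), axis=0)
    den = np.sum(F * (d[:, None] * F), axis=0)
    scores = np.full(X.shape[1], np.inf)  # constant features get +inf
    np.divide(num, den, out=scores, where=den > 0)
    return scores

# Toy usage: feature 0 separates two clusters, feature 1 is uniform noise.
rng = np.random.default_rng(0)
cluster = np.repeat([0.0, 10.0], 20)
X_toy = np.column_stack([cluster + rng.normal(0.0, 0.1, 40),
                         rng.uniform(0.0, 1.0, 40)])
scores = laplacian_score(X_toy, n_neighbors=5, t=1.0)
ranking = np.argsort(scores)  # most relevant feature first
```

A feature scores low when neighbouring instances in the graph take similar values on it while its overall variance stays large, which is why features are ranked in ascending order of score. A semi-supervised variant along the lines described in the abstract would additionally use the labeled instances to constrain this score.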

Keywords

Feature selection, Semi-supervised context, Multi-label learning, Constraints

Notes

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was carried out at LIRIS, Lab. CNRS 5205, at Lyon1 University. The work was supported by an Algerian Research Scholarship (PNE: Programme National Exceptionnel).

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Abdelouahid Alalga (1)
  • Khalid Benabdeslem (2)
  • Nora Taleb (1)

  1. Department of Computer Science, University of Badji-Mokhtar, Annaba, Algeria
  2. University of Lyon1-LIRIS, Villeurbanne, France
