Knowledge and Information Systems, Volume 49, Issue 3, pp 1161–1185

Ensemble constrained Laplacian score for efficient and robust semi-supervised feature selection

  • Khalid Benabdeslem
  • Haytham Elghazel
  • Mohammed Hindawi
Regular Paper

Abstract

In this paper, we propose an efficient and robust approach for semi-supervised feature selection based on the constrained Laplacian score. The main drawback of this score is its dependence on the choice of the scant supervision information, which is represented by pairwise constraints. In fact, constraints have been shown to contain noise that may degrade learning performance. In this work, we try to counteract the negative effects of any single constraint set by varying the sources of the constraints. This is achieved by an ensemble technique that combines resampling of the data (bagging) with a random subspace strategy. Experiments on high-dimensional datasets validate the proposed approach and compare it with other representative feature selection methods.
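The ensemble idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the constrained Laplacian score formula or say how the pairwise constraints are drawn in each round, so both are left out here. The names `ensemble_feature_scores` and `variance_score` are hypothetical, and plain per-feature variance is used only as a stand-in scorer so that the example runs. Each round draws a bootstrap sample of the rows (bagging) and a random subset of the features (random subspace), scores only those features, and the per-feature scores are averaged over the rounds in which the feature was drawn.

```python
import random
from statistics import pvariance


def variance_score(rows, feature_idx):
    """Stand-in scorer (NOT the constrained Laplacian score): variance per feature."""
    return [pvariance([row[j] for row in rows]) for j in feature_idx]


def ensemble_feature_scores(rows, score_fn, n_rounds=50, subspace_size=None, seed=0):
    """Average a feature score over bootstrap samples and random feature subsets."""
    rng = random.Random(seed)
    n_samples, n_features = len(rows), len(rows[0])
    if subspace_size is None:
        # Assumed default: roughly sqrt(number of features) per round.
        subspace_size = max(1, int(n_features ** 0.5))
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_rounds):
        # Bagging: resample the rows with replacement.
        sample = [rows[rng.randrange(n_samples)] for _ in range(n_samples)]
        # Random subspace: pick a subset of features without replacement.
        feats = rng.sample(range(n_features), subspace_size)
        for j, s in zip(feats, score_fn(sample, feats)):
            totals[j] += s
            counts[j] += 1
    # Mean score over the rounds in which each feature was selected.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Toy data: feature 0 is constant, feature 1 varies widely, feature 2 varies a little.
rows = [[1.0, float(i * 10), float(i % 2)] for i in range(20)]
scores = ensemble_feature_scores(rows, variance_score, n_rounds=200,
                                 subspace_size=2, seed=1)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Passing the scorer as a function keeps the resampling logic separate from the scoring criterion, so a constraint-based score could be substituted for `variance_score`. Whether a higher or lower value marks a better feature depends on the scorer; the descending sort above suits the variance stand-in only.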

Keywords

  • Feature selection
  • Semi-supervised context
  • Ensemble methods
  • Constraints

Notes

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Khalid Benabdeslem (1)
  • Haytham Elghazel (1)
  • Mohammed Hindawi (2)

  1. University of Lyon1 - LIRIS, Villeurbanne, France
  2. Computer Science Department, Zirve University, Gaziantep, Turkey