Ensemble constrained Laplacian score for efficient and robust semi-supervised feature selection

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

In this paper, we propose an efficient and robust approach to semi-supervised feature selection based on the constrained Laplacian score. The main weakness of this score lies in its dependence on scant supervision information in the form of pairwise constraints, which are known to carry noise that can deteriorate learning performance. In this work, we mitigate the negative effects of any single constraint set by varying its sources, through an ensemble technique that combines resampling of the data (bagging) with a random subspace strategy. Experiments on high-dimensional datasets validate the proposed approach and compare it with other representative feature selection methods.
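
To make the scheme concrete, here is a minimal Python sketch of such an ensemble, assuming the classical unconstrained Laplacian score [21] as a stand-in for the constrained Laplacian score of [3], whose formula is not reproduced in this excerpt. The function names and parameters (n_components, subspace_frac, the rank-averaging consensus) are illustrative choices, not the authors' implementation; their original code is available at the URL given in the Notes below.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Classical Laplacian score (He et al. [21]); smaller = more relevant.
    Stand-in for the constrained score of [3], which additionally injects
    must-link/cannot-link constraints into the similarity graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    S = np.exp(-d2 / t)                                  # heat-kernel similarity
    far = np.argsort(d2, axis=1)[:, n_neighbors + 1:]    # keep self + k nearest
    for i in range(n):
        S[i, far[i]] = 0.0
    S = np.maximum(S, S.T)                               # symmetrize the kNN graph
    D = S.sum(axis=1)                                    # node degrees
    L = np.diag(D) - S                                   # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ D) / D.sum()            # remove trivial component
        denom = f @ (D * f)
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores

def ensemble_score(X, n_components=20, subspace_frac=0.5, seed=0):
    """Bagging + random subspaces; consensus by averaging per-component ranks."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(subspace_frac * d))
    rank_sum, hits = np.zeros(d), np.zeros(d)
    for _ in range(n_components):
        rows = rng.choice(n, size=n, replace=True)       # bootstrap sample (bagging)
        feats = rng.choice(d, size=k, replace=False)     # random feature subspace
        s = laplacian_score(X[np.ix_(rows, feats)])
        rank_sum[feats] += np.argsort(np.argsort(s))     # rank 0 = best score
        hits[feats] += 1
    mean_rank = np.where(hits > 0, rank_sum / np.maximum(hits, 1), np.inf)
    return np.argsort(mean_rank)                         # best features first
```

Calling ensemble_score(X)[:m] would return the indices of the m top-ranked features; averaging ranks rather than raw scores keeps components built on different bootstrap samples and subspaces comparable.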


Notes

  1. http://perso.univ-lyon1.fr/haytham.elghazel/EnsCLS/EnsCLS.zip.

References

  1. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511

  2. Barkia H, Elghazel H, Aussem A (2011) Semi-supervised feature importance evaluation with ensemble learning. In: IEEE ICDM, pp 31–40

  3. Benabdeslem K, Hindawi M (2011) Constrained Laplacian score for semi-supervised feature selection. In: Proceedings of ECML-PKDD conference, pp 204–218

  4. Benabdeslem K, Hindawi M (2014) Efficient semi-supervised feature selection: constraint, relevance and redundancy. IEEE Trans Knowl Data Eng 26(5):1131–1143

  5. Frank A, Asuncion A (2010) UCI machine learning repository. Available at http://archive.ics.uci.edu/ml

  6. Breiman L (1996) Bagging predictors. Mach Learn 26(2):123–140

  7. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  8. Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  9. Cormen TH, Stein C, Rivest RL, Leiserson CE (2001) Introduction to algorithms. McGraw-Hill Higher Education, New York

  10. Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Proceedings of ECML/PKDD

  11. Demsar J (2006) Statistical comparisons of classifiers over multiple datasets. J Mach Learn Res 7:1–30

  12. Dietterich T (2000) Ensemble methods in machine learning. In: First international workshop on multiple classifier systems, pp 1–15

  13. Duda RO, Hart PE, Stork DG (2000) Pattern classification. Wiley Interscience, New York

  14. Dy J, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889

  15. Elghazel H, Aussem A (2015) Unsupervised feature selection with ensemble learning. Mach Learn 98(1–2):157–180

  16. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: 13th international conference on machine learning, pp 276–280

  17. Golub T, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

  18. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  19. Guyon I, Gunn S, Nikravesh M, Zadeh LA (2006) Feature extraction: foundations and applications. Series studies in fuzziness and soft computing. Physica-Verlag, Springer, Berlin

  20. Hady MFA, Schwenker F (2010) Combining committee-based semi-supervised learning and active learning. J Comput Sci Technol 25(4):681–698

  21. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 17:507–514

  22. Hindawi M, Allab K, Benabdeslem K (2011) Constraint selection based semi-supervised feature selection. In: Proceedings of IEEE ICDM, pp 1080–1085

  23. Hindawi M, Elghazel H, Benabdeslem K (2013) Efficient semi-supervised feature selection by an ensemble approach. In: COPEM@ECML/PKDD. International workshop on complex machine learning problems with ensemble methods, pp 41–55

  24. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

  25. Hong Y, Kwong S, Chang Y, Ren Q (2008) Consensus unsupervised feature ranking from multiple views. Pattern Recognit Lett 29(5):595–602

  26. Hong Y, Kwong S, Chang Y, Ren Q (2008) Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognit 41(9):2742–2756

  27. Kalakech M, Biela P, Macaire L, Hamad D (2011) Constraint scores for semi-supervised feature selection: a comparative study. Pattern Recognit Lett 32(5):656–665

  28. Kohonen T (2001) Self organizing map. Springer, Berlin

  29. Kuncheva LI (2007) A stability index for feature selection. In: Artificial intelligence and applications, pp 421–427

  30. Li M, Zhou ZH (2007) Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans Syst Man Cybern 37(6):1088–1098

  31. Saeys Y, Abeel T, de Peer YV (2008) Robust feature selection using ensemble feature selection techniques. In: ECML/PKDD (2), pp 313–325

  32. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209

  33. Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  34. Sun D, Zhang D (2010) Bagging constraint score for feature selection with pairwise constraints. Pattern Recognit 43:2106–2118

  35. Sun Y, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32(9):1610–1626

  36. Topchy A, Jain A, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881

  37. Yaslan Y, Cataltepe Z (2010) Co-training with relevant random subspaces. Neurocomputing 73(10–12):1652–1661

  38. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of international conference on machine leaning, pp 856–863

  39. Zhang D, Chen S, Zhou Z (2008) Constraint score: a new filter method for feature selection with pairwise constraints. Pattern Recognit 41(5):1440–1451

  40. Zhao Z, Liu H (2007) Semi-supervised feature selection via spectral analysis. In: Proceedings of SIAM data mining (SDM), pp 641–646

  41. Zhao Z, Morstatter F, Sharma S, Alelyani S, Anand A (2010) Advancing feature selection research—ASU feature selection repository. TR-10-007

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions.

Author information

Corresponding author

Correspondence to Khalid Benabdeslem.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work extends our idea recently presented at the COPEM@ECML/PKDD'13 workshop [23].

About this article

Cite this article

Benabdeslem, K., Elghazel, H. & Hindawi, M. Ensemble constrained Laplacian score for efficient and robust semi-supervised feature selection. Knowl Inf Syst 49, 1161–1185 (2016). https://doi.org/10.1007/s10115-015-0901-0
