
A new feature selection approach based on ensemble methods in semi-supervised classification

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

In computer-aided medical systems, many practical classification applications are confronted with a massive growth in the collection and storage of data; this is especially the case in areas such as the prediction of medical test efficiency, the classification of tumors and the detection of cancers. Data with known class labels (labeled data) can be scarce, whereas unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning deals with methods that exploit the unlabeled data, in addition to the labeled data, to improve performance on the classification task. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method, called Optimized co-Forest for Feature Selection (OFFS), that combines ideas from co-forest with the embedded selection principle of Random Forest based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other semi-supervised feature selection approaches of the filter, wrapper and embedded kinds. Our method demonstrates its ability to select features and measure their importance, improving the performance of the hypothesis learned from a small amount of labeled samples by exploiting unlabeled ones.
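The embedded selection principle referred to in the abstract, permutation importance measured on the out-of-bag set of each tree in a Random Forest, can be illustrated with a short sketch. The Python code below is not the authors' OFFS implementation (which additionally incorporates the co-forest semi-supervised scheme); it is a minimal sketch of the out-of-bag permutation measure alone, using a hypothetical synthetic dataset and hypothetical function names.

```python
# Illustrative sketch only: permutation-based feature importance measured on
# out-of-bag samples, the embedded Random Forest principle OFFS builds on.
# Dataset, function name and parameters are hypothetical, not from the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def oob_permutation_importance(X, y, n_trees=50):
    n, d = X.shape
    importance = np.zeros(d)
    for _ in range(n_trees):
        # Bootstrap sample: n draws with replacement; the unused rest is the out-of-bag set.
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if oob.size == 0:
            continue
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
        tree.fit(X[boot], y[boot])
        base_acc = tree.score(X[oob], y[oob])
        for j in range(d):
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and the labels
            # Importance of feature j = drop in out-of-bag accuracy after permutation.
            importance[j] += base_acc - tree.score(X_perm, y[oob])
    return importance / n_trees

# Toy usage with synthetic data: only the first feature is informative,
# so it should receive the largest importance score.
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
print(oob_permutation_importance(X, y).round(3))
```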


Notes

  1. A bootstrap sample L is obtained, for example, by randomly drawing n observations with replacement from the training sample \(L_n\), each observation having a probability 1/n of being drawn.
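A minimal sketch of this draw, on hypothetical index data:

```python
# Sketch of the bootstrap draw described above: n observations drawn with
# replacement from the training sample, each with probability 1/n per draw.
import numpy as np

rng = np.random.default_rng(42)
L_n = np.arange(10)                        # hypothetical training sample (indices)
boot_idx = rng.integers(0, len(L_n), size=len(L_n))
L_boot = L_n[boot_idx]                     # the bootstrap sample L
oob = np.setdiff1d(L_n, L_boot)            # observations never drawn: the out-of-bag set
print(L_boot, oob)
```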



Author information

Corresponding author

Correspondence to Nesma Settouti.

About this article

Cite this article

Settouti, N., Chikh, M. & Barra, V. A new feature selection approach based on ensemble methods in semi-supervised classification. Pattern Anal Applic 20, 673–686 (2017). https://doi.org/10.1007/s10044-015-0524-9
