Pattern Analysis and Applications, Volume 20, Issue 3, pp 673–686

A new feature selection approach based on ensemble methods in semi-supervised classification

  • Nesma Settouti
  • Mohamed Amine Chikh
  • Vincent Barra
Theoretical Advances


In computer-aided medical systems, many practical classification applications are confronted with the massive growth in the collection and storage of data; this is especially the case in areas such as the prediction of medical test efficiency, the classification of tumors and the detection of cancers. Data with known class labels (labeled data) can be limited, but unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning deals with methods for exploiting unlabeled data, in addition to labeled data, to improve performance on the classification task. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method, called Optimized co-Forest for Feature Selection (OFFS), which combines ideas from co-forest with the embedded feature selection principle of Random Forest based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other semi-supervised feature selection approaches of the filter, wrapper and embedded kinds. Our method proves its ability to select features, measure their importance and improve the performance of the hypothesis learned from a small amount of labeled samples by exploiting unlabeled samples.
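The permutation-of-out-of-bag idea underlying the selection step can be illustrated with a minimal sketch. This is not the authors' OFFS implementation: it is a simplified variant that permutes each feature on a held-out set (standing in for the per-tree out-of-bag samples) and measures the resulting accuracy drop; the dataset, model parameters and variable names are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: 8 features, of which only 3 are informative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
base = rf.score(X_te, y_te)  # accuracy with all features intact

rng = np.random.default_rng(0)
importance = []
for j in range(X_te.shape[1]):
    Xp = X_te.copy()
    rng.shuffle(Xp[:, j])        # break the link between feature j and y
    # A large accuracy drop means the forest relied on feature j.
    importance.append(base - rf.score(Xp, y_te))

ranked = np.argsort(importance)[::-1]  # features, most important first
```

A selection rule then keeps the top-ranked features (or those whose importance exceeds a threshold); in the paper's setting the forest is additionally refined with pseudo-labeled unlabeled samples in the co-forest style before importances are read off.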


Keywords: Feature selection · Semi-supervised learning · Ensemble methods · Co-forest · Random Forest · Large datasets · Medical diagnosis



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Nesma Settouti (1, 2, 3)
  • Mohamed Amine Chikh (3)
  • Vincent Barra (1, 2)

  1. LIMOS, CNRS, UMR 6158, Aubière, France
  2. LIMOS, Clermont-Université, Université Blaise Pascal, Clermont-Ferrand, France
  3. Biomedical Engineering Laboratory GBM, Tlemcen University, Tlemcen, Algeria
