Abstract
In computer-aided medical systems, many practical classification applications face a massive growth in the collection and storage of data; this is especially the case in areas such as the prediction of medical test efficiency, the classification of tumors, and the detection of cancers. Data with known class labels (labeled data) can be scarce, while unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning covers methods that exploit the unlabeled data, in addition to the labeled data, to improve performance on the classification task. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method called Optimized co-Forest for Feature Selection (OFFS), which combines ideas from co-forest with the embedded feature selection principle of Random Forest based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other semi-supervised feature selection approaches of the filter, wrapper, and embedded types. Our method proves its ability to select features and measure their importance, improving the performance of the hypothesis learned from a small amount of labeled samples by exploiting unlabeled samples.
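For readers unfamiliar with the permutation-based importance measure that OFFS builds on, the following is a minimal Python sketch of Breiman's out-of-bag permutation importance for a forest of randomized trees. It is an illustration under our own assumptions (the function `oob_permutation_importance` and its parameters are ours), not the authors' OFFS implementation, which additionally exploits unlabeled data through the co-forest scheme.

```python
# Minimal sketch of Breiman-style out-of-bag (OOB) permutation importance.
# A feature's score is the average drop in OOB accuracy across trees when
# that feature's values are shuffled in each tree's OOB set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oob_permutation_importance(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores, counts = np.zeros(d), np.zeros(d)
    for _ in range(n_trees):
        boot = rng.integers(0, n, n)             # bootstrap indices, drawn with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # observations never drawn: the OOB set
        if oob.size == 0:
            continue
        tree = DecisionTreeClassifier(max_features="sqrt").fit(X[boot], y[boot])
        base = tree.score(X[oob], y[oob])        # OOB accuracy before permutation
        for j in range(d):
            X_perm = X[oob].copy()
            rng.shuffle(X_perm[:, j])            # break the association between feature j and y
            scores[j] += base - tree.score(X_perm, y[oob])
            counts[j] += 1
    return scores / np.maximum(counts, 1)        # mean accuracy drop = importance
```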
Notes
A bootstrap sample L is obtained, for example, by randomly drawing n observations with replacement from the training sample \(L_n\), each observation having a probability of 1/n of being selected at each draw.
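As an illustration of the note above (the toy data and variable names are ours), one bootstrap sample and its out-of-bag complement can be drawn as follows:

```python
# Illustrative only: one bootstrap sample L from a training sample L_n of size n.
import numpy as np

rng = np.random.default_rng(42)
n = 10
L_n = np.arange(n)                         # toy training sample: observation indices
L = rng.choice(L_n, size=n, replace=True)  # each draw selects any observation with prob. 1/n
oob = np.setdiff1d(L_n, L)                 # out-of-bag set: observations never drawn
# For large n, about 63.2% of L_n appears in L, since 1 - (1 - 1/n)^n -> 1 - 1/e.
```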
Cite this article
Settouti, N., Chikh, M. & Barra, V. A new feature selection approach based on ensemble methods in semi-supervised classification. Pattern Anal Applic 20, 673–686 (2017). https://doi.org/10.1007/s10044-015-0524-9