Knowledge and Information Systems, Volume 42, Issue 2, pp 245–284

Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

  • Isaac Triguero
  • Salvador García
  • Francisco Herrera
Survey Paper

Abstract

Semi-supervised classification methods are well suited to training sets that contain a large amount of unlabeled data and only a small quantity of labeled data. This problem has been addressed by several approaches that make different assumptions about the characteristics of the input data. Among them, self-labeled techniques follow an iterative procedure that aims to obtain an enlarged labeled data set, on the assumption that their own predictions tend to be correct. In this paper, we provide a survey of self-labeled methods for semi-supervised classification. From a theoretical point of view, we propose a taxonomy based on their main characteristics. Empirically, we conduct an exhaustive study involving a large number of data sets with different ratios of labeled data, measuring performance in terms of transductive and inductive classification capabilities. The results are contrasted using nonparametric statistical tests, and we identify the best-performing self-labeled models. Moreover, a semi-supervised learning module has been developed for the Knowledge Extraction based on Evolutionary Learning (KEEL) software, integrating the analyzed methods and data sets.
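
The iterative self-labeling idea described in the abstract can be illustrated with a minimal sketch. It assumes a 1-nearest-neighbor base classifier and uses Euclidean distance as a stand-in for prediction confidence. The names `nearest_label`, `self_training` and `per_round` are hypothetical and chosen for illustration only; they are not taken from the paper or from the KEEL software.

```python
import math


def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x (1-NN)."""
    point, label = min(labeled, key=lambda pair: math.dist(x, pair[0]))
    return label, math.dist(x, point)


def self_training(labeled, unlabeled, per_round=1):
    """Enlarge the labeled set by repeatedly trusting the most confident predictions.

    Assumption: a smaller distance to the nearest labeled point means a more
    confident prediction. Other confidence measures are equally possible.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Predict every point still in the pool with the current labeled set.
        scored = [(nearest_label(x, labeled), x) for x in pool]
        scored.sort(key=lambda item: item[0][1])
        # Accept the most confident predictions as if they were true labels.
        for (label, _), x in scored[:per_round]:
            labeled.append((x, label))
            pool.remove(x)
    return labeled


labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (8.0, 8.0), (4.0, 4.0)]
enlarged = self_training(labeled, unlabeled)

# Transductive view: labels assigned to the unlabeled training points themselves.
transductive = dict(enlarged[len(labeled):])
# Inductive view: prediction for a point not seen during self-training.
inductive, _ = nearest_label((7.0, 7.0), enlarged)
```

In this toy run the point `(4.0, 4.0)` is labeled last, after its neighborhood has been filled in by earlier rounds, which is the behavior the iterative scheme relies on. If an early prediction were wrong, the error would propagate in the same way.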

Keywords

  • Learning from unlabeled data
  • Semi-supervised learning
  • Self-training
  • Co-training
  • Multi-view learning
  • Classification

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Isaac Triguero (1)
  • Salvador García (2)
  • Francisco Herrera (1)

  1. Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, Granada, Spain
  2. Department of Computer Science, University of Jaén, Jaén, Spain
