Self-Train LogitBoost for Semi-supervised Learning

  • Stamatis Karlos
  • Nikos Fazakis
  • Sotiris Kotsiantis
  • Kyriakos Sgarbas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 517)

Abstract

Semi-supervised classification methods combine unlabeled data with a smaller set of labeled examples in order to achieve a higher classification rate than purely supervised methods, which train on labeled data alone. In this work, a self-train LogitBoost algorithm is presented. The self-training process improves results by iteratively adding to the training set those unlabeled instances for which the LogitBoost regression tree model produces the most confident class probability estimates. We compared the presented technique with other well-known semi-supervised classification methods on standard benchmark datasets, and it achieved better accuracy in most cases.
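The self-training wrapper described above follows a standard pattern: fit the base learner on the labeled data, score the unlabeled pool, promote the most confident predictions to pseudo-labels, and refit. The sketch below illustrates that loop under stated assumptions; it is not the authors' implementation (the paper uses the LogitBoost learner with regression trees from WEKA). scikit-learn's GradientBoostingClassifier stands in as the boosted-tree base learner, and the function name self_train, the confidence threshold, and the round limit are hypothetical choices for illustration.

```python
# Minimal self-training sketch, assuming a boosted-tree base learner.
# GradientBoostingClassifier is a stand-in for LogitBoost; the threshold
# and round count are illustrative, not values from the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def self_train(X_lab, y_lab, X_unlab, max_rounds=10, threshold=0.9):
    """Iteratively move the most confidently predicted unlabeled
    instances into the labeled set and refit the base learner."""
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab)
    model = GradientBoostingClassifier()
    for _ in range(max_rounds):
        model.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)
        conf = proba.max(axis=1)        # confidence = max class probability
        keep = conf >= threshold        # select only confident predictions
        if not keep.any():
            break
        # Pseudo-label the confident instances and grow the labeled set.
        pseudo = model.classes_[proba[keep].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~keep]
    return model.fit(X_lab, y_lab)
```

Called as self_train(X_l, y_l, X_u), the function returns a classifier trained on the labeled set plus the confidently pseudo-labeled instances; tightening the threshold trades coverage of the unlabeled pool for pseudo-label accuracy.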

Keywords

Semi-supervised learning · LogitBoost · Classification method · Labeled and/or unlabeled data



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Stamatis Karlos (1)
  • Nikos Fazakis (2)
  • Sotiris Kotsiantis (1)
  • Kyriakos Sgarbas (2)
  1. Department of Mathematics, University of Patras, Patras, Greece
  2. Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
