A Novel Approach for Semi-supervised Learning: Incremental Parallel Training with Cross-Validation (IPT-CV)

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Unlabeled data are abundant in many domains, and effective machine learning techniques are needed to make efficient use of them. Semi-supervised learning (SSL) methods extract useful information from such unlabeled data. This study proposes Incremental Parallel Training with Cross-Validation (IPT-CV), a novel semi-supervised learning method. The method employs several classifiers and different views of the datasets to label the unlabeled data efficiently. In each round, the classifiers work in parallel and enlarge the labeled set according to a validation rule. The method was compared with two well-known SSL methods from the literature. The web was chosen as the experimental domain because it is a rich source of unlabeled documents. Nine binary classification datasets were taken from the publicly available WebKB and Banksearch datasets and from the individually collected Conference data. The results were analyzed statistically, and according to these analyses, the proposed IPT-CV method achieved the highest classification accuracy among all of the methods examined.
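The abstract describes the method only at a high level, so the following Python sketch is an illustration of the general idea rather than the authors' IPT-CV algorithm. It assumes one classifier per view, a nearest-centroid base learner, a cross-validation accuracy threshold as the validation rule, and unanimous agreement among trusted views before a label is accepted. All function names, parameters, and data are hypothetical, and the per-view classifiers run one after another here instead of in parallel.

```python
def centroid_fit(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))


def cv_accuracy(X, y, folds=3):
    """k-fold cross-validation accuracy on the current labeled set."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


def incremental_multi_view_labeling(views_labeled, y, views_unlabeled,
                                    rounds=5, min_cv=0.7):
    """Grow the labeled set round by round (assumed rule, not the paper's).

    views_labeled[v][i] is the feature row of labeled item i in view v;
    views_unlabeled[v][j] is the feature row of unlabeled item j in view v.
    Returns a dict mapping unlabeled item index to its assigned label.
    """
    views_labeled = [list(v) for v in views_labeled]
    y = list(y)
    pending = list(range(len(views_unlabeled[0])))
    assigned = {}
    for _ in range(rounds):
        # Validation gate: only views whose classifier cross-validates well may vote.
        trusted = [v for v in range(len(views_labeled))
                   if cv_accuracy(views_labeled[v], y) >= min_cv]
        if not trusted or not pending:
            break
        models = {v: centroid_fit(views_labeled[v], y) for v in trusted}
        accepted = []
        for idx in pending:
            votes = {centroid_predict(models[v], views_unlabeled[v][idx])
                     for v in trusted}
            if len(votes) == 1:  # every trusted view agrees on the label
                accepted.append((idx, votes.pop()))
        if not accepted:
            break
        for idx, label in accepted:
            assigned[idx] = label
            for v in range(len(views_labeled)):
                views_labeled[v].append(views_unlabeled[v][idx])
            y.append(label)
        done = {idx for idx, _ in accepted}
        pending = [i for i in pending if i not in done]
    return assigned


# Toy usage with two hypothetical views of the same items.
view_a = [[0, 0], [0, 1], [5, 5], [5, 6]]
view_b = [[1.0], [1.2], [9.0], [9.5]]
seed_labels = [0, 0, 1, 1]
unlabeled_a = [[0.2, 0.3], [5.1, 5.2], [0.1, 0.9]]
unlabeled_b = [[0.8], [9.1], [1.1]]
labels = incremental_multi_view_labeling(
    [view_a, view_b], seed_labels, [unlabeled_a, unlabeled_b])
```

The cross-validation gate is one plausible reading of "validation rule": a view whose classifier does not generalize on the labeled set is kept from adding possibly noisy labels in that round.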

Author information

Correspondence to Havva Esin Ünal.

About this article

Cite this article

Ünal, H.E., Özel, S.A. A Novel Approach for Semi-supervised Learning: Incremental Parallel Training with Cross-Validation (IPT-CV). Arab J Sci Eng 48, 10457–10477 (2023). https://doi.org/10.1007/s13369-022-07433-w

  • DOI: https://doi.org/10.1007/s13369-022-07433-w