A Novel Approach for Semi-supervised Learning: Incremental Parallel Training with Cross-Validation (IPT-CV)

Ünal, Havva Esin; Özel, Selma Ayşe

doi:10.1007/s13369-022-07433-w

Research Article-Computer Engineering and Computer Science
Published: 23 November 2022

Volume 48, pages 10457–10477, (2023)
Arabian Journal for Science and Engineering

Abstract

Unlabeled data are abundant in many domains, and effective machine learning techniques are needed to use them efficiently. Semi-supervised learning (SSL) methods extract useful information from such unlabeled data. In this study, Incremental Parallel Training with Cross-Validation (IPT-CV) is proposed as a novel semi-supervised learning method. The method employs several classifiers and different views of the datasets to label the unlabeled data efficiently. In each round, the classifiers work in parallel and enlarge the labeled set according to a validation rule. The method was compared with two well-known SSL methods from the literature. The web was chosen as the experimental domain because it is a rich source of unlabeled files. Nine binary classification datasets were drawn from the publicly available WebKB and Banksearch datasets and from the individually collected Conference datasets. Statistical analysis of the results showed that the proposed IPT-CV method achieved the highest classification accuracy among all the methods examined.
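
The general idea of several classifiers, each trained on its own view, enlarging the labeled set round by round can be illustrated with a minimal sketch. This is not the IPT-CV algorithm itself: the abstract does not specify the base classifiers, the views, or the validation rule, so everything below is an assumption made for illustration. The nearest-centroid classifier, the unanimous-agreement rule, and the names `centroid_fit`, `centroid_predict`, and `grow_labeled_set` are all hypothetical.

```python
from statistics import mean


def centroid_fit(rows, labels):
    """Compute one centroid per class (a stand-in base classifier, assumed)."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids


def centroid_predict(centroids, row):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(sorted(centroids), key=lambda cls: dist(centroids[cls]))


def grow_labeled_set(views_labeled, labels, views_unlabeled, max_rounds=10):
    """Enlarge the labeled set round by round.

    views_labeled[v][i]   : feature row of labeled item i in view v
    views_unlabeled[v][j] : feature row of unlabeled item j in view v
    Assumed validation rule: an item is accepted only when the classifiers
    of all views predict the same label for it.
    Returns a dict mapping unlabeled index -> assigned label.
    """
    labeled = [list(rows) for rows in views_labeled]
    y = list(labels)
    remaining = list(range(len(views_unlabeled[0])))
    assigned = {}
    for _ in range(max_rounds):
        # One model per view, all trained on the current labeled set.
        models = [centroid_fit(rows, y) for rows in labeled]
        accepted = []
        for j in remaining:
            votes = {centroid_predict(m, view[j])
                     for m, view in zip(models, views_unlabeled)}
            if len(votes) == 1:  # every view agrees
                accepted.append((j, votes.pop()))
        if not accepted:
            break  # nothing passed the rule; stop early
        for j, label in accepted:
            assigned[j] = label
            y.append(label)
            for v, view in enumerate(views_unlabeled):
                labeled[v].append(view[j])
        done = {j for j, _ in accepted}
        remaining = [j for j in remaining if j not in done]
    return assigned


# Toy data: two one-dimensional views, two labeled items, three unlabeled items.
result = grow_labeled_set(
    views_labeled=[[[0.0], [1.0]], [[0.0], [1.0]]],
    labels=[0, 1],
    views_unlabeled=[[[0.1], [0.9], [0.2]], [[0.2], [0.8], [0.9]]],
)
print(result)
```

In this toy run the third unlabeled item stays unlabeled because the two views disagree about it, which is the kind of filtering a validation rule is meant to provide. A real implementation would train the per-view models concurrently and would use whatever rule the full article defines.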

Author information

Authors and Affiliations

Çukurova University (Çukurova Üniversitesi), Sarıçam, Turkey
Havva Esin Ünal & Selma Ayşe Özel

Corresponding author

Correspondence to Havva Esin Ünal.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ünal, H.E., Özel, S.A. A Novel Approach for Semi-supervised Learning: Incremental Parallel Training with Cross-Validation (IPT-CV). Arab J Sci Eng 48, 10457–10477 (2023). https://doi.org/10.1007/s13369-022-07433-w

Received: 22 March 2022
Accepted: 27 October 2022
Published: 23 November 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s13369-022-07433-w
