A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors

Gong, Yanlu; Wu, Quanwang; Cheng, Dongdong

doi:10.1007/s13042-023-01805-w

Original Article
Published: 01 March 2023

Volume 14, pages 2887–2902 (2023)
International Journal of Machine Learning and Cybernetics

Abstract

Co-training is an effective semi-supervised learning method that trains two classifiers independently on two views of the data. The strategy for selecting unlabeled samples during the self-labeling process is crucial to co-training. However, most existing strategies depend strongly on parameter settings and must recalculate the confidence of unlabeled samples in every iteration. Inspired by the recently introduced concept of natural neighbors, this paper proposes a co-training method based on a parameter-free and single-step unlabeled data selection strategy with natural neighbors (CT-NaN). In CT-NaN, the confidence of unlabeled samples is computed in a parameter-free manner by analyzing the training data with natural neighbors before co-training begins, and it needs to be computed only once for the whole co-training process. In addition, CT-NaN mitigates the negative effect of outliers because training stops automatically when only outliers remain. Four groups of experiments on 22 data sets are conducted, and the results verify the effectiveness of CT-NaN compared with 8 state-of-the-art co-training methods.
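To make the "parameter-free" idea concrete, here is a minimal Python sketch of a natural neighbor search in one common formulation of the concept from Zhu, Feng and Huang (2016): the neighborhood size r grows from 1 until every point is some other point's neighbor, or until the number of points with no reverse neighbor stops changing, so no K has to be supplied. Two points are natural neighbors if each lies among the other's r nearest neighbors. This illustrates the general concept only and is not the CT-NaN confidence computation, which the abstract does not specify; the function name `natural_neighbor_search`, the use of Euclidean distance, and the exact stopping rule are assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple


def natural_neighbor_search(points: Sequence[Sequence[float]]) -> Tuple[int, List[Set[int]]]:
    """Sketch of a natural neighbor search (assumed formulation).

    Returns the final neighborhood size r and, for each point, the set of
    indices of its natural neighbors (mutual r-nearest neighbors).
    """
    n = len(points)
    # For every point, list all other points ordered by Euclidean distance.
    order = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        order.append(others)

    reverse_count = [0] * n  # how many points currently count i as a neighbor
    knn: List[Set[int]] = [set() for _ in range(n)]
    prev_zero = -1
    r = 0
    # Cap r at n - 1 so that the r-th nearest neighbor always exists.
    while r < n - 1:
        r += 1
        for i in range(n):
            j = order[i][r - 1]  # r-th nearest neighbor of point i
            knn[i].add(j)
            reverse_count[j] += 1
        zero = sum(1 for c in reverse_count if c == 0)
        # Stop when every point has a reverse neighbor, or when the number
        # of points without reverse neighbors no longer changes.
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero

    # Natural neighbors: pairs that appear in each other's r-NN sets.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, nan
```

On the toy input `[(0, 0), (1, 0), (2, 0), (10, 0)]` the search stops at r = 2: the three clustered points become each other's natural neighbors, while the isolated point `(10, 0)` ends up with none, which matches the intuition that points without natural neighbors behave like outliers.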

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant 62006029, in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0137, in part by the Postdoctoral Innovative Talent Support Program of Chongqing under Grant CQBX2021024, in part by the Natural Science Foundation of Chongqing (China) under Grant CSTB2022NSCQ-MSX0258, and in part by the Project of Chongqing Municipal Education Commission, China, under Grant KJQN202001434.

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, 40044, China
Yanlu Gong & Quanwang Wu
College of Big Data and Intelligent Engineering, Yangtze Normal University, Chongqing, China
Dongdong Cheng

Corresponding author

Correspondence to Quanwang Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, Y., Wu, Q. & Cheng, D. A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors. Int. J. Mach. Learn. & Cyber. 14, 2887–2902 (2023). https://doi.org/10.1007/s13042-023-01805-w

Received: 12 February 2022
Accepted: 16 February 2023
Published: 01 March 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s13042-023-01805-w
