Abstract
As an effective semi-supervised learning algorithm, the co-training method trains two classifiers on two views independently. The strategy for selecting unlabeled samples during the self-labeling process is crucial for co-training. However, most existing strategies depend strongly on parameter settings and require the confidence of unlabeled samples to be recalculated in each iteration. Inspired by the recently introduced concept of natural neighbors, this paper proposes a co-training method based on a parameter-free, single-step unlabeled data selection strategy with natural neighbors (CT-NaN). In CT-NaN, the confidence of each unlabeled sample is calculated in a parameter-free manner by analyzing the training data with natural neighbors before the co-training iterations begin, so it needs to be calculated only once in the whole co-training process. In addition, CT-NaN can mitigate the negative effect of outliers because training stops automatically when only outliers remain. Four groups of experiments on 22 data sets are conducted, and the results verify the effectiveness of CT-NaN in comparison with 8 state-of-the-art co-training methods.
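To make the notion of a parameter-free neighborhood more concrete, the following is a minimal sketch of a natural neighbor search, under the assumption that it follows one common formulation: the neighborhood size grows round by round until every point has a reverse neighbor or the number of points without one stops changing. It does not reproduce the CT-NaN confidence calculation, which the abstract does not specify; the function name `natural_neighbors` and the toy data are hypothetical.

```python
import math


def natural_neighbors(points):
    """Return (lam, neighbors); neighbors[i] is the set of natural neighbors of i.

    Assumed formulation: i and j are natural neighbors when each lies among
    the other's lam nearest neighbors, and lam is found by the search itself.
    """
    n = len(points)
    # For every point, the indices of all other points sorted by distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    reverse_count = [0] * n  # how often each point appears in others' lists
    prev_orphans = None
    lam = 0
    for r in range(1, n):
        # Round r: every point "votes" for its r-th nearest neighbor.
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1
        lam = r
        orphans = sum(1 for c in reverse_count if c == 0)
        # Stop when no orphans remain or their number has stalled.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    knn = [set(order[i][:lam]) for i in range(n)]
    # Keep only mutual relationships.
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, neighbors


# Toy data (hypothetical): a unit square plus one distant point.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
lam, neighbors = natural_neighbors(toy)
```

In this toy example the distant point ends up with no natural neighbors, which illustrates why a natural-neighbor analysis can flag outliers without a user-chosen neighborhood size.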
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62006029, in part by the Natural Science Foundation of Chongqing under Grants cstc2020jcyj-msxmX0137 and CSTB2022NSCQ-MSX0258, in part by the Postdoctoral Innovative Talent Support Program of Chongqing under Grant CQBX2021024, and in part by the Project of Chongqing Municipal Education Commission, China, under Grant KJQN202001434.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gong, Y., Wu, Q. & Cheng, D. A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors. Int. J. Mach. Learn. & Cyber. 14, 2887–2902 (2023). https://doi.org/10.1007/s13042-023-01805-w
DOI: https://doi.org/10.1007/s13042-023-01805-w