
A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors

  • Original Article
  • Published in International Journal of Machine Learning and Cybernetics

Abstract

As an effective semi-supervised learning algorithm, the co-training method trains two classifiers independently on two views. The strategy for selecting unlabeled samples in the self-labeling process is crucial for co-training. However, most existing strategies depend strongly on parameter settings and require re-computing the confidence of unlabeled samples in every iteration. Inspired by the recently introduced concept of natural neighbors, this paper proposes a co-training method based on a parameter-free, single-step unlabeled data selection strategy with natural neighbors (CT-NaN). In CT-NaN, the confidence values of unlabeled samples are computed in a parameter-free manner by analyzing the training data with natural neighbors before the co-training iterations begin, and they need to be computed only once during the whole co-training process. In addition, CT-NaN mitigates the negative effect of outliers because training stops automatically when only outliers remain. Four groups of experiments on 22 data sets are conducted, and the results verify the effectiveness of CT-NaN compared with eight state-of-the-art co-training methods.
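To make the selection strategy concrete, the following is a minimal sketch (not the authors' code) of the adaptive natural-neighbor search introduced by Zhu et al. (2016), the structure on which CT-NaN's one-time confidence computation is built. The neighborhood size k is grown adaptively until the set of points lacking a reverse neighbor stabilizes, so no k has to be supplied by the user. The function name and the returned outlier mask are illustrative assumptions; CT-NaN's exact confidence formula is defined in the paper itself.

    # A sketch of the adaptive natural-neighbor (NaN) search (after Zhu et al. 2016).
    # Names such as natural_neighbor_search and the returned outlier mask are
    # illustrative; they are not taken from the paper's implementation.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def natural_neighbor_search(X, max_k=None):
        """Grow k until the count of points with no reverse neighbor stabilizes.

        Returns the natural characteristic value lam, each point's set of
        natural (mutual) neighbors, and a boolean outlier-candidate mask.
        """
        n = len(X)
        max_k = (n - 1) if max_k is None else max_k
        # Precompute one (max_k + 1)-NN query; column 0 is the point itself.
        _, idx = NearestNeighbors(n_neighbors=max_k + 1).fit(X).kneighbors(X)
        reverse_count = np.zeros(n, dtype=int)   # how often each point is chosen
        prev_orphans = -1
        for k in range(1, max_k + 1):
            for i in range(n):
                reverse_count[idx[i, k]] += 1    # i picks its k-th nearest neighbor
            orphans = int(np.sum(reverse_count == 0))
            # Parameter-free stop: every point has a reverse neighbor, or the
            # orphan count no longer shrinks (remaining points act as outliers).
            if orphans == 0 or orphans == prev_orphans:
                break
            prev_orphans = orphans
        lam = k                                  # natural characteristic value
        knn = [set(idx[i, 1:lam + 1]) for i in range(n)]
        nan_sets = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
        return lam, nan_sets, reverse_count == 0

Because lam, the neighbor sets, and the outlier mask are fixed before co-training begins, a confidence score derived from them (for instance, from label agreement among a sample's natural neighbors, which is one plausible reading of the strategy) can rank every unlabeled sample in a single step, and the self-labeling loop can terminate as soon as only masked outlier candidates remain.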


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant 62006029, in part by the Natural Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0137, in part by the Postdoctoral Innovative Talent Support Program of Chongqing under Grant CQBX2021024, in part by the Natural Science Foundation of Chongqing under Grant CSTB2022NSCQ-MSX0258, and in part by a project of the Chongqing Municipal Education Commission under Grant KJQN202001434.

Author information


Corresponding author

Correspondence to Quanwang Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gong, Y., Wu, Q. & Cheng, D. A co-training method based on parameter-free and single-step unlabeled data selection strategy with natural neighbors. Int. J. Mach. Learn. & Cyber. 14, 2887–2902 (2023). https://doi.org/10.1007/s13042-023-01805-w

