Abstract
Semi-supervised self-training is one of the successful approaches to semi-supervised classification: it trains a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and predict poorly during the self-training process. To address these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering algorithm (DPCNaN) is first presented by introducing natural neighbors. DPCNaN can reveal the real structure and distribution of the data without any parameter, which helps STDPNaN restore the real data space whether its distribution is spherical or non-spherical. In addition, an ensemble classifier is employed to improve the predictive ability of STDPNaN during self-training. Extensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree classifiers; (b) STDPNaN also outperforms the comparison methods without any restriction on the amount of labeled data; (c) the running time of STDPNaN is acceptable.
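For readers unfamiliar with self-training, the general loop that the abstract builds on can be sketched as follows. This is a minimal, generic illustration and not the STDPNaN algorithm: it omits DPCNaN, natural neighbors, and the ensemble classifier, and it uses a plain k-nearest-neighbor vote as the base learner. The names `knn_predict` and `self_train` and the parameters `k`, `threshold`, and `max_rounds` are hypothetical choices made for this sketch only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return (label, confidence) from a majority vote of the k nearest points."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    # Confidence is the fraction of the neighbors that agree with the winner.
    return label, count / len(nearest)


def self_train(labeled_x, labeled_y, unlabeled_x, k=3, threshold=0.99, max_rounds=10):
    """Generic self-training: repeatedly pseudo-label confident unlabeled points.

    Assumption for this sketch: a fixed confidence threshold decides which
    predictions are trusted. The paper's method selects points differently.
    """
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        if not pool:
            break
        predictions = [knn_predict(train_x, train_y, x, k) for x in pool]
        confident = [i for i, (_, conf) in enumerate(predictions) if conf >= threshold]
        if not confident:
            break  # nothing is confident enough, so stop early
        # Move confident points, with their predicted labels, into the training set.
        for i in confident:
            train_x.append(pool[i])
            train_y.append(predictions[i][0])
        chosen = set(confident)
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return train_x, train_y, pool
```

Calling `self_train` with a handful of labeled points per class and a list of unlabeled points returns the enlarged training set together with any points that were never confidently labeled. The fixed threshold is exactly the kind of parameter dependence the abstract criticizes, which is why it is flagged here as an assumption rather than part of the proposed method.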
Acknowledgements
The Science and Technology Project Affiliated to the Education Department of Chongqing Municipality (CYB20063).
About this article
Cite this article
Zhao, S., Li, J. A semi-supervised self-training method based on density peaks and natural neighbors. J Ambient Intell Human Comput 12, 2939–2953 (2021). https://doi.org/10.1007/s12652-020-02451-8
DOI: https://doi.org/10.1007/s12652-020-02451-8