A semi-supervised self-training method based on density peaks and natural neighbors

Zhao, Suwen; Li, Junnan

doi:10.1007/s12652-020-02451-8

Original Research
Published: 08 August 2020

Volume 12, pages 2939–2953, (2021)

Journal of Ambient Intelligence and Humanized Computing

Abstract

Semi-supervised self-training is one of the successful methodologies for semi-supervised classification: it trains a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and have poor predictive ability during the self-training process. To solve these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering algorithm (DPCNaN) is first presented by introducing natural neighbors. DPCNaN can reveal the real structure and distribution of the data without any parameter, and thus helps STDPNaN restore the real data space whether its distribution is spherical or non-spherical. In addition, an ensemble classifier is employed to improve the predictive ability of STDPNaN during self-training. Extensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree classifiers; (b) STDPNaN also outperforms the comparison methods without any restriction on the number of labeled data; and (c) the running time of STDPNaN is acceptable.
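
For readers unfamiliar with the general idea of self-training, the following is a minimal sketch of a generic self-training loop, not the STDPNaN method itself. It assumes a 1-nearest-neighbor base classifier and Euclidean distance, and it omits the density peaks clustering, natural neighbors, and ensemble classifier that the abstract describes. The function name `self_train_1nn`, the `batch` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def self_train_1nn(X_lab, y_lab, X_unl, batch=1):
    """Generic self-training with a 1-nearest-neighbor base classifier.

    Illustrative only: each round, the `batch` unlabeled points closest to
    the current labeled set receive the label of their nearest labeled
    point and are moved into the labeled set.
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    X_unl = np.asarray(X_unl, dtype=float)
    while len(X_unl) > 0:
        # Pairwise Euclidean distances: unlabeled rows vs. labeled columns.
        d = np.linalg.norm(X_unl[:, None, :] - X_lab[None, :, :], axis=2)
        nearest = d.argmin(axis=1)                  # nearest labeled point
        dist = d[np.arange(len(X_unl)), nearest]    # distance to that point
        pick = np.argsort(dist)[:batch]             # most confident points
        # Move the picked points into the labeled set with predicted labels.
        X_lab = np.vstack([X_lab, X_unl[pick]])
        y_lab = np.concatenate([y_lab, y_lab[nearest[pick]]])
        X_unl = np.delete(X_unl, pick, axis=0)
    return X_lab, y_lab

# Toy data (assumed): two well-separated groups, one labeled point each.
X_lab = np.array([[0.0, 0.0], [10.0, 10.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[1.0, 0.0], [2.0, 0.0], [9.0, 10.0], [8.0, 10.0]])
X_all, y_all = self_train_1nn(X_lab, y_lab, X_unl)
```

In this sketch the result depends entirely on where the initial labeled points lie, which is the kind of sensitivity to the initial labeled data that the abstract identifies as a limitation of most self-training methods.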

Notes

https://archive.ics.uci.edu/ml/datasets.html.

Acknowledgements

Supported by the Science and Technology Project Affiliated to the Education Department of Chongqing Municipality (CYB20063).

Author information

Authors and Affiliations

College of Bioengineering, Chongqing University, Chongqing, 400044, China
Suwen Zhao
Department of Electronic Engineering, Guilin University of Aerospace Technology, Guilin, 541004, China
Suwen Zhao
Chongqing Key Laboratory of Software Theory and Technology, College of Computer Science, Chongqing University, Chongqing, 400044, China
Junnan Li

Corresponding author

Correspondence to Junnan Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, S., Li, J. A semi-supervised self-training method based on density peaks and natural neighbors. J Ambient Intell Human Comput 12, 2939–2953 (2021). https://doi.org/10.1007/s12652-020-02451-8

Received: 14 May 2020
Accepted: 01 August 2020
Published: 08 August 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s12652-020-02451-8
