
A semi-supervised self-training method based on density peaks and natural neighbors

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

Semi-supervised self-training is one of the most successful methodologies of semi-supervised classification: it trains a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and predict poorly during the self-training process. To address these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering (DPCNaN) is first presented by introducing natural neighbors. DPCNaN reveals the true structure and distribution of the data without any parameter, and thereby helps STDPNaN recover the real data space, whether its distribution is spherical or non-spherical. An ensemble classifier is also employed to improve the predictive ability of STDPNaN during self-training. Intensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree classifiers; (b) STDPNaN also outperforms the comparison methods without any restriction on the number of labeled data; and (c) its running time is acceptable.
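The parameter-free ingredient the abstract refers to, natural neighbors, can be sketched as follows: starting from k = 1, the neighborhood size is grown until every point is some other point's k-nearest neighbor, or until the count of neighbor-less points stops shrinking. The sketch below is an illustrative, simplified reading of the natural-neighbor search of Zhu et al. (2016), not the authors' STDPNaN implementation; the function name `natural_neighbor_eigenvalue` is my own.

```python
import numpy as np

def natural_neighbor_eigenvalue(X):
    """Grow k until every point has at least one reverse k-nearest
    neighbor, or the number of orphan points stops shrinking.
    Returns the resulting neighborhood size (no k chosen by hand)."""
    n = len(X)
    # Pairwise Euclidean distances; self-distances set to infinity
    # so a point is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)       # row i: i's neighbors by distance
    in_count = np.zeros(n, dtype=int)   # reverse-neighbor counts
    prev_orphans = n + 1
    k = 0
    while True:
        k += 1
        for i in range(n):
            in_count[order[i, k - 1]] += 1  # i gains its k-th neighbor
        orphans = int(np.sum(in_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            return k
        prev_orphans = orphans
```

On two well-separated clusters, the search stops at a small k, which is what lets a density estimate built on these neighborhoods adapt to spherical and non-spherical distributions alike without tuning.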


Figs. 1–6 (see the full article)


Notes

  1. https://archive.ics.uci.edu/ml/datasets.html.



Acknowledgements

This work was supported by the Science and Technology Project Affiliated to the Education Department of Chongqing Municipality (CYB20063).

Author information


Corresponding author

Correspondence to Junnan Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, S., Li, J. A semi-supervised self-training method based on density peaks and natural neighbors. J Ambient Intell Human Comput 12, 2939–2953 (2021). https://doi.org/10.1007/s12652-020-02451-8

