
Data sanitization against label flipping attacks using AdaBoost-based semi-supervised learning technology

Published in Soft Computing (section: Data analytics and machine learning)

Abstract

The label flipping attack is a special type of poisoning attack in adversarial environments. This study designs a novel label-noise processing framework whose core is AdaSSL, a semi-supervised label-correction algorithm based on AdaBoost. AdaSSL effectively improves the label quality of training data and, in turn, the classification performance of the model. On five real UCI datasets, six classic machine learning algorithms (NB, LR, SVM, DT, KNN and MLP) serve as base classifiers. At noise levels of 0–20%, we evaluate the classification performance of these classifiers on the UCI datasets under the entropy-based label flipping attack, with and without the AdaSSL defense. The experimental results show that AdaSSL effectively improves classifier robustness against label flipping attacks. Unlike the most advanced semi-supervised defense algorithms in the literature, AdaSSL requires no additional datasets, and at a noise ratio of 10% it significantly outperforms state-of-the-art label-noise defense techniques.
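To make the setting concrete, the following is a minimal sketch of a label-flipping attack followed by an AdaBoost-based semi-supervised correction step in the spirit of AdaSSL. It is an illustration under stated assumptions, not the authors' implementation: the toy dataset, the 0.5 confidence threshold, and the random (rather than entropy-guided) choice of flipped samples are all assumptions made here.

```python
# Hedged sketch: AdaBoost-based semi-supervised label correction against a
# label flipping attack. Thresholds and dataset are illustrative assumptions,
# not the AdaSSL paper's exact procedure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Toy binary dataset standing in for one of the UCI datasets.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Label flipping attack: flip 10% of the training labels
# (chosen at random here; the paper uses an entropy-based selection).
noise_ratio = 0.10
flip_idx = rng.choice(len(y), size=int(noise_ratio * len(y)), replace=False)
y_noisy = y.copy()
y_noisy[flip_idx] = 1 - y_noisy[flip_idx]

# Step 1: fit AdaBoost on the noisy labels and score each sample by the
# ensemble's predicted probability of its *given* (possibly flipped) label.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y_noisy)
proba = ada.predict_proba(X)
conf_in_given_label = proba[np.arange(len(y_noisy)), y_noisy]

# Step 2: treat low-confidence labels as suspect, i.e. as "unlabeled" data.
suspect = conf_in_given_label < 0.5

# Step 3 (semi-supervised correction): refit on the trusted subset only,
# then relabel the suspect samples with the refit ensemble's predictions.
ada_trusted = AdaBoostClassifier(n_estimators=50, random_state=0)
ada_trusted.fit(X[~suspect], y_noisy[~suspect])
y_corrected = y_noisy.copy()
y_corrected[suspect] = ada_trusted.predict(X[suspect])

print("noisy label accuracy:    ", (y_noisy == y).mean())
print("corrected label accuracy:", (y_corrected == y).mean())
```

The corrected labels can then be used to train any of the six base classifiers, which is the sense in which the framework is classifier-agnostic.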



Acknowledgements

This research was partly supported by the Key R&D and promotion projects of Henan Province (Technological research) (Grant No. 212102210143), the Key Science and Technology Project of Xinjiang Production and Construction Corps (Grant No. 2018AB017) and the Key Research, Development, and Dissemination Program of Henan Province (Science and Technology for the People) (Grant No. 182207310002).

Author information

Contributions

Hongpo Zhang conceived the study and contributed significantly to the analysis and manuscript preparation; Ning Cheng performed the experiments and data analyses and wrote the manuscript; Zhanbo Li helped perform the analysis with constructive discussions.

Corresponding authors

Correspondence to Hongpo Zhang or Zhanbo Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article


Cite this article

Cheng, N., Zhang, H. & Li, Z. Data sanitization against label flipping attacks using AdaBoost-based semi-supervised learning technology. Soft Comput 25, 14573–14581 (2021). https://doi.org/10.1007/s00500-021-06384-y

