
Data sanitization against label flipping attacks using AdaBoost-based semi-supervised learning technology

Abstract

The label flipping attack is a special poisoning attack mounted in an adversarial environment. This study designs a novel label noise processing framework whose core is AdaSSL, a semi-supervised label correction algorithm based on AdaBoost. AdaSSL effectively improves the label quality of the training data and, in turn, the classification performance of the model. On five real UCI datasets, six classic machine learning algorithms (NB, LR, SVM, DT, KNN and MLP) were used as base classifiers. At noise levels from 0% to 20%, we evaluated the classification performance of these classifiers under the entropy-based label flipping attack, with and without the AdaSSL defense. The experimental results show that AdaSSL effectively improves the robustness of the classifiers against label flipping attacks. Unlike the most advanced semi-supervised defense algorithms in the literature, AdaSSL requires no additional datasets. At a noise ratio of 10%, AdaSSL significantly outperforms state-of-the-art label noise defense techniques.
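As a rough illustration of the idea described above (not the authors' exact AdaSSL procedure), the sketch below mounts a 10% label flipping attack on a hypothetical 1-D two-cluster dataset, fits a plain AdaBoost ensemble of decision stumps on the poisoned labels, and then relabels every sample with the ensemble's weighted-majority vote. The dataset, stump learner, and all helper names are assumptions introduced for this example.

```python
import math
import random

random.seed(0)

# Hypothetical toy data standing in for the UCI datasets: class 0 clusters
# near x = 0.0, class 1 near x = 1.0.
X = ([random.gauss(0.0, 0.15) for _ in range(100)]
     + [random.gauss(1.0, 0.15) for _ in range(100)])
y_true = [0] * 100 + [1] * 100

def flip_labels(labels, rate):
    """Label flipping attack: invert the labels of a `rate` fraction of samples."""
    flipped = list(labels)
    for i in random.sample(range(len(flipped)), int(rate * len(flipped))):
        flipped[i] = 1 - flipped[i]
    return flipped

def stump_fit(X, y, w):
    """Weighted decision stump: pick the threshold/direction with least weighted error."""
    best_err, best_t, best_s = None, None, None
    for t in X:
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (1 if s * xi > s * t else 0) != yi)
            if best_err is None or err < best_err:
                best_err, best_t, best_s = err, t, s
    return best_t, best_s

def adaboost_fit(X, y, rounds=5):
    """Plain AdaBoost (Freund & Schapire style) over decision stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        t, s = stump_fit(X, y, w)
        pred = [1 if s * xi > s * t else 0 for xi in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard the log below
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Upweight misclassified samples, downweight correct ones, renormalize.
        w = [wi * math.exp(-alpha if p == yi else alpha)
             for wi, p, yi in zip(w, pred, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, xi):
    score = sum(a * (1 if s * xi > s * t else -1) for a, t, s in ensemble)
    return 1 if score > 0 else 0

def accuracy(a, b):
    return sum(p == q for p, q in zip(a, b)) / len(a)

# Attack: flip 10% of the training labels.
y_noisy = flip_labels(y_true, 0.10)

# Sanitization sketch: fit on the poisoned labels, then replace each label
# with the ensemble's weighted-majority prediction.
ensemble = adaboost_fit(X, y_noisy, rounds=5)
y_corrected = [adaboost_predict(ensemble, xi) for xi in X]

print("label accuracy before correction:", accuracy(y_noisy, y_true))
print("label accuracy after correction:", accuracy(y_corrected, y_true))
```

Because the flipped labels are a random minority, the boosted weighted-majority vote is dominated by stumps fit to the clean majority, so relabeling by the ensemble recovers most poisoned labels on this toy data; the paper's AdaSSL additionally uses a semi-supervised split between trusted and suspect samples, which this sketch omits.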





Acknowledgements

This research was partly supported by the Key R&D and promotion projects of Henan Province (Technological research) (Grant No. 212102210143), the Key Science and Technology Project of Xinjiang Production and Construction Corps (Grant No. 2018AB017) and the Key Research, Development, and Dissemination Program of Henan Province (Science and Technology for the People) (Grant No. 182207310002).

Author information

Contributions

Hongpo Zhang contributed to the conception of the study; Ning Cheng performed the experiment; Hongpo Zhang contributed significantly to analysis and manuscript preparation; Ning Cheng performed the data analyses and wrote the manuscript; Zhanbo Li helped perform the analysis with constructive discussions.

Corresponding authors

Correspondence to Hongpo Zhang or Zhanbo Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cheng, N., Zhang, H. & Li, Z. Data sanitization against label flipping attacks using AdaBoost-based semi-supervised learning technology. Soft Comput 25, 14573–14581 (2021). https://doi.org/10.1007/s00500-021-06384-y


Keywords

  • Label noise detection
  • Machine learning
  • AdaBoost algorithm
  • Semi-supervised