Abstract
The label flipping attack is a special kind of poisoning attack in adversarial environments. This research designed a novel label noise processing framework whose core is AdaSSL, a semi-supervised label correction algorithm based on AdaBoost. AdaSSL effectively improves the label quality of the training data and thereby the classification performance of the model. On five real UCI datasets, this study chose six classic machine learning algorithms (NB, LR, SVM, DT, KNN and MLP) as base classifiers. At noise levels of 0–20%, we evaluated the classification performance of these classifiers on the UCI datasets under the entropy-based label flipping attack, with and without the AdaSSL defense. The experimental results show that AdaSSL effectively improves the robustness of the classifiers against label flipping attacks. Unlike the most advanced semi-supervised defense algorithms in the literature, AdaSSL does not require additional datasets; at a noise ratio of 10%, it significantly outperforms state-of-the-art label noise defense techniques.
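The pipeline described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's exact method: the attack here flips the labels a surrogate classifier is most confident about (one common entropy-guided variant; the paper's selection rule may differ), and the "AdaSSL-style" correction is approximated by relabeling points where an AdaBoost ensemble's out-of-fold prediction disagrees with the given label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic binary-classification data standing in for a UCI dataset
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# --- Entropy-guided label flipping attack (illustrative variant) ---
surrogate = LogisticRegression(max_iter=1000).fit(X, y)
proba = surrogate.predict_proba(X)
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)

n_flip = int(0.10 * len(y))              # 10% noise level, as in the abstract
flip_idx = np.argsort(entropy)[:n_flip]  # lowest-entropy (most confident) samples
y_noisy = y.copy()
y_noisy[flip_idx] = 1 - y_noisy[flip_idx]

# --- AdaBoost-based label correction (sketch of the AdaSSL idea) ---
# Out-of-fold predictions from an AdaBoost ensemble trained on the noisy labels;
# samples where the ensemble disagrees with the given label are relabeled.
pred = cross_val_predict(
    AdaBoostClassifier(n_estimators=50, random_state=0), X, y_noisy, cv=5)
suspect = pred != y_noisy
y_corrected = y_noisy.copy()
y_corrected[suspect] = pred[suspect]

print("label accuracy after attack:    ", (y_noisy == y).mean())
print("label accuracy after correction:", (y_corrected == y).mean())
```

In the paper's actual framework the corrected set would then retrain the base classifier (NB, LR, SVM, DT, KNN or MLP); the sketch stops at label correction, which is the step AdaSSL contributes.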




Acknowledgements
This research was partly supported by the Key R&D and promotion projects of Henan Province (Technological research) (Grant No. 212102210143), the Key Science and Technology Project of Xinjiang Production and Construction Corps (Grant No. 2018AB017) and the Key Research, Development, and Dissemination Program of Henan Province (Science and Technology for the People) (Grant No. 182207310002).
Author information
Contributions
Hongpo Zhang conceived the study and contributed significantly to the analysis and manuscript preparation; Ning Cheng performed the experiments and data analyses and wrote the manuscript; Zhanbo Li helped perform the analysis with constructive discussions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Cheng, N., Zhang, H. & Li, Z. Data sanitization against label flipping attacks using AdaBoost-based semi-supervised learning technology. Soft Comput 25, 14573–14581 (2021). https://doi.org/10.1007/s00500-021-06384-y
Keywords
- Label noise detection
- Machine learning
- AdaBoost algorithm
- Semi-supervised