Abstract
Pre-trained models have been employed in a wide range of downstream tasks, achieving remarkable performance in transfer-learning scenarios. However, backdoor attacks, in which poisoned training samples cause the target model to misclassify at inference time, pose a new security threat to pre-trained models. In this paper, we propose two patch-based backdoor detection and mitigation methods based on feature masking. Our approaches are motivated by the observation that patch-based triggers induce abnormal feature distributions at intermediate layers. By combining a feature-importance extraction method with a gradient-based threshold method, backdoored samples can be detected and the abnormal feature values can be traced back to the trigger position. Masking the features within the trigger region then recovers the correct labels for those backdoored samples. Finally, we employ an unlearning technique to substantially mitigate the negative effect of the backdoor attack. Extensive experimental results show that our approaches outperform state-of-the-art methods in both defense effectiveness and model inference accuracy on clean examples.
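The detection idea described above can be illustrated with a minimal sketch. Here a simple per-feature z-score threshold against clean-data statistics stands in for the paper's feature-importance extraction and gradient-based threshold; the function name, the threshold `k`, and zero-masking are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def detect_and_mask(features, clean_mean, clean_std, k=4.0):
    """Flag intermediate-layer features that deviate abnormally from
    clean statistics, then mask (zero out) those feature positions.

    features: (n_samples, n_features) intermediate-layer activations
    clean_mean, clean_std: per-feature statistics estimated on clean data
    k: deviation threshold in standard deviations (illustrative choice)
    """
    z = np.abs(features - clean_mean) / (clean_std + 1e-8)
    abnormal = z > k                        # positions induced by the trigger
    is_backdoored = abnormal.any(axis=-1)   # sample-level detection flag
    masked = np.where(abnormal, 0.0, features)  # feature masking
    return is_backdoored, masked

# Toy usage: the second sample has one spiked feature mimicking a patch trigger.
clean_mean, clean_std = np.zeros(8), np.ones(8)
x = np.array([[0.1] * 8,
              [0.1] * 7 + [9.0]])
flags, masked = detect_and_mask(x, clean_mean, clean_std)
print(flags.tolist())   # → [False, True]
print(masked[1, -1])    # → 0.0
```

In the paper's setting the masked activations would then be fed forward to recover the correct prediction, and the flagged samples used for unlearning; this sketch only shows the detection-and-masking step.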
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 62102300).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, T., Zhang, X., Jin, Y., Chen, C., Zhu, F. (2022). Patch-Based Backdoors Detection and Mitigation with Feature Masking. In: Chen, X., Huang, X., Kutyłowski, M. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2022. Communications in Computer and Information Science, vol 1663. Springer, Singapore. https://doi.org/10.1007/978-981-19-7242-3_15