Abstract
Pre-trained models have been employed in a wide range of downstream tasks, achieving remarkable performance in transfer-learning scenarios. However, backdoor attacks, in which poisoned training samples cause the target model to misclassify at inference time, pose a new security threat to pre-trained models. In this paper, we propose two patch-based backdoor detection and mitigation methods based on feature masking. Our approaches are motivated by the observation that patch-based triggers induce abnormal feature distributions at intermediate layers. By combining a feature-importance extraction method with a gradient-based threshold method, backdoored samples can be detected and the abnormal feature values can be traced back to the trigger position. Masking the features within the trigger region then recovers the correct labels for those backdoored samples. Finally, we employ an unlearning technique to substantially mitigate the negative effect of the backdoor attack. Extensive experimental results show that our approaches outperform state-of-the-art methods in both defense effectiveness and model inference accuracy on clean examples.
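The detection idea described above can be illustrated with a minimal sketch. Here a simple per-feature z-score threshold against clean-data statistics stands in for the paper's feature-importance extraction and gradient-based threshold; the function name, the threshold `k`, and zero-masking are all illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def detect_and_mask(features, clean_mean, clean_std, k=4.0):
    """Flag intermediate-layer features that deviate abnormally from
    clean statistics, then mask (zero out) those feature positions.

    features: (n_samples, n_features) intermediate-layer activations
    clean_mean, clean_std: per-feature statistics estimated on clean data
    k: deviation threshold in standard deviations (illustrative choice)
    """
    z = np.abs(features - clean_mean) / (clean_std + 1e-8)
    abnormal = z > k                        # positions induced by the trigger
    is_backdoored = abnormal.any(axis=-1)   # sample-level detection flag
    masked = np.where(abnormal, 0.0, features)  # feature masking
    return is_backdoored, masked

# Toy usage: the second sample has one spiked feature mimicking a patch trigger.
clean_mean, clean_std = np.zeros(8), np.ones(8)
x = np.array([[0.1] * 8,
              [0.1] * 7 + [9.0]])
flags, masked = detect_and_mask(x, clean_mean, clean_std)
print(flags.tolist())   # → [False, True]
print(masked[1, -1])    # → 0.0
```

In the paper's setting the masked activations would then be fed forward to recover the correct prediction, and the flagged samples used for unlearning; this sketch only shows the detection-and-masking step.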
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 62102300).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, T., Zhang, X., Jin, Y., Chen, C., Zhu, F. (2022). Patch-Based Backdoors Detection and Mitigation with Feature Masking. In: Chen, X., Huang, X., Kutyłowski, M. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2022. Communications in Computer and Information Science, vol 1663. Springer, Singapore. https://doi.org/10.1007/978-981-19-7242-3_15