
Patch-Based Backdoors Detection and Mitigation with Feature Masking

  • Conference paper
  • First Online:
Security and Privacy in Social Networks and Big Data (SocialSec 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1663)


Abstract

Pre-trained models have been adopted in a substantial number of downstream tasks and have achieved remarkable results in transfer learning scenarios. However, backdoor attacks against pre-trained models, in which poisoned training samples cause the target model to misclassify at inference time, represent a new security threat. In this paper, we propose two patch-based backdoor detection and mitigation methods via feature masking. Our approaches are motivated by the observation that patch-based triggers induce an abnormal feature distribution at the intermediate layer. By exploiting a feature importance extraction method and a gradient-based threshold method, backdoored samples can be detected and the abnormal feature values can be traced back to the trigger position. Masking the features within the trigger position then recovers the correct labels for those backdoored samples. Finally, we employ an unlearning technique to substantially mitigate the negative effect of the backdoor attack. Extensive experimental results show that our approaches outperform the state-of-the-art method in both defense effectiveness and model inference accuracy on clean examples.
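To make the feature-masking step concrete, the following minimal PyTorch sketch shows the general idea described above: intermediate-layer activations whose magnitude exceeds a threshold are zeroed out before the sample is re-classified. The split of the network into `feature_extractor` and `classifier`, the scalar threshold `tau` (standing in for the paper's gradient-based threshold), and all names are assumptions for illustration only, not the authors' implementation.

```python
import torch

def mask_abnormal_features(feature_extractor, classifier, x, tau):
    """Illustrative sketch (not the paper's exact algorithm): suppress
    abnormal intermediate-layer activations, then re-classify.

    feature_extractor : module producing intermediate features, e.g. (N, C, H, W)
    classifier        : module mapping those features to class logits
    tau               : assumed scalar threshold separating normal from
                        abnormal feature magnitudes
    """
    with torch.no_grad():
        feats = feature_extractor(x)            # intermediate-layer features
        mask = (feats.abs() <= tau).float()     # keep only "normal" feature values
        masked_feats = feats * mask             # zero out activations attributed to the trigger
        logits = classifier(masked_feats)       # re-classify the masked representation
    return logits.argmax(dim=1)                 # predicted labels after masking
```

In this sketch, a backdoored input whose trigger inflates a small set of feature values would have those values zeroed, so the prediction can revert to the clean label; how the threshold is chosen and how abnormal features are linked back to the trigger position are the contributions detailed in the paper itself.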



Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 62102300).

Author information

Correspondence to Xiaoyu Zhang.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, T., Zhang, X., Jin, Y., Chen, C., Zhu, F. (2022). Patch-Based Backdoors Detection and Mitigation with Feature Masking. In: Chen, X., Huang, X., Kutyłowski, M. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2022. Communications in Computer and Information Science, vol 1663. Springer, Singapore. https://doi.org/10.1007/978-981-19-7242-3_15

Download citation

  • DOI: https://doi.org/10.1007/978-981-19-7242-3_15

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7241-6

  • Online ISBN: 978-981-19-7242-3

  • eBook Packages: Computer Science, Computer Science (R0)
