Abstract
As reliance on AI technologies grows, so does concern over adversarial example attacks, underscoring the need for robust defense strategies that protect AI systems from malicious input manipulation. In this paper, we introduce a computationally efficient plug-in module that integrates seamlessly with advanced diffusion models to purify adversarial examples. Inspired by the concept of deconstruction and reconstruction (DR), our module decomposes an input image into foundational visual features that are expected to be robust against adversarial perturbations and then rebuilds the image with an image-to-image transformation neural network. Combined with an advanced diffusion model, the module achieves state-of-the-art performance in purifying adversarial examples while preserving high classification accuracy on clean samples. We evaluate the model on representative neural network classifiers pre-trained and fine-tuned on large-scale datasets, and an ablation study analyses how the proposed plug-in module enhances the effectiveness of diffusion-based purification. Notably, the module is computationally efficient, incurring only minimal overhead during the purification process.
E. Bao and C.-C. Chang—These authors contributed equally.
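The abstract describes the DR pipeline only at a high level, so the following is a minimal sketch of how such a deconstruction-reconstruction plug-in might sit in front of a diffusion purifier. It is written in PyTorch under stated assumptions and is not the authors' implementation: the names Deconstructor, Reconstructor, diffusion_purify, and purify are hypothetical, and the fixed box blur merely stands in for whatever perturbation-robust decomposition the paper actually uses.

import torch
import torch.nn as nn

class Deconstructor(nn.Module):
    # Hypothetical deconstruction stage: splits an image into a coarse
    # base layer (assumed robust to small adversarial perturbations)
    # and a residual detail layer (where perturbations concentrate).
    def __init__(self):
        super().__init__()
        # Fixed 5x5 depthwise box blur as a stand-in decomposition.
        self.blur = nn.Conv2d(3, 3, kernel_size=5, padding=2, groups=3, bias=False)
        nn.init.constant_(self.blur.weight, 1.0 / 25.0)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        base = self.blur(x)
        detail = x - base
        return base, detail

class Reconstructor(nn.Module):
    # Hypothetical image-to-image network that rebuilds a clean image
    # from the robust base features.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, base):
        return torch.clamp(self.net(base), 0.0, 1.0)

def purify(x_adv, deconstructor, reconstructor, diffusion_purify, classifier):
    # DR plug-in followed by a diffusion purifier, then classification.
    # diffusion_purify stands in for any pretrained diffusion-based
    # purification step (e.g. a DiffPure-style forward/reverse pass).
    base, _ = deconstructor(x_adv)    # deconstruction: keep robust features
    x_rec = reconstructor(base)       # reconstruction: image-to-image rebuild
    x_pure = diffusion_purify(x_rec)  # hand off to the diffusion model
    return classifier(x_pure)

In this sketch the DR stage adds only two lightweight network passes before the much heavier diffusion step, which is consistent with the abstract's claim that the module incurs minimal computational overhead.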
Acknowledgments
This work was partially supported by JSPS KAKENHI Grants JP18H04120, JP20K23355, JP21H04907, and JP21K18023, and by JST CREST Grants JPMJCR18A6 and JPMJCR20D3, Japan.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bao, E., Chang, C.-C., Nguyen, H.H., Echizen, I. (2024). From Deconstruction to Reconstruction: A Plug-In Module for Diffusion-Based Purification of Adversarial Examples. In: Ma, B., Li, J., Li, Q. (eds) Digital Forensics and Watermarking. IWDW 2023. Lecture Notes in Computer Science, vol 14511. Springer, Singapore. https://doi.org/10.1007/978-981-97-2585-4_4
DOI: https://doi.org/10.1007/978-981-97-2585-4_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2584-7
Online ISBN: 978-981-97-2585-4
eBook Packages: Computer Science, Computer Science (R0)