DA3G: Detecting Adversarial Attacks by Analysing Gradients

Part of the Lecture Notes in Computer Science book series (LNSC, volume 12972)

Abstract

Deep learning models are vulnerable to specifically crafted inputs, called adversarial examples. In this paper, we present DA3G, a novel method to reliably detect evasion attacks on neural networks. We analyse the behaviour of the network under test on the given input sample. Compared to the benign training data, adversarial examples cause a discrepancy between visual and causal perception: although the input remains visually close to a benign class, the network's output is shifted at the attacker's will. DA3G detects these changes in the gradient pattern using an auxiliary neural network. Our end-to-end approach readily integrates with a variety of existing architectures. DA3G reliably detects known as well as unknown attacks and increases the difficulty of adaptive attacks.
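
To illustrate the general idea, the sketch below derives a gradient-based feature vector for a given sample and feeds it to a small auxiliary binary classifier. The choice of gradient signal (here, the gradient of the predicted-class loss with respect to the input) and the detector layout are illustrative assumptions rather than the exact DA3G configuration.

```python
# Illustrative sketch only: the gradient signal and detector layout are
# assumptions, not the published DA3G configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_features(target_model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Gradient of the predicted-class loss w.r.t. the input, flattened
    into one feature vector per sample."""
    x = x.clone().requires_grad_(True)
    logits = target_model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grad = torch.autograd.grad(loss, x)[0]   # same shape as the input batch
    return grad.flatten(start_dim=1)         # (batch, n_features)

class AuxiliaryDetector(nn.Module):
    """Small binary classifier: benign (0) vs. adversarial (1)."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, 2),
        )

    def forward(self, grad_feat: torch.Tensor) -> torch.Tensor:
        return self.net(grad_feat)
```

In practice, such a detector would be trained on gradient features of benign samples and of adversarial examples crafted against the target network.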

Keywords

  • Adversarial machine learning
  • Attack detection
  • Defence methods
  • Evasion attacks
  • Deep learning
  • Neural network security

J.-P. Schulze and P. Sperl contributed equally (co-first authors).


References

  1. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: 35th International Conference on Machine Learning, ICML 2018, vol. 80, pp. 274–283 (2018). http://proceedings.mlr.press/v80/athalye18a.html

  2. Athalye, A., Engstrom, L., Ilyas, A., Kwok, K.: Synthesizing robust adversarial examples. In: 35th International Conference on Machine Learning, ICML 2018, vol. 80, pp. 284–293 (2018). http://proceedings.mlr.press/v80/athalye18b.html

  3. Biggio, B., et al.: Evasion attacks against machine learning at test time. In: Machine Learning and Knowledge Discovery in Databases, pp. 387–402 (2013). https://doi.org/10.1007/978-3-642-40994-3_25

  4. Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=SyZI0GWCZ

  5. Carlini, N., et al.: On evaluating adversarial robustness. arXiv (2019). http://arxiv.org/abs/1902.06705

  6. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: AISec 2017 - Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14 (2017). https://doi.org/10.1145/3128572.3140444

  7. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Proceedings - IEEE Symposium on Security and Privacy, pp. 39–57. IEEE (2017). https://doi.org/10.1109/SP.2017.49

  8. Chollet, F.: Simple MNIST convnet (2015). https://keras.io/examples/vision/mnist_convnet/

  9. Croce, F., Hein, M.: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: Proceedings of Machine Learning Research, pp. 2206–2216. PMLR (2020). http://proceedings.mlr.press/v119/croce20b/croce20b.pdf

  10. Dhaliwal, J., Shintre, S.: Gradient similarity: an explainable approach to detect adversarial attacks against deep learning (2018). https://arxiv.org/pdf/1806.10707.pdf

  11. Dvijotham, K.D., et al.: Training verified learners with learned verifiers (2018). https://arxiv.org/pdf/1805.10265.pdf

  12. Fidel, G., Bitton, R., Shabtai, A.: When explainability meets adversarial learning: detecting adversarial examples using SHAP signatures. In: 2020 International Joint Conference on Neural Networks (IJCNN) (2020). https://doi.org/10.1109/IJCNN48605.2020.9207637

  13. Freitas, S., Chen, S.T., Wang, Z.J., Chau, D.H.: UnMask: adversarial detection and defense through robust feature alignment. In: Proceedings - 2020 IEEE International Conference on Big Data, pp. 1081–1088 (2020). https://doi.org/10.1109/BigData50022.2020.9378303

  14. Ghiasi, A., Shafahi, A., Goldstein, T.: Breaking certified defenses: semantic adversarial examples with spoofed robustness certificates. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=HJxdTxHYvB

  15. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2015). http://arxiv.org/abs/1412.6572

  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  17. Hein, M., Andriushchenko, M.: Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Advances in Neural Information Processing Systems (2017). https://proceedings.neurips.cc/paper/2017/file/e077e1a544eec4f0307cf5c3c721d944-Paper.pdf

  18. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems, vol. 32 (2019). https://proceedings.neurips.cc/paper/2019/file/e2c420d928d4bf8ce0ff2ec19b371514-Paper.pdf

  19. Keras: CIFAR-10 CNN (2020). https://keras.io/examples/cifar10_cnn/

  20. Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks. In: NIPS 2017: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 972–981 (2017). https://papers.neurips.cc/paper/2017/file/5d44ee6f2c3f71b73125876103c8f6c4-Paper.pdf

  21. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009). https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

  22. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: International Conference on Learning Representations, ICLR, pp. 99–112 (2016). https://doi.org/10.1201/9781351251389-8

  23. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2323 (1998). https://doi.org/10.1109/5.726791

  24. LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. ATT Labs, February 2010. http://yann.lecun.com/exdb/mnist

  25. Lust, J., Condurache, A.P.: GraN: an efficient gradient-norm based detector for adversarial and misclassified examples. In: 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2020)

  26. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJzIBfZAb

  27. Moosavi-Dezfooli, S., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582 (2016). https://doi.org/10.1109/CVPR.2016.282

  28. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015). https://doi.org/10.1109/CVPR.2015.7298640

  29. Raghunathan, A., Steinhardt, J., Liang, P.: Certified defenses against adversarial examples. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=Bys4ob-Rb

  30. Raghunathan, A., Steinhardt, J., Liang, P.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems (2018). https://proceedings.neurips.cc/paper/2018/file/29c0605a3bab4229e46723f89cf59d83-Paper.pdf

  31. Rauber, J., Brendel, W., Bethge, M.: Foolbox: a Python toolbox to benchmark the robustness of machine learning models. In: Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning (2017). http://arxiv.org/abs/1707.04131

  32. Rauber, J., Zimmermann, R., Bethge, M., Brendel, W.: Foolbox native: fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw. 5(53), 2607 (2020). https://doi.org/10.21105/joss.02607

  33. Sharad, K., Marson, G.A., Truong, H.T.T., Karame, G.: On the security of randomized defenses against adversarial samples. In: Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, pp. 381–393 (2020). https://doi.org/10.1145/3320269.3384751

  34. Sperl, P., Kao, C.Y., Chen, P., Lei, X., Böttinger, K.: DLA: dense-layer-analysis for adversarial example detection. In: Proceedings of the 5th IEEE European Symposium on Security and Privacy (EuroS&P), pp. 198–215 (2020). https://doi.org/10.1109/EuroSP48549.2020.00021

  35. Sperl, P., Schulze, J.P., Böttinger, K.: Activation anomaly analysis. In: Machine Learning and Knowledge Discovery in Databases, pp. 69–84 (2021). https://doi.org/10.1007/978-3-030-67661-2_5

  36. Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23(5), 828–841 (2019). https://doi.org/10.1109/TEVC.2019.2890858

  37. Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014). https://openreview.net/forum?id=kklr_MTHMRQjG

  38. Tramèr, F., Carlini, N., Brendel, W., Madry, A.: On adaptive attacks to adversarial example defenses. In: Advances in Neural Information Processing Systems, pp. 1633–1645 (2020). https://proceedings.neurips.cc/paper/2020/file/11f38f8ecd71867b42433548d1078e38-Paper.pdf

  39. Wong, E., Zico Kolter, J.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: Proceedings of the 35th International Conference on Machine Learning, pp. 5286–5295 (2018). http://proceedings.mlr.press/v80/wong18a/wong18a.pdf

  40. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv pp. 1–6 (2017). https://arxiv.org/pdf/1708.07747.pdf

  41. Xu, W., Evans, D., Qi, Y.: Feature squeezing: detecting adversarial examples in deep neural networks. In: Network and Distributed System Security Symposium, NDSS (2018). https://doi.org/10.14722/ndss.2018.23198

  42. Yang, P., Chen, J., Hsieh, C.J., Wang, J.L., Jordan, M.I.: ML-LOO: detecting adversarial examples with feature attribution. In: AAAI Conference on Artificial Intelligence (2020). https://doi.org/10.1609/aaai.v34i04.6140

  43. Zhang, S., et al.: Detecting adversarial samples for deep learning models: a comparative study. IEEE Trans. Netw. Sci. Eng. (2021). https://doi.org/10.1109/tnse.2021.3057071

Author information

Correspondence to Jan-Philipp Schulze or Philip Sperl.

Appendices

A Network Architectures

For our evaluation, we applied DA3G to a variety of common NN architectures. We give an in-depth overview of the respective parameters in Table 4.

Table 4. Architectures, training settings, and clean accuracy of the target NNs.
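
As a purely illustrative example of an MNIST-scale target network of the kind referenced in Table 4 (cf. the Keras convnet [8]), a hypothetical architecture could look as follows; all layer sizes are assumptions and do not correspond to the entries of Table 4.

```python
# Hypothetical MNIST-scale target network; not the architecture from Table 4.
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),  # 28x28 input -> 12x12 maps
            nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```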

B Adaptive Attacks

We show our results for F-MNIST in Fig. 3.

Fig. 3. Adaptive attacks on the Fashion-MNIST models. The dashed lines show the number of C&W steps required such that at least 80% of all attacks on the protected model were successful.
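
The measurement sketched in the caption can be approximated as follows, using the Foolbox implementation of the C&W attack [31, 32]. Note that this sketch only checks misclassification of the model that is passed in; in the adaptive setting, the attack additionally has to evade the detector. The step grid and the bound-free success criterion are illustrative assumptions.

```python
# Hedged sketch: smallest C&W step budget at which >= 80% of the attacks
# succeed. "Success" here means misclassification by `model` only; evading
# the DA3G detector as well (as in the adaptive setting) is not modelled.
import foolbox as fb

def steps_for_target_rate(model, images, labels,
                          step_grid=(100, 200, 500, 1000),
                          target_rate=0.80):
    fmodel = fb.PyTorchModel(model, bounds=(0, 1))
    for steps in step_grid:                       # assumed budget grid
        attack = fb.attacks.L2CarliniWagnerAttack(steps=steps)
        # epsilons=None: for minimisation attacks, Foolbox reports plain
        # misclassification success without an additional norm bound.
        _, _, success = attack(fmodel, images, labels, epsilons=None)
        if success.float().mean().item() >= target_rate:
            return steps
    return None                                   # not reached within the grid
```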

C PGD Attack Step Size

We evaluated multiple hyperparameters to increase the strength of the adaptive attacks. In Table 5, we list the PGD step sizes chosen for the final models. For the grey-box experiments, we used Foolbox’s [31, 32] default values relative to \(\epsilon \).

Table 5. PGD step sizes during the adaptive attacks.
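
For reference, the relative step size can be set directly in Foolbox [31, 32] via the rel_stepsize argument of the PGD attack; the library default is 0.01/0.3 relative to \(\epsilon \). The concrete \(\epsilon \) and step count in the sketch below are placeholders, not the entries of Table 5.

```python
# Sketch: L-inf PGD whose step size is a fraction of epsilon (Foolbox).
# epsilon and the number of steps are placeholders, not the Table 5 values.
import foolbox as fb

def run_pgd(model, images, labels, epsilon=8 / 255, rel_stepsize=0.01 / 0.3):
    fmodel = fb.PyTorchModel(model, bounds=(0, 1))
    attack = fb.attacks.LinfPGD(rel_stepsize=rel_stepsize, steps=40)
    _, adv_images, success = attack(fmodel, images, labels, epsilons=epsilon)
    return adv_images, success                    # clipped adversarials + flags
```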

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Schulze, JP., Sperl, P., Böttinger, K. (2021). DA3G: Detecting Adversarial Attacks by Analysing Gradients. In: Bertino, E., Shulman, H., Waidner, M. (eds) Computer Security – ESORICS 2021. ESORICS 2021. Lecture Notes in Computer Science, vol 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_27

  • DOI: https://doi.org/10.1007/978-3-030-88418-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88417-8

  • Online ISBN: 978-3-030-88418-5
