Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks

Abstract

Modern neural networks are highly non-robust against adversarial manipulation. A significant amount of work has been invested in techniques to compute lower bounds on robustness through formal guarantees and to build provably robust models. However, it is still difficult to obtain guarantees for larger networks or for robustness against larger perturbations. Thus, attack strategies are needed to provide tight upper bounds on the actual robustness. We significantly improve the randomized gradient-free attack for ReLU networks (Croce and Hein in GCPR, 2018), in particular by scaling it up to large networks. We show that our attack achieves robust accuracy similar to or significantly smaller than that of state-of-the-art attacks like PGD or the one of Carlini and Wagner, thus revealing an overestimation of robustness by these state-of-the-art methods. Our attack is not based on a gradient descent scheme and is in this sense gradient-free, which makes it less sensitive to the choice of hyperparameters, as no careful selection of the stepsize is required.
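The robust accuracy reported here is an empirical upper bound: a test point counts as robust only if the attack fails to find a valid adversarial perturbation for it. The snippet below is a minimal sketch of this evaluation protocol, not the paper's code; `model` and `attack` are hypothetical callables standing in for a trained classifier and any l-infinity-bounded attack (e.g. PGD, Carlini-Wagner, or the linear-region attack).

```python
# Minimal sketch (not the paper's code): how an attack yields an *upper bound*
# on robust accuracy. `model(x)` is assumed to return class scores and
# `attack(x, y, eps)` a candidate adversarial point for input x with label y.
import numpy as np

def robust_accuracy(model, attack, X, y, eps):
    """Fraction of points for which the attack finds no valid adversarial
    example inside the l_inf ball of radius eps around the clean input."""
    robust = 0
    for x_i, y_i in zip(X, y):
        if np.argmax(model(x_i)) != y_i:
            continue                              # already misclassified: not robust
        x_adv = attack(x_i, y_i, eps)
        inside = np.max(np.abs(x_adv - x_i)) <= eps + 1e-10   # perturbation budget respected
        fooled = np.argmax(model(x_adv)) != y_i               # label actually changed
        if not (inside and fooled):
            robust += 1                           # attack failed: counted as robust
    return robust / len(X)
```

Any attack can only lower this estimate, so a weaker attack overestimates robustness; this is exactly the effect the paper quantifies for established attacks.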



Notes

  1. The case \(x^{(j-1)}_r=0\) implies that the region on which the affine approximation holds has dimension smaller than that of the input space. Setting \(\mathop{\mathrm{sgn}}\nolimits(0)=1\), we consider a polytope which contains as a face the hyperplane defined by the condition \(x^{(j-1)}_r=0\); see the sketch after these notes.

  2. https://github.com/jonasrauber/linear-region-attack.

  3. https://github.com/MadryLab/mnist_challenge.

  4. https://github.com/MadryLab/cifar10_challenge.
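To make Note 1 concrete, the sketch below computes the sign pattern of the hidden-layer pre-activations of a fully connected ReLU network at a point x; this pattern identifies the polytope (linear region) on which the network is affine. The convention \(\mathrm{sgn}(0)=1\) from the note corresponds to the `>= 0` comparison. The weight matrices `Ws` and biases `bs` are hypothetical inputs, not taken from the paper's code.

```python
# Minimal sketch, assuming a fully connected ReLU network given by weight
# matrices Ws and biases bs (hypothetical variables). The sign pattern uses
# sgn(0) = 1, so a pre-activation of exactly 0 is treated as "active" and the
# corresponding hyperplane becomes a face of the polytope on which the network
# is affine.
import numpy as np

def activation_pattern(Ws, bs, x):
    """Return the per-layer sign patterns identifying the linear region of x."""
    signs = []
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):     # hidden layers only
        pre = W @ h + b                    # pre-activations of this layer
        signs.append(np.where(pre >= 0, 1, -1))   # sgn(0) = 1 convention
        h = np.maximum(pre, 0)             # ReLU
    return signs
```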

References

  1. Arora, R., Basu, A., Mianjy, P., & Mukherjee, A. (2018). Understanding deep neural networks with rectified linear units. In ICLR.

  2. Athalye, A., Carlini, N., & Wagner, D. A. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML.

  3. Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183–202.


  4. Brendel, W., Rauber, J., & Bethge, M. (2018). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR.

  5. Carlini, N., & Wagner, D. (2017a). Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM workshop on artificial intelligence and security.

  6. Carlini, N., & Wagner, D. (2017b). Towards evaluating the robustness of neural networks. In IEEE symposium on security and privacy.

  7. Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145.


  8. Croce, F., Andriushchenko, M., & Hein, M. (2019). Provable robustness of ReLU networks via maximization of linear regions. In AISTATS.

  9. Croce, F., & Hein, M. (2018). A randomized gradient-free attack on ReLU networks. In GCPR.

  10. Dalvi, N., Domingos, P., Mausam, Sanghai, S., & Verma, D. (2004). Adversarial classification. In KDD.

  11. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR.

  12. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  13. Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS.

  14. Huang, G., Liu, Z., & Weinberger, K. Q. (2016a). Densely connected convolutional networks. CoRR, abs/1608.06993.

  15. Huang, R., Xu, B., Schuurmans, D., & Szepesvari, C. (2016b). Learning with a strong adversary. In ICLR.

  16. Katz, G., Barrett, C., Dill, D., Julian, K., & Kochenderfer, M. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV.

  17. Krizhevsky, A., Nair, V., & Hinton, G. (2014). Cifar-10 (canadian institute for advanced research). https://www.cs.toronto.edu/~kriz/cifar.html.

  18. Kurakin, A., Goodfellow, I. J., & Bengio, S. (2017). Adversarial examples in the physical world. In ICLR workshop.

  19. Liu, Y., Chen, X., Liu, C., & Song, D. (2017). Delving into transferable adversarial examples and black-box attacks. In ICLR.

  20. Lowd, D., & Meek, C. (2005). Adversarial learning. In KDD.

  21. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ICLR.

  22. Mirman, M., Gehr, T., & Vechev, M. (2018). Differentiable abstract interpretation for provably robust neural networks. In ICML.

  23. Moosavi-Dezfooli, S.-M., Fawzi, A., & Frossard, P. (2016). Deepfool: A simple and accurate method to fool deep neural networks. In CVPR (pp. 2574–2582).

  24. Mosbach, M., Andriushchenko, M., Trost, T., Hein, M., & Klakow, D. (2018). Logit pairing methods can fool gradient-based attacks. In NeurIPS 2018 workshop on security in machine learning. arXiv:1810.12042.

  25. Narodytska, N., & Kasiviswanathan, S. P. (2016). Simple black-box adversarial perturbations for deep networks. In CVPR 2017 Workshops.

  26. Nesterov, Y. E. (1983). A method of solving a convex programming problem with convergence rate O\((1/k^2)\). Soviet Mathematics Doklady, 27(2), 372–376.


  27. Papernot, N., Carlini, N., Goodfellow, I., Feinman, R., Faghri, F., & Matyasko, A., et al. (2017). cleverhans v2.0.0: An adversarial machine learning library. preprint arXiv:1610.00768.

  28. Papernot, N., McDaniel, P., Wu, X., Jha, S., & Swami, A. (2016). Distillation as a defense to adversarial perturbations against deep networks. In IEEE symposium on security and privacy.

  29. Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. In ICLR.

  30. Rauber, J., Brendel, W., & Bethge, M. (2017). Foolbox: A python toolbox to benchmark the robustness of machine learning models. In ICML reliable machine learning in the wild workshop.

  31. Schott, L., Rauber, J., Bethge, M., & Brendel, W. (2019). Towards the first adversarially robust neural network model on MNIST. In ICLR.

  32. Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32, 323–332.


  33. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., & Goodfellow, I., et al. (2014). Intriguing properties of neural networks. In ICLR (pp. 2503–2511).

  34. Tjeng, V., Xiao, K., & Tedrake, R. (2019). Evaluating robustness of neural networks with mixed integer programming. preprint arXiv:1711.07356v3.

  35. Weng, T., Zhang, H., Chen, H., Song, Z., Hsieh, C., & Daniel, L., et al. (2018). Towards fast computation of certified robustness for ReLU networks. In ICML.

  36. Wong, E., & Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML.

  37. Wong, E., Schmidt, F., Metzen, J. H., & Kolter, J. Z. (2018). Scaling provable adversarial defenses. In NeurIPS.

  38. Yuan, X., He, P., Zhu, Q., Bhat, R. R., & Li, X. (2019). Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 30, 2805–2824.



Acknowledgements

F. C. and M. H. acknowledge support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039A) and from the DFG via Grant 389792660 as part of TRR 248 and via the Excellence Cluster "Machine Learning – New Perspectives for Science". J. R. acknowledges support from the Bosch Research Foundation (Stifterverband, T113/30057/17) and the International Max Planck Research School for Intelligent Systems (IMPRS-IS).

Author information

Corresponding author

Correspondence to Francesco Croce.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Thomas Brox.

About this article

Cite this article

Croce, F., Rauber, J. & Hein, M. Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks. Int J Comput Vis 128, 1028–1046 (2020). https://doi.org/10.1007/s11263-019-01213-0

Keywords

  • Adversarial attacks
  • Adversarial robustness
  • White-box attacks
  • Gradient-free attacks