Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks

  • Francesco Croce
  • Jonas Rauber
  • Matthias Hein
Part of the following topical collections:
  1. Special issue on Computer Vision and Pattern Recognition


Modern neural networks are highly non-robust against adversarial manipulation. A significant amount of work has been invested in techniques to compute lower bounds on robustness through formal guarantees and to build provably robust models. However, it is still difficult to obtain guarantees for larger networks or for robustness against larger perturbations. Thus, attack strategies are needed to provide tight upper bounds on the actual robustness. We significantly improve the randomized gradient-free attack for ReLU networks (Croce and Hein in GCPR, 2018), in particular by scaling it up to large networks. We show that our attack achieves similar or significantly lower robust accuracy than state-of-the-art attacks such as PGD or the attack of Carlini and Wagner, thus revealing that these state-of-the-art methods overestimate robustness. Our attack is not based on a gradient descent scheme and is in this sense gradient-free, which makes it less sensitive to the choice of hyperparameters, as no careful selection of the step size is required.
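To illustrate why a stronger attack yields a tighter upper bound on robustness, the following minimal sketch computes robust accuracy from the smallest adversarial perturbation norms found by attacks. The function names, the two attacks, the threshold, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def robust_accuracy(correct, adv_norms, eps):
    """Fraction of points that are correctly classified and for which
    no adversarial example with perturbation norm <= eps was found."""
    robust = [c and not (n <= eps) for c, n in zip(correct, adv_norms)]
    return sum(robust) / len(robust)

def combine_attacks(*norms_per_attack):
    """Pointwise smallest adversarial perturbation norm found by any attack."""
    return [min(ns) for ns in zip(*norms_per_attack)]

# Hypothetical results on five test points (illustrative values only).
# math.inf means the attack did not find an adversarial example.
correct = [True, True, True, False, True]
attack_a = [0.30, math.inf, 0.12, 0.00, 0.25]
attack_b = [0.08, 0.40, 0.15, 0.00, math.inf]

eps = 0.13  # assumed perturbation threshold
acc_a = robust_accuracy(correct, attack_a, eps)
acc_b = robust_accuracy(correct, attack_b, eps)
acc_best = robust_accuracy(correct, combine_attacks(attack_a, attack_b), eps)
print(acc_a, acc_b, acc_best)
```

Each attack alone only gives an upper bound on the true robust accuracy, so in this toy example relying on either attack by itself would overestimate robustness compared with the pointwise best of both.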


Keywords: Adversarial attacks, Adversarial robustness, White-box attacks, Gradient-free attacks



F. C. and M. H. acknowledge support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039A) and by the DFG via Grant 389792660 as part of TRR 248 and the Excellence Cluster “Machine Learning-New Perspectives for Science”. J. R. acknowledges support from the Bosch Research Foundation (Stifterverband, T113/30057/17) and the International Max Planck Research School for Intelligent Systems (IMPRS-IS).


  1. Arora, R., Basu, A., Mianjy, P., & Mukherjee, A. (2018). Understanding deep neural networks with rectified linear units. In ICLR.
  2. Athalye, A., Carlini, N., & Wagner, D. A. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML.
  3. Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183–202.
  4. Brendel, W., Rauber, J., & Bethge, M. (2018). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR.
  5. Carlini, N., & Wagner, D. (2017a). Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM workshop on artificial intelligence and security.
  6. Carlini, N., & Wagner, D. (2017b). Towards evaluating the robustness of neural networks. In IEEE symposium on security and privacy.
  7. Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145.
  8. Croce, F., Andriushchenko, M., & Hein, M. (2019). Provable robustness of ReLU networks via maximization of linear regions. In AISTATS.
  9. Croce, F., & Hein, M. (2018). A randomized gradient-free attack on ReLU networks. In GCPR.
  10. Dalvi, N., Domingos, P., Mausam, Sanghai, S., & Verma, D. (2004). Adversarial classification. In KDD.
  11. Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR.
  12. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
  13. Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS.
  14. Huang, G., Liu, Z., & Weinberger, K. Q. (2016a). Densely connected convolutional networks. In CoRR, abs/1608.06993.
  15. Huang, R., Xu, B., Schuurmans, D., & Szepesvari, C. (2016b). Learning with a strong adversary. In ICLR.
  16. Katz, G., Barrett, C., Dill, D., Julian, K., & Kochenderfer, M. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV.
  17. Krizhevsky, A., Nair, V., & Hinton, G. (2014). CIFAR-10 (Canadian Institute for Advanced Research).
  18. Kurakin, A., Goodfellow, I. J., & Bengio, S. (2017). Adversarial examples in the physical world. In ICLR workshop.
  19. Liu, Y., Chen, X., Liu, C., & Song, D. (2017). Delving into transferable adversarial examples and black-box attacks. In ICLR.
  20. Lowd, D., & Meek, C. (2005). Adversarial learning. In KDD.
  21. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ICLR.
  22. Mirman, M., Gehr, T., & Vechev, M. (2018). Differentiable abstract interpretation for provably robust neural networks. In ICML.
  23. Moosavi-Dezfooli, S.-M., Fawzi, A., & Frossard, P. (2016). Deepfool: A simple and accurate method to fool deep neural networks. In CVPR (pp. 2574–2582).
  24. Mosbach, M., Andriushchenko, M., Trost, T., Hein, M., & Klakow, D. (2018). Logit pairing methods can fool gradient-based attacks. In NeurIPS 2018 workshop on security in machine learning. arXiv:1810.12042.
  25. Narodytska, N., & Kasiviswanathan, S. P. (2016). Simple black-box adversarial perturbations for deep networks. In CVPR 2017 Workshops.
  26. Nesterov, Y. E. (1983). A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2), 372–376.
  27. Papernot, N., Carlini, N., Goodfellow, I., Feinman, R., Faghri, F., & Matyasko, A., et al. (2017). cleverhans v2.0.0: An adversarial machine learning library. preprint arXiv:1610.00768.
  28. Papernot, N., McDonald, P., Wu, X., Jha, S., & Swami, A. (2016). Distillation as a defense to adversarial perturbations against deep networks. In IEEE symposium on security & privacy.
  29. Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. In ICLR.
  30. Rauber, J., Brendel, W., & Bethge, M. (2017). Foolbox: A python toolbox to benchmark the robustness of machine learning models. In ICML reliable machine learning in the wild workshop.
  31. Schott, L., Rauber, J., Bethge, M., & Brendel, W. (2019). Towards the first adversarially robust neural network model on MNIST. In ICLR.
  32. Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32, 323–332.
  33. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., & Goodfellow, I., et al. (2014). Intriguing properties of neural networks. In ICLR (pp. 2503–2511).
  34. Tjeng, V., Xiao, K., & Tedrake, R. (2019). Evaluating robustness of neural networks with mixed integer programming. preprint arXiv:1711.07356v3.
  35. Weng, T., Zhang, H., Chen, H., Song, Z., Hsieh, C., & Daniel, L., et al. (2018). Towards fast computation of certified robustness for ReLU networks. In ICML.
  36. Wong, E., & Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML.
  37. Wong, E., Schmidt, F., Metzen, J. H., & Kolter, J. Z. (2018). Scaling provable adversarial defenses. In NeurIPS.
  38. Yuan, X., He, P., Zhu, Q., Bhat, R. R., & Li, X. (2019). Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 30, 2805–2824.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, University of Tübingen, Tübingen, Germany
