Scaling up the Randomized Gradient-Free Adversarial Attack Reveals Overestimation of Robustness Using Established Attacks
Modern neural networks are highly non-robust against adversarial manipulation. A significant amount of work has been invested in techniques to compute lower bounds on robustness through formal guarantees and to build provably robust models. However, it is still difficult to obtain guarantees for larger networks or for robustness against larger perturbations. Thus, attack strategies are needed to provide tight upper bounds on the actual robustness. We significantly improve the randomized gradient-free attack for ReLU networks (Croce and Hein in GCPR, 2018), in particular by scaling it up to large networks. We show that our attack achieves similar or significantly lower robust accuracy than state-of-the-art attacks such as PGD or that of Carlini and Wagner, thus revealing that these established methods overestimate robustness. Our attack is not based on a gradient-descent scheme and is in this sense gradient-free, which makes it less sensitive to the choice of hyperparameters, as no careful selection of the step size is required.
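For context on the baselines, the sketch below shows a minimal L-infinity PGD attack in PyTorch, in the spirit of Madry et al. (2018). This is an illustrative reimplementation under assumed settings, not the authors' code; the model and the values of eps, alpha, and n_iter are placeholders. It makes concrete the step size alpha that gradient-based attacks must tune carefully, and which a gradient-free attack avoids.

```python
# Minimal sketch of an L-infinity PGD attack (in the spirit of Madry et al., 2018),
# the kind of established gradient-based baseline the abstract compares against.
# Illustrative only: model, eps, alpha, and n_iter are assumed placeholder values.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, n_iter=40):
    """Return adversarial examples within an L-inf ball of radius eps around x."""
    # Random start inside the eps-ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(n_iter):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Signed gradient step of size alpha -- the step-size hyperparameter
        # that must be tuned, unlike in the gradient-free attack.
        x_adv = x_adv + alpha * grad.sign()
        # Project back onto the eps-ball and the valid image range.
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

An input counts as robust only if no attack within the eps-ball flips the prediction, so robust accuracy is an upper bound on the true robustness and a stronger attack can only lower the reported number; this is the sense in which the paper's attack reveals overestimation by PGD-style baselines.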
Keywords: Adversarial attacks · Adversarial robustness · White-box attacks · Gradient-free attacks
F. C. and M. H. acknowledge support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039A) and by the DFG via Grant 389792660 as part of TRR 248 and the Excellence Cluster “Machine Learning – New Perspectives for Science”. J. R. acknowledges support from the Bosch Research Foundation (Stifterverband, T113/30057/17) and the International Max Planck Research School for Intelligent Systems (IMPRS-IS).
- Arora, R., Basu, A., Mianjy, P., & Mukherjee, A. (2018). Understanding deep neural networks with rectified linear units. In ICLR.
- Athalye, A., Carlini, N., & Wagner, D. A. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML.
- Brendel, W., Rauber, J., & Bethge, M. (2018). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR.
- Carlini, N., & Wagner, D. (2017a). Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM workshop on artificial intelligence and security.
- Carlini, N., & Wagner, D. (2017b). Towards evaluating the robustness of neural networks. In IEEE symposium on security and privacy.
- Croce, F., Andriushchenko, M., & Hein, M. (2019). Provable robustness of ReLU networks via maximization of linear regions. In AISTATS.
- Croce, F., & Hein, M. (2018). A randomized gradient-free attack on ReLU networks. In GCPR.
- Dalvi, N., Domingos, P., Mausam, Sanghai, S., & Verma, D. (2004). Adversarial classification. In KDD.
- Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR.
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).
- Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS.
- Huang, G., Liu, Z., & Weinberger, K. Q. (2016a). Densely connected convolutional networks. preprint arXiv:1608.06993.
- Huang, R., Xu, B., Schuurmans, D., & Szepesvari, C. (2016b). Learning with a strong adversary. In ICLR.
- Katz, G., Barrett, C., Dill, D., Julian, K., & Kochenderfer, M. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV.
- Krizhevsky, A., Nair, V., & Hinton, G. (2014). CIFAR-10 (Canadian Institute for Advanced Research). https://www.cs.toronto.edu/~kriz/cifar.html.
- Kurakin, A., Goodfellow, I. J., & Bengio, S. (2017). Adversarial examples in the physical world. In ICLR workshop.
- Liu, Y., Chen, X., Liu, C., & Song, D. (2017). Delving into transferable adversarial examples and black-box attacks. In ICLR.
- Lowd, D., & Meek, C. (2005). Adversarial learning. In KDD.
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ICLR.
- Mirman, M., Gehr, T., & Vechev, M. (2018). Differentiable abstract interpretation for provably robust neural networks. In ICML.
- Moosavi-Dezfooli, S.-M., Fawzi, A., & Frossard, P. (2016). DeepFool: A simple and accurate method to fool deep neural networks. In CVPR (pp. 2574–2582).
- Mosbach, M., Andriushchenko, M., Trost, T., Hein, M., & Klakow, D. (2018). Logit pairing methods can fool gradient-based attacks. In NeurIPS 2018 workshop on security in machine learning. arXiv:1810.12042.
- Narodytska, N., & Kasiviswanathan, S. P. (2016). Simple black-box adversarial perturbations for deep networks. In CVPR 2017 workshops.
- Nesterov, Y. E. (1983). A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2), 372–376.
- Papernot, N., Carlini, N., Goodfellow, I., Feinman, R., Faghri, F., Matyasko, A., et al. (2017). cleverhans v2.0.0: An adversarial machine learning library. preprint arXiv:1610.00768.
- Papernot, N., McDaniel, P., Wu, X., Jha, S., & Swami, A. (2016). Distillation as a defense to adversarial perturbations against deep networks. In IEEE symposium on security and privacy.
- Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. In ICLR.
- Rauber, J., Brendel, W., & Bethge, M. (2017). Foolbox: A Python toolbox to benchmark the robustness of machine learning models. In ICML reliable machine learning in the wild workshop.
- Schott, L., Rauber, J., Bethge, M., & Brendel, W. (2019). Towards the first adversarially robust neural network model on MNIST. In ICLR.
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., et al. (2014). Intriguing properties of neural networks. In ICLR (pp. 2503–2511).
- Tjeng, V., Xiao, K., & Tedrake, R. (2019). Evaluating robustness of neural networks with mixed integer programming. preprint arXiv:1711.07356v3.
- Weng, T., Zhang, H., Chen, H., Song, Z., Hsieh, C., Daniel, L., et al. (2018). Towards fast computation of certified robustness for ReLU networks. In ICML.
- Wong, E., & Kolter, J. Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML.
- Wong, E., Schmidt, F., Metzen, J. H., & Kolter, J. Z. (2018). Scaling provable adversarial defenses. In NeurIPS.