Abstract
Adversarial robustness of machine learning models has attracted considerable attention in recent years. Adversarial attacks undermine the reliability of, and trust in, machine learning models, but constructing more robust models hinges on a rigorous understanding of adversarial robustness as a property of a given model. Point-wise measures for specific threat models are currently the most popular tool for comparing the robustness of classifiers and are used in most recent publications on adversarial robustness. In this work, we use robustness curves to show that point-wise measures fail to capture important global properties that are essential for reliably comparing the robustness of different classifiers. We introduce new ways in which robustness curves can be used to systematically uncover these properties, and we provide concrete recommendations for researchers and practitioners assessing and comparing the robustness of trained models. Furthermore, we characterize scale as a way to distinguish small and large perturbations, relate it to inherent properties of data sets, and demonstrate that robustness thresholds must be chosen accordingly. We hope that our work contributes to a shift of focus away from point-wise measures of robustness and towards a discussion of what kind of robustness could and should reasonably be expected. We release code to reproduce all experiments presented in this paper, including a Python module that calculates robustness curves for arbitrary data sets and classifiers and supports a number of frameworks, including TensorFlow, PyTorch, and JAX.
N. Risse, C. Göpfert, and J. P. Göpfert contributed equally.
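As a minimal illustration of the underlying idea (this is a sketch, not the released module): given an estimate of the minimal adversarial distance for each test point, e.g. obtained with an attack library such as Foolbox, the robustness curve is the empirical cumulative distribution of these distances, i.e. the fraction of test points that can be flipped by a perturbation of norm at most ε, as a function of ε. The function name robustness_curve and the NumPy-based implementation below are illustrative assumptions.

    # Sketch: robustness curve as the empirical CDF of minimal
    # adversarial distances. Assumes distances are precomputed
    # (misclassified points are assigned distance 0).
    import numpy as np

    def robustness_curve(distances):
        """Return (eps, err), where err[i] is the fraction of test
        points whose minimal adversarial distance is <= eps[i]."""
        d = np.sort(np.asarray(distances, dtype=float))
        n = len(d)
        # prepend eps = 0 so the curve starts at the clean error rate
        eps = np.concatenate(([0.0], d))
        err = np.concatenate(([np.mean(d == 0.0)],
                              np.arange(1, n + 1) / n))
        return eps, err

The resulting curve is piecewise constant, so it is naturally plotted as a step function, e.g. with matplotlib via plt.step(eps, err, where="post").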
Notes
1.
2. The full code is available at www.github.com/niklasrisse/how-to-compare-adversarial-robustness-of-classifiers-from-a-global-perspective.
3. The models trained with ST, KW, AT and MMR + AT are available at www.github.com/max-andr/provable-robustness-max-linear-regions.
4. The models trained with MMR-UNIV are available at www.github.com/fra31/mmr-universal.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Risse, N., Göpfert, C., Göpfert, J.P. (2021). How to Compare Adversarial Robustness of Classifiers from a Global Perspective. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12891. Springer, Cham. https://doi.org/10.1007/978-3-030-86362-3_3