Abstract
Adversarial robustness of machine learning models has attracted considerable attention in recent years. Adversarial attacks undermine the reliability of, and trust in, machine learning models, but constructing more robust models hinges on a rigorous understanding of adversarial robustness as a property of a given model. Point-wise measures for specific threat models are currently the most popular tool for comparing the robustness of classifiers and are used in most recent publications on adversarial robustness. In this work, we use robustness curves to show that point-wise measures fail to capture important global properties that are essential for reliably comparing the robustness of different classifiers. We introduce new ways in which robustness curves can be used to systematically uncover these properties, and we provide concrete recommendations for researchers and practitioners assessing and comparing the robustness of trained models. Furthermore, we characterize scale as a way to distinguish small and large perturbations, relate it to inherent properties of data sets, and demonstrate that robustness thresholds must be chosen accordingly. We hope that our work contributes to a shift of focus away from point-wise measures of robustness and towards a discussion of what kind of robustness could and should reasonably be expected. We release code to reproduce all experiments presented in this paper, including a Python module that calculates robustness curves for arbitrary data sets and classifiers and supports a number of frameworks, including TensorFlow, PyTorch, and JAX.
N. Risse, C. Göpfert, and J. P. Göpfert contributed equally.
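As a minimal illustration of the underlying idea (this is a sketch, not the released module): given an estimate of the minimal adversarial distance for each test point, e.g. obtained with an attack library such as Foolbox, the robustness curve is the empirical cumulative distribution of these distances, i.e. the fraction of test points that can be flipped by a perturbation of norm at most ε, as a function of ε. The function name robustness_curve and the NumPy-based implementation below are illustrative assumptions.

    # Sketch: robustness curve as the empirical CDF of minimal
    # adversarial distances. Assumes distances are precomputed
    # (misclassified points are assigned distance 0).
    import numpy as np

    def robustness_curve(distances):
        """Return (eps, err), where err[i] is the fraction of test
        points whose minimal adversarial distance is <= eps[i]."""
        d = np.sort(np.asarray(distances, dtype=float))
        n = len(d)
        # prepend eps = 0 so the curve starts at the clean error rate
        eps = np.concatenate(([0.0], d))
        err = np.concatenate(([np.mean(d == 0.0)],
                              np.arange(1, n + 1) / n))
        return eps, err

The resulting curve is piecewise constant, so it is naturally plotted as a step function, e.g. with matplotlib via plt.step(eps, err, where="post").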
Notes
1.
2. The full code is available at www.github.com/niklasrisse/how-to-compare-adversarial-robustness-of-classifiers-from-a-global-perspective.
3. The models trained with ST, KW, AT and MMR + AT are available at www.github.com/max-andr/provable-robustness-max-linear-regions.
4. The models trained with MMR-UNIV are available at www.github.com/fra31/mmr-universal.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Risse, N., Göpfert, C., Göpfert, J.P. (2021). How to Compare Adversarial Robustness of Classifiers from a Global Perspective. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12891. Springer, Cham. https://doi.org/10.1007/978-3-030-86362-3_3