Abstract
Machine learning (ML) classifiers—in particular deep neural networks—are surprisingly vulnerable to so-called adversarial examples. These are small modifications of natural inputs which drastically alter the output of the model even though no relevant features appear to have been modified. One explanation that has been offered for this phenomenon is the calibration hypothesis, which states that the probabilistic predictions of typical ML models are miscalibrated. As a result, classifiers can often be very confident in completely erroneous predictions. Based on this idea, we propose the MultIVAP algorithm for defending arbitrary ML models against adversarial examples. Our method is inspired by the inductive Venn-ABERS predictor (IVAP) technique from the field of conformal prediction. The IVAP enjoys the theoretical guarantee that its predictions will be perfectly calibrated, thus addressing the problem of miscalibration. Experimental results on five image classification tasks demonstrate empirically that the MultIVAP has a reasonably small computational overhead and provides significantly higher adversarial robustness without sacrificing accuracy on clean data. This increase in robustness is observed against both defense-oblivious attacks and a defense-aware white-box attack specifically designed for the MultIVAP.
We make our code available at https://github.com/saeyslab/multivap.
We thank the NVIDIA Corporation for the donation of a Titan Xp GPU with which we were able to carry out our experiments. Jonathan Peck is sponsored by a fellowship of the Research Foundation Flanders (FWO). Yvan Saeys is an ISAC Marylou Ingram scholar.
Notes
1. Of course, there is the classic result of [25] which states that, in expectation, the cross entropy loss is minimized if and only if the model perfectly recovers the data distribution. In practice, however, we rarely minimize this loss exactly. It is currently a major open problem in deep learning to provide similar guarantees when the model fit is suboptimal.
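The result referenced in this note can be written out explicitly (our notation, using the standard entropy decomposition):

```latex
% Expected cross entropy of a model q against the data distribution p:
\mathbb{E}_{y \sim p}\!\left[-\log q(y)\right]
  = H(p) + \mathrm{KL}(p \,\|\, q)
  \;\geq\; H(p),
% with equality iff KL(p || q) = 0, i.e. q = p almost everywhere.
```

Since \(\mathrm{KL}(p \,\|\, q) \geq 0\) with equality exactly when \(q = p\) almost everywhere, the expected cross entropy is minimized if and only if the model recovers the data distribution.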
2. Guarantees can also be given for non-convex problems, but these usually require at least bounded iterates or a Lipschitz continuous gradient [78]. Such assumptions are often violated or difficult to verify in practice.
3. A bag or multiset is a collection of objects where the order is irrelevant (like a set) but duplicates are allowed (like a list).
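In Python, a bag/multiset can be modeled with `collections.Counter`, which maps each element to its multiplicity (a small illustration of ours, not from the paper):

```python
from collections import Counter

# Order is irrelevant (like a set), but duplicates are counted (like a list).
bag1 = Counter(["a", "b", "a", "c"])
bag2 = Counter(["c", "a", "a", "b"])  # same elements in a different order

assert bag1 == bag2   # order does not matter
assert bag1["a"] == 2  # multiplicities are preserved
```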
4. A probability distribution is said to be exchangeable if every permutation of a sequence is equally likely.
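A concrete example (ours): an i.i.d. Bernoulli sequence is exchangeable, because the probability of a sequence depends only on how many ones it contains, not on where they occur.

```python
import itertools
import math

def seq_prob(seq, p=0.3):
    """Probability of a binary sequence under i.i.d. Bernoulli(p) draws."""
    return math.prod(p if x == 1 else 1 - p for x in seq)

seq = (1, 0, 0, 1, 0)
# Every permutation of the sequence has the same probability.
probs = {round(seq_prob(perm), 12) for perm in itertools.permutations(seq)}
assert len(probs) == 1
```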
5. See [11] for an overview of the various desiderata that an adversarial defense evaluation should satisfy.
6. We use the \(\ell_\infty\) norm everywhere, as this is recommended by [51]. However, the attack can be trivially adapted to any other norm.
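The norm typically enters a gradient-based attack in only two places: the ascent direction and the projection back onto the perturbation ball. A hedged sketch (ours, not the paper's exact attack) of how an \(\ell_\infty\) step differs from an \(\ell_2\) step:

```python
import numpy as np

def linf_step(x, x0, grad, alpha, eps):
    """One PGD-style step under the l_inf norm."""
    x = x + alpha * np.sign(grad)             # steepest ascent direction for l_inf
    return x0 + np.clip(x - x0, -eps, eps)    # project onto the l_inf ball of radius eps

def l2_step(x, x0, grad, alpha, eps):
    """The same step adapted to the l_2 norm."""
    x = x + alpha * grad / (np.linalg.norm(grad) + 1e-12)  # normalized ascent direction
    delta = x - x0
    norm = np.linalg.norm(delta)
    if norm > eps:                            # project onto the l_2 ball of radius eps
        delta *= eps / norm
    return x0 + delta
```

Swapping the direction and projection in this way is what makes the adaptation to other norms straightforward.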
7. Implementation available at https://github.com/ashafahi/free_adv_train. Accessed 2020-06-17.
References
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016)
Andriushchenko, M., Hein, M.: Provably robust boosted decision stumps and trees against adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 12997–13008 (2019)
Arora, S., Ge, R., Neyshabur, B., Zhang, Y.: Stronger generalization bounds for deep nets via a compression approach. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm, Sweden, vol. 80, pp. 254–263. PMLR, 10–15 July 2018. http://proceedings.mlr.press/v80/arora18b.html
Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)
Azuma, K.: Weighted sums of certain dependent random variables. Tohoku Math. J. Second Ser. 19(3), 357–367 (1967)
Biggio, B., Roli, F.: Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 84, 317–331 (2018)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017)
Card, D., Zhang, M., Smith, N.A.: Deep weighted averaging classifiers. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, pp. 369–378. ACM, New York (2019). https://doi.org/10.1145/3287560.3287595
Carlini, N.: Is AmI (attacks meet interpretability) robust to adversarial examples? arXiv preprint arXiv:1902.02322 (2019)
Carlini, N., et al.: On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705 (2019)
Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
Cheng, S., Dong, Y., Pang, T., Su, H., Zhu, J.: Improving black-box adversarial attacks with a transfer-based prior. In: Advances in Neural Information Processing Systems 32, pp. 10934–10944. Curran Associates, Inc. (2019)
Chollet, F., et al.: Keras (2015). https://keras.io
Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918 (2019)
Cullina, D., Bhagoji, A.N., Mittal, P.: PAC-learning in the presence of adversaries. In: Advances in Neural Information Processing Systems, pp. 230–241 (2018)
Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 99–108. ACM (2004)
De Vries, H., Memisevic, R., Courville, A.C.: Deep learning vector quantization. In: ESANN (2016)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
Doob, J.L.: Regularity properties of certain families of chance variables. Trans. Am. Math. Soc. 47(3), 455–486 (1940)
Ebrahimi, J., Rao, A., Lowd, D., Dou, D.: HotFlip: white-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017)
Elson, J., Douceur, J.J., Howell, J., Saul, J.: Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In: Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). Association for Computing Machinery, Inc., October 2007
Engstrom, L., Madry, A.: Understanding the landscape of adversarial robustness. Ph.D. thesis, Massachusetts Institute of Technology (2019)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, New York (2001). https://doi.org/10.1007/978-0-387-21606-5
Gal, Y.: Uncertainty in deep learning. Ph.D. thesis, University of Cambridge (2016)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)
Gelada, C., Buckman, J.: Bayesian neural networks need not concentrate (2020). https://jacobbuckman.com/2020-01-22-bayesian-neural-networks-need-not-concentrate/
Gilmer, J., Adams, R.P., Goodfellow, I., Andersen, D., Dahl, G.E.: Motivating the rules of the game for adversarial example research. arXiv preprint arXiv:1807.06732 (2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Gourdeau, P., Kanade, V., Kwiatkowska, M., Worrell, J.: On the hardness of robust classification. In: Advances in Neural Information Processing Systems, pp. 7444–7453 (2019)
Grumer, C., Peck, J., Olumofin, F., Nascimento, A., De Cock, M.: Hardening DGA classifiers utilizing IVAP. In: IEEE Big Data (2019)
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1321–1330. PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017
Guo, Y., Yan, Z., Zhang, C.: Subspace attack: exploiting promising subspaces for query-efficient black-box attacks. In: Advances in Neural Information Processing Systems 32, pp. 3825–3834. Curran Associates, Inc. (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: Fisher, N.I., Sen, P.K. (eds.) The Collected Works of Wassily Hoeffding. SSS, pp. 409–426. Springer, New York (1994). https://doi.org/10.1007/978-1-4612-0865-5_26
Jaccard, P.: The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)
Jain, H., Balasubramanian, V., Chunduri, B., Varma, M.: Slice: scalable linear extreme classifiers trained on 100 million labels for related searches. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 528–536. ACM (2019)
Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4(3), 227–241 (1968)
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37(2), 183–233 (1999)
Kanbak, C., Moosavi-Dezfooli, S.M., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4441–4449 (2018)
Kannan, H., Kurakin, A., Goodfellow, I.: Adversarial logit pairing. arXiv preprint arXiv:1803.06373 (2018)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Klaus, B., Strimmer, K.: fdrtool: Estimation of (Local) False Discovery Rates and Higher Criticism (2015). https://CRAN.R-project.org/package=fdrtool. R package version 1.2.15
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report, Citeseer (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647. ACM (2005)
Maddox, W.J., Izmailov, P., Garipov, T., Vetrov, D.P., Wilson, A.G.: A simple baseline for Bayesian uncertainty in deep learning. In: Advances in Neural Information Processing Systems 32, pp. 13132–13143. Curran Associates, Inc. (2019)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Manokhin, V.: Multi-class probabilistic classification using inductive and cross Venn-Abers predictors. In: Conformal and Probabilistic Prediction and Applications, pp. 228–240 (2017)
McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)
Meinke, A., Hein, M.: Towards neural networks that provably know when they don’t know. arXiv preprint arXiv:1909.12180 (2019)
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765–1773 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)
Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299 (2016)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Papernot, N., McDaniel, P.: Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)
Peck, J., Goossens, B., Saeys, Y.: Detecting adversarial examples with inductive Venn-Abers predictors. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 143–148 (2019)
Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 10(3), 61–74 (1999)
Price, D., Knerr, S., Personnaz, L., Dreyfus, G.: Pairwise neural network classifiers with probabilistic outputs. In: Advances in Neural Information Processing Systems, pp. 1109–1116 (1995)
Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5231–5240. PMLR, June 2019
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015). https://www.R-project.org/
Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10877–10887 (2018)
Rauber, J., Brendel, W., Bethge, M.: Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131 (2017)
Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., Madry, A.: Adversarially robust generalization requires more data. In: Advances in Neural Information Processing Systems, pp. 5014–5026 (2018)
Schneier, B.: Schneier’s law (2011). https://www.schneier.com/blog/archives/2011/04/schneiers_law.html
Shafahi, A., et al.: Adversarial training for free! In: Advances in Neural Information Processing Systems 32, pp. 3353–3364. Curran Associates, Inc. (2019)
Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008)
Shen, J., et al.: Lingvo: a modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295 (2019)
Sinha, A., Namkoong, H., Duchi, J.: Certifiable distributional robustness with principled adversarial training. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=Hk6kPgZA-
Sitawarin, C., Wagner, D.: On the robustness of deep k-nearest neighbors. arXiv preprint arXiv:1903.08333 (2019)
So, D., Le, Q., Liang, C.: The evolved transformer. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5877–5886. PMLR, June 2019
Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23, 828–841 (2019)
Sun, R.: Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Tanay, T., Griffin, L.: A boundary tilting perspective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690 (2016)
Toccaceli, P.: Venn-ABERS predictor (2017). https://github.com/ptocca/VennABERS
Tuy, H.: Convex Analysis and Global Optimization. Springer, New York (1998). https://doi.org/10.1007/978-1-4757-2809-5
Vorobeychik, Y., Li, B.: Optimal randomized classification in adversarial settings. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, pp. 485–492. International Foundation for Autonomous Agents and Multiagent Systems (2014)
Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, Boston (2005). https://doi.org/10.1007/b106715
Vovk, V., Petej, I., Fedorova, V.: Large-scale probabilistic predictors with and without guarantees of validity. In: Advances in Neural Information Processing Systems, pp. 892–900 (2015)
Wasserman, L.: Frasian inference. Stat. Sci., 322–325 (2011)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017)
Yin, D., Ramchandran, K., Bartlett, P.: Rademacher complexity for adversarially robust generalization. arXiv preprint arXiv:1810.11914 (2018)
Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
A Network Architectures
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peck, J., Goossens, B., Saeys, Y. (2020). Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks. In: Bogaerts, B., et al. (eds.) Artificial Intelligence and Machine Learning. BNAIC/BENELEARN 2019. Communications in Computer and Information Science, vol. 1196. Springer, Cham. https://doi.org/10.1007/978-3-030-65154-1_6
DOI: https://doi.org/10.1007/978-3-030-65154-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65153-4
Online ISBN: 978-3-030-65154-1