Abstract
Machine learning (ML) classifiers—in particular deep neural networks—are surprisingly vulnerable to so-called adversarial examples. These are small modifications of natural inputs which drastically alter the output of the model even though no relevant features appear to have been modified. One explanation that has been offered for this phenomenon is the calibration hypothesis, which states that the probabilistic predictions of typical ML models are miscalibrated. As a result, classifiers can often be very confident in completely erroneous predictions. Based on this idea, we propose the MultIVAP algorithm for defending arbitrary ML models against adversarial examples. Our method is inspired by the inductive Venn-ABERS predictor (IVAP) technique from the field of conformal prediction. The IVAP enjoys the theoretical guarantee that its predictions will be perfectly calibrated, thus addressing the problem of miscalibration. Experimental results on five image classification tasks demonstrate empirically that the MultIVAP has a reasonably small computational overhead and provides significantly higher adversarial robustness without sacrificing accuracy on clean data. This increase in robustness is observed against both defense-oblivious attacks and a defense-aware white-box attack specifically designed for the MultIVAP.
We make our code available at https://github.com/saeyslab/multivap.
We thank the NVIDIA Corporation for the donation of a Titan Xp GPU with which we were able to carry out our experiments. Jonathan Peck is sponsored by a fellowship of the Research Foundation Flanders (FWO). Yvan Saeys is an ISAC Marylou Ingram scholar.
Notes
1. Of course, there is the classic result of [25] which states that, in expectation, the cross entropy loss is minimized if and only if the model perfectly recovers the data distribution. In practice, however, we rarely minimize this loss exactly. It is currently a major open problem in deep learning to provide similar guarantees when the model fit is suboptimal.
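The result referenced in this note can be written out explicitly (our notation, using the standard entropy decomposition):

```latex
% Expected cross entropy of a model q against the data distribution p:
\mathbb{E}_{y \sim p}\!\left[-\log q(y)\right]
  = H(p) + \mathrm{KL}(p \,\|\, q)
  \;\geq\; H(p),
% with equality iff KL(p || q) = 0, i.e. q = p almost everywhere.
```

Since \(\mathrm{KL}(p \,\|\, q) \geq 0\) with equality exactly when \(q = p\) almost everywhere, the expected cross entropy is minimized if and only if the model recovers the data distribution.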
2. Guarantees can also be given for non-convex problems, but these usually require at least bounded iterates or a Lipschitz continuous gradient [78]. Such assumptions are often violated or difficult to verify in practice.
3. A bag or multiset is a collection of objects where the order is irrelevant (like a set) but duplicates are allowed (like a list).
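In Python, a bag/multiset can be modeled with `collections.Counter`, which maps each element to its multiplicity (a small illustration of ours, not from the paper):

```python
from collections import Counter

# Order is irrelevant (like a set), but duplicates are counted (like a list).
bag1 = Counter(["a", "b", "a", "c"])
bag2 = Counter(["c", "a", "a", "b"])  # same elements in a different order

assert bag1 == bag2   # order does not matter
assert bag1["a"] == 2  # multiplicities are preserved
```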
4. A probability distribution is said to be exchangeable if every permutation of a sequence is equally likely.
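A concrete example (ours): an i.i.d. Bernoulli sequence is exchangeable, because the probability of a sequence depends only on how many ones it contains, not on where they occur.

```python
import itertools
import math

def seq_prob(seq, p=0.3):
    """Probability of a binary sequence under i.i.d. Bernoulli(p) draws."""
    return math.prod(p if x == 1 else 1 - p for x in seq)

seq = (1, 0, 0, 1, 0)
# Every permutation of the sequence has the same probability.
probs = {round(seq_prob(perm), 12) for perm in itertools.permutations(seq)}
assert len(probs) == 1
```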
5. See [11] for an overview of the various desiderata that an adversarial defense evaluation should satisfy.
6. We use the \(\ell_\infty\) norm everywhere, as this is recommended by [51]. However, the attack can be trivially adapted to any other norm.
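The norm typically enters a gradient-based attack in only two places: the ascent direction and the projection back onto the perturbation ball. A hedged sketch (ours, not the paper's exact attack) of how an \(\ell_\infty\) step differs from an \(\ell_2\) step:

```python
import numpy as np

def linf_step(x, x0, grad, alpha, eps):
    """One PGD-style step under the l_inf norm."""
    x = x + alpha * np.sign(grad)             # steepest ascent direction for l_inf
    return x0 + np.clip(x - x0, -eps, eps)    # project onto the l_inf ball of radius eps

def l2_step(x, x0, grad, alpha, eps):
    """The same step adapted to the l_2 norm."""
    x = x + alpha * grad / (np.linalg.norm(grad) + 1e-12)  # normalized ascent direction
    delta = x - x0
    norm = np.linalg.norm(delta)
    if norm > eps:                            # project onto the l_2 ball of radius eps
        delta *= eps / norm
    return x0 + delta
```

Swapping the direction and projection in this way is what makes the adaptation to other norms straightforward.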
7. Implementation available at https://github.com/ashafahi/free_adv_train. Accessed 2020-06-17.
References
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016)
Andriushchenko, M., Hein, M.: Provably robust boosted decision stumps and trees against adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 12997–13008 (2019)
Arora, S., Ge, R., Neyshabur, B., Zhang, Y.: Stronger generalization bounds for deep nets via a compression approach. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm, Sweden, vol. 80, pp. 254–263. PMLR, 10–15 July 2018. http://proceedings.mlr.press/v80/arora18b.html
Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)
Azuma, K.: Weighted sums of certain dependent random variables. Tohoku Math. J. Second Ser. 19(3), 357–367 (1967)
Biggio, B., Roli, F.: Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 84, 317–331 (2018)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017)
Card, D., Zhang, M., Smith, N.A.: Deep weighted averaging classifiers. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, pp. 369–378. ACM, New York (2019). https://doi.org/10.1145/3287560.3287595
Carlini, N.: Is AmI (attacks meet interpretability) robust to adversarial examples? arXiv preprint arXiv:1902.02322 (2019)
Carlini, N., et al.: On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705 (2019)
Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)
Cheng, S., Dong, Y., Pang, T., Su, H., Zhu, J.: Improving black-box adversarial attacks with a transfer-based prior. In: Advances in Neural Information Processing Systems 32, pp. 10934–10944. Curran Associates, Inc. (2019)
Chollet, F., et al.: Keras (2015). https://keras.io
Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918 (2019)
Cullina, D., Bhagoji, A.N., Mittal, P.: PAC-learning in the presence of adversaries. In: Advances in Neural Information Processing Systems, pp. 230–241 (2018)
Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 99–108. ACM (2004)
De Vries, H., Memisevic, R., Courville, A.C.: Deep learning vector quantization. In: ESANN (2016)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
Doob, J.L.: Regularity properties of certain families of chance variables. Trans. Am. Math. Soc. 47(3), 455–486 (1940)
Ebrahimi, J., Rao, A., Lowd, D., Dou, D.: HotFlip: white-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017)
Elson, J., Douceur, J.J., Howell, J., Saul, J.: Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In: Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). Association for Computing Machinery, Inc., October 2007
Engstrom, L., Madry, A.: Understanding the landscape of adversarial robustness. Ph.D. thesis, Massachusetts Institute of Technology (2019)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, New York (2001). https://doi.org/10.1007/978-0-387-21606-5
Gal, Y.: Uncertainty in deep learning. Ph.D. thesis, University of Cambridge (2016)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)
Gelada, C., Buckman, J.: Bayesian neural networks need not concentrate (2020). https://jacobbuckman.com/2020-01-22-bayesian-neural-networks-need-not-concentrate/
Gilmer, J., Adams, R.P., Goodfellow, I., Andersen, D., Dahl, G.E.: Motivating the rules of the game for adversarial example research. arXiv preprint arXiv:1807.06732 (2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Gourdeau, P., Kanade, V., Kwiatkowska, M., Worrell, J.: On the hardness of robust classification. In: Advances in Neural Information Processing Systems, pp. 7444–7453 (2019)
Grumer, C., Peck, J., Olumofin, F., Nascimento, A., De Cock, M.: Hardening DGA classifiers utilizing IVAP. In: IEEE Big Data (2019)
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1321–1330. PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017
Guo, Y., Yan, Z., Zhang, C.: Subspace attack: exploiting promising subspaces for query-efficient black-box attacks. In: Advances in Neural Information Processing Systems 32, pp. 3825–3834. Curran Associates, Inc. (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: Fisher, N.I., Sen, P.K. (eds.) The Collected Works of Wassily Hoeffding. SSS, pp. 409–426. Springer, New York (1994). https://doi.org/10.1007/978-1-4612-0865-5_26
Jaccard, P.: The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)
Jain, H., Balasubramanian, V., Chunduri, B., Varma, M.: Slice: scalable linear extreme classifiers trained on 100 million labels for related searches. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 528–536. ACM (2019)
Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4(3), 227–241 (1968)
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37(2), 183–233 (1999)
Kanbak, C., Moosavi-Dezfooli, S.M., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4441–4449 (2018)
Kannan, H., Kurakin, A., Goodfellow, I.: Adversarial logit pairing. arXiv preprint arXiv:1803.06373 (2018)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Klaus, B., Strimmer, K.: fdrtool: Estimation of (Local) False Discovery Rates and Higher Criticism (2015). https://CRAN.R-project.org/package=fdrtool. R package version 1.2.15
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report, Citeseer (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647. ACM (2005)
Maddox, W.J., Izmailov, P., Garipov, T., Vetrov, D.P., Wilson, A.G.: A simple baseline for Bayesian uncertainty in deep learning. In: Advances in Neural Information Processing Systems 32, pp. 13132–13143. Curran Associates, Inc. (2019)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Manokhin, V.: Multi-class probabilistic classification using inductive and cross Venn-Abers predictors. In: Conformal and Probabilistic Prediction and Applications, pp. 228–240 (2017)
McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)
Meinke, A., Hein, M.: Towards neural networks that provably know when they don’t know. arXiv preprint arXiv:1909.12180 (2019)
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765–1773 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)
Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299 (2016)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Papernot, N., McDaniel, P.: Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)
Peck, J., Goossens, B., Saeys, Y.: Detecting adversarial examples with inductive Venn-Abers predictors. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 143–148 (2019)
Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 10(3), 61–74 (1999)
Price, D., Knerr, S., Personnaz, L., Dreyfus, G.: Pairwise neural network classifiers with probabilistic outputs. In: Advances in Neural Information Processing Systems, pp. 1109–1116 (1995)
Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5231–5240. PMLR, June 2019
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015). https://www.R-project.org/
Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10877–10887 (2018)
Rauber, J., Brendel, W., Bethge, M.: Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131 (2017)
Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., Madry, A.: Adversarially robust generalization requires more data. In: Advances in Neural Information Processing Systems, pp. 5014–5026 (2018)
Schneier, B.: Schneier’s law (2011). https://www.schneier.com/blog/archives/2011/04/schneiers_law.html
Shafahi, A., et al.: Adversarial training for free! In: Advances in Neural Information Processing Systems 32, pp. 3353–3364. Curran Associates, Inc. (2019)
Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008)
Shen, J., et al.: Lingvo: a modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295 (2019)
Sinha, A., Namkoong, H., Duchi, J.: Certifiable distributional robustness with principled adversarial training. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=Hk6kPgZA-
Sitawarin, C., Wagner, D.: On the robustness of deep k-nearest neighbors. arXiv preprint arXiv:1903.08333 (2019)
So, D., Le, Q., Liang, C.: The evolved transformer. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5877–5886. PMLR, June 2019
Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23, 828–841 (2019)
Sun, R.: Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
Tanay, T., Griffin, L.: A boundary tilting perspective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690 (2016)
Toccaceli, P.: Venn-ABERS predictor (2017). https://github.com/ptocca/VennABERS
Tuy, H.: Convex Analysis and Global Optimization. Springer, New York (1998). https://doi.org/10.1007/978-1-4757-2809-5
Vorobeychik, Y., Li, B.: Optimal randomized classification in adversarial settings. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, pp. 485–492. International Foundation for Autonomous Agents and Multiagent Systems (2014)
Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, Boston (2005). https://doi.org/10.1007/b106715
Vovk, V., Petej, I., Fedorova, V.: Large-scale probabilistic predictors with and without guarantees of validity. In: Advances in Neural Information Processing Systems, pp. 892–900 (2015)
Wasserman, L.: Frasian inference. Stat. Sci., 322–325 (2011)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017)
Yin, D., Ramchandran, K., Bartlett, P.: Rademacher complexity for adversarially robust generalization. arXiv preprint arXiv:1810.11914 (2018)
Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
A Network Architectures
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peck, J., Goossens, B., Saeys, Y. (2020). Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks. In: Bogaerts, B., et al. (eds.) Artificial Intelligence and Machine Learning. BNAIC/BENELEARN 2019. Communications in Computer and Information Science, vol. 1196. Springer, Cham. https://doi.org/10.1007/978-3-030-65154-1_6
DOI: https://doi.org/10.1007/978-3-030-65154-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65153-4
Online ISBN: 978-3-030-65154-1