Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks

  • Conference paper
Artificial Intelligence and Machine Learning (BNAIC 2019, BENELEARN 2019)

Abstract

Machine learning (ML) classifiers—in particular deep neural networks—are surprisingly vulnerable to so-called adversarial examples. These are small modifications of natural inputs which drastically alter the output of the model even though no relevant features appear to have been modified. One explanation that has been offered for this phenomenon is the calibration hypothesis, which states that the probabilistic predictions of typical ML models are miscalibrated. As a result, classifiers can often be very confident in completely erroneous predictions. Based on this idea, we propose the MultIVAP algorithm for defending arbitrary ML models against adversarial examples. Our method is inspired by the inductive Venn-ABERS predictor (IVAP) technique from the field of conformal prediction. The IVAP enjoys the theoretical guarantee that its predictions will be perfectly calibrated, thus addressing the problem of miscalibration. Experimental results on five image classification tasks demonstrate empirically that the MultIVAP has a reasonably small computational overhead and provides significantly higher adversarial robustness without sacrificing accuracy on clean data. This increase in robustness is observed both against defense-oblivious attacks and against a defense-aware white-box attack specifically designed for the MultIVAP.

We make our code available at https://github.com/saeyslab/multivap.
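
To make the calibration mechanism concrete, the following is a minimal sketch of the underlying binary inductive Venn-ABERS predictor (IVAP), written with scikit-learn's isotonic regression. It is an illustration only, not the MultIVAP implementation from the repository above: the function name ivap_predict, the naive refit-per-test-point loop, and the merging formula are simplifications for exposition.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ivap_predict(cal_scores, cal_labels, test_score):
    """Binary IVAP prediction for a single test object.

    cal_scores  -- 1-D array of classifier scores on the calibration set
    cal_labels  -- 1-D array of {0, 1} labels for the calibration set
    test_score  -- score assigned to the test object by the same classifier
    Returns the multiprobability pair (p0, p1) and a merged probability.
    """
    p = []
    for hypothetical_label in (0, 1):
        # Augment the calibration data with the test object, tentatively
        # labelled 0 or 1, and fit isotonic regression of labels on scores.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, hypothetical_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        p.append(iso.predict([test_score])[0])
    p0, p1 = p
    # One common way of collapsing the interval into a single probability.
    return p0, p1, p1 / (1.0 - p0 + p1)
```

The pair (p0, p1) is the multiprobability prediction whose calibration guarantee the abstract refers to; refitting the isotonic regression for every test point is done here only for clarity, since much faster IVAP algorithms exist [86].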

We thank the NVIDIA Corporation for the donation of a Titan Xp GPU with which we were able to carry out our experiments. Jonathan Peck is sponsored by a fellowship of the Research Foundation Flanders (FWO). Yvan Saeys is an ISAC Marylou Ingram scholar.


Notes

  1. Of course, there is the classic result of [25] which states that, in expectation, the cross-entropy loss is minimized if and only if the model perfectly recovers the data distribution; the underlying decomposition is written out after these notes. In practice, however, we rarely minimize this loss exactly. It is currently a major open problem in deep learning to provide similar guarantees when the model fit is suboptimal.

  2. Guarantees can also be given for non-convex problems, but these usually require at least bounded iterates or a Lipschitz continuous gradient [78]. Such assumptions are often violated or difficult to verify in practice.

  3. A bag or multiset is a collection of objects where the order is irrelevant (like a set) but duplicates are allowed (like a list).

  4. A probability distribution over sequences is said to be exchangeable if it is invariant under permutations of the sequence, i.e., every reordering of the observations has the same probability.

  5. See [11] for an overview of the various desiderata that an adversarial defense evaluation should satisfy.

  6. We use the \(\ell _\infty \) norm everywhere, as recommended by [51]. However, the attack can be trivially adapted to any other norm; a small projection sketch after these notes illustrates the change.

  7. Implementation available at https://github.com/ashafahi/free_adv_train. Accessed 2020-06-17.
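
For reference, the decomposition behind the classic result mentioned in note 1 can be written as follows, with \(p\) the data-generating conditional distribution and \(q_\theta\) the model:

```latex
\mathbb{E}_{x}\,\mathbb{E}_{y\sim p(\cdot\mid x)}\bigl[-\log q_\theta(y\mid x)\bigr]
  = \mathbb{E}_{x}\Bigl[ H\bigl(p(\cdot\mid x)\bigr)
  + \mathrm{KL}\bigl(p(\cdot\mid x)\,\big\|\,q_\theta(\cdot\mid x)\bigr) \Bigr],
```

so the expected cross-entropy is minimized exactly when the KL term vanishes, i.e. when \(q_\theta(\cdot\mid x) = p(\cdot\mid x)\) almost everywhere.

The adaptation mentioned in note 6 amounts to swapping the projection step that keeps the perturbation inside the allowed ball. A minimal sketch of that step (our own illustration, not the defense-aware attack itself):

```python
import numpy as np

def project(delta, eps, norm="inf"):
    """Project a perturbation delta onto the eps-ball of the chosen norm."""
    if norm == "inf":
        # l_infinity ball: clip every coordinate to [-eps, eps].
        return np.clip(delta, -eps, eps)
    if norm == "2":
        # l_2 ball: rescale the perturbation if it is longer than eps.
        length = np.linalg.norm(delta)
        return delta if length <= eps else delta * (eps / length)
    raise ValueError(f"unsupported norm: {norm}")
```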

References

  1. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016)

  2. Andriushchenko, M., Hein, M.: Provably robust boosted decision stumps and trees against adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 12997–13008 (2019)

  3. Arora, S., Ge, R., Neyshabur, B., Zhang, Y.: Stronger generalization bounds for deep nets via a compression approach. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, Stockholmsmässan, Stockholm, Sweden, vol. 80, pp. 254–263. PMLR, 10–15 July 2018. http://proceedings.mlr.press/v80/arora18b.html

  4. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)

  5. Azuma, K.: Weighted sums of certain dependent random variables. Tohoku Math. J. Second Ser. 19(3), 357–367 (1967)

  6. Biggio, B., Roli, F.: Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 84, 317–331 (2018)

  7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  8. Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248 (2017)

  9. Card, D., Zhang, M., Smith, N.A.: Deep weighted averaging classifiers. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, pp. 369–378. ACM, New York (2019). https://doi.org/10.1145/3287560.3287595

  10. Carlini, N.: Is AmI (attacks meet interpretability) robust to adversarial examples? arXiv preprint arXiv:1902.02322 (2019)

  11. Carlini, N., et al.: On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705 (2019)

  12. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)

  13. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)

  14. Cheng, S., Dong, Y., Pang, T., Su, H., Zhu, J.: Improving black-box adversarial attacks with a transfer-based prior. In: Advances in Neural Information Processing Systems 32, pp. 10934–10944. Curran Associates, Inc. (2019)

  15. Chollet, F., et al.: Keras (2015). https://keras.io

  16. Cohen, J.M., Rosenfeld, E., Kolter, J.Z.: Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918 (2019)

  17. Cullina, D., Bhagoji, A.N., Mittal, P.: PAC-learning in the presence of adversaries. In: Advances in Neural Information Processing Systems, pp. 230–241 (2018)

  18. Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 99–108. ACM (2004)

  19. De Vries, H., Memisevic, R., Courville, A.C.: Deep learning vector quantization. In: ESANN (2016)

  20. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)

  21. Doob, J.L.: Regularity properties of certain families of chance variables. Trans. Am. Math. Soc. 47(3), 455–486 (1940)

  22. Ebrahimi, J., Rao, A., Lowd, D., Dou, D.: HotFlip: white-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017)

  23. Elson, J., Douceur, J.J., Howell, J., Saul, J.: Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In: Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). Association for Computing Machinery, Inc., October 2007

  24. Engstrom, L., Madry, A.: Understanding the landscape of adversarial robustness. Ph.D. thesis, Massachusetts Institute of Technology (2019)

  25. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, New York (2001). https://doi.org/10.1007/978-0-387-21606-5

  26. Gal, Y.: Uncertainty in deep learning. Ph.D. thesis, University of Cambridge (2016)

  27. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)

  28. Gelada, C., Buckman, J.: Bayesian neural networks need not concentrate (2020). https://jacobbuckman.com/2020-01-22-bayesian-neural-networks-need-not-concentrate/

  29. Gilmer, J., Adams, R.P., Goodfellow, I., Andersen, D., Dahl, G.E.: Motivating the rules of the game for adversarial example research. arXiv preprint arXiv:1807.06732 (2018)

  30. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org

  31. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  32. Gourdeau, P., Kanade, V., Kwiatkowska, M., Worrell, J.: On the hardness of robust classification. In: Advances in Neural Information Processing Systems, pp. 7444–7453 (2019)

  33. Grumer, C., Peck, J., Olumofin, F., Nascimento, A., De Cock, M.: Hardening DGA classifiers utilizing IVAP. In: IEEE Big Data (2019)

  34. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1321–1330. PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017

  35. Guo, Y., Yan, Z., Zhang, C.: Subspace attack: exploiting promising subspaces for query-efficient black-box attacks. In: Advances in Neural Information Processing Systems 32, pp. 3825–3834. Curran Associates, Inc. (2019)

  36. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  37. Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)

  38. Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: Fisher, N.I., Sen, P.K. (eds.) The Collected Works of Wassily Hoeffding. SSS, pp. 409–426. Springer, New York (1994). https://doi.org/10.1007/978-1-4612-0865-5_26

  39. Jaccard, P.: The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)

  40. Jain, H., Balasubramanian, V., Chunduri, B., Varma, M.: Slice: scalable linear extreme classifiers trained on 100 million labels for related searches. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 528–536. ACM (2019)

  41. Jaynes, E.T.: Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 4(3), 227–241 (1968)

  42. Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37(2), 183–233 (1999)

  43. Kanbak, C., Moosavi-Dezfooli, S.M., Frossard, P.: Geometric robustness of deep networks: analysis and improvement. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4441–4449 (2018)

  44. Kannan, H., Kurakin, A., Goodfellow, I.: Adversarial logit pairing. arXiv preprint arXiv:1803.06373 (2018)

  45. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  46. Klaus, B., Strimmer, K.: fdrtool: Estimation of (Local) False Discovery Rates and Higher Criticism (2015). https://CRAN.R-project.org/package=fdrtool. R package version 1.2.15

  47. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report, Citeseer (2009)

  48. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  49. Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647. ACM (2005)

  50. Maddox, W.J., Izmailov, P., Garipov, T., Vetrov, D.P., Wilson, A.G.: A simple baseline for Bayesian uncertainty in deep learning. In: Advances in Neural Information Processing Systems 32, pp. 13132–13143. Curran Associates, Inc. (2019)

  51. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)

  52. Manokhin, V.: Multi-class probabilistic classification using inductive and cross Venn-Abers predictors. In: Conformal and Probabilistic Prediction and Applications, pp. 228–240 (2017)

  53. McDiarmid, C.: On the method of bounded differences. Surv. Comb. 141(1), 148–188 (1989)

  54. Meinke, A., Hein, M.: Towards neural networks that provably know when they don’t know. arXiv preprint arXiv:1909.12180 (2019)

  55. Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1765–1773 (2017)

  56. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)

  57. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)

  58. Nakkiran, P.: Adversarial robustness may be at odds with simplicity. arXiv preprint arXiv:1901.00532 (2019)

  59. Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial perturbations for deep networks. arXiv preprint arXiv:1612.06299 (2016)

  60. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)

  61. Papernot, N., McDaniel, P.: Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765 (2018)

  62. Peck, J., Goossens, B., Saeys, Y.: Detecting adversarial examples with inductive Venn-Abers predictors. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 143–148 (2019)

  63. Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Cl. 10(3), 61–74 (1999)

  64. Price, D., Knerr, S., Personnaz, L., Dreyfus, G.: Pairwise neural network classifiers with probabilistic outputs. In: Advances in Neural Information Processing Systems, pp. 1109–1116 (1995)

  65. Qin, Y., Carlini, N., Cottrell, G., Goodfellow, I., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5231–5240. PMLR, June 2019

  66. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015). https://www.R-project.org/

  67. Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10877–10887 (2018)

  68. Rauber, J., Brendel, W., Bethge, M.: Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131 (2017)

  69. Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., Madry, A.: Adversarially robust generalization requires more data. In: Advances in Neural Information Processing Systems, pp. 5014–5026 (2018)

  70. Schneier, B.: Schneier’s law (2011). https://www.schneier.com/blog/archives/2011/04/schneiers_law.html

  71. Shafahi, A., et al.: Adversarial training for free! In: Advances in Neural Information Processing Systems 32, pp. 3353–3364. Curran Associates, Inc. (2019)

  72. Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 371–421 (2008)

  73. Shen, J., et al.: Lingvo: a modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295 (2019)

  74. Sinha, A., Namkoong, H., Duchi, J.: Certifiable distributional robustness with principled adversarial training. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=Hk6kPgZA-

  75. Sitawarin, C., Wagner, D.: On the robustness of deep k-nearest neighbors. arXiv preprint arXiv:1903.08333 (2019)

  76. So, D., Le, Q., Liang, C.: The evolved transformer. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, Long Beach, California, USA, vol. 97, pp. 5877–5886. PMLR, June 2019

  77. Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23, 828–841 (2019)

  78. Sun, R.: Optimization for deep learning: theory and algorithms. arXiv preprint arXiv:1912.08957 (2019)

  79. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)

  80. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)

  81. Tanay, T., Griffin, L.: A boundary tilting persepective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690 (2016)

  82. Toccaceli, P.: Venn-ABERS predictor (2017). https://github.com/ptocca/VennABERS

  83. Tuy, H.: Convex Analysis and Global Optimization. Springer, New York (1998). https://doi.org/10.1007/978-1-4757-2809-5

  84. Vorobeychik, Y., Li, B.: Optimal randomized classification in adversarial settings. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, pp. 485–492. International Foundation for Autonomous Agents and Multiagent Systems (2014)

  85. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, Boston (2005). https://doi.org/10.1007/b106715

  86. Vovk, V., Petej, I., Fedorova, V.: Large-scale probabilistic predictors with and without guarantees of validity. In: Advances in Neural Information Processing Systems, pp. 892–900 (2015)

  87. Wasserman, L.: Frasian inference. Stat. Sci., 322–325 (2011)

  88. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms (2017)

  89. Yin, D., Ramchandran, K., Bartlett, P.: Rademacher complexity for adversarially robust generalization. arXiv preprint arXiv:1810.11914 (2018)

  90. Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)

  91. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)

Author information

Corresponding author

Correspondence to Jonathan Peck.


A  Network Architectures

[Figures f–j: network architecture specifications used in the experiments]


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Peck, J., Goossens, B., Saeys, Y. (2020). Calibrated Multi-probabilistic Prediction as a Defense Against Adversarial Attacks. In: Bogaerts, B., et al. Artificial Intelligence and Machine Learning. BNAIC 2019, BENELEARN 2019. Communications in Computer and Information Science, vol. 1196. Springer, Cham. https://doi.org/10.1007/978-3-030-65154-1_6

  • DOI: https://doi.org/10.1007/978-3-030-65154-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65153-4

  • Online ISBN: 978-3-030-65154-1
