Bridging Machine Learning and Cryptography in Defence Against Adversarial Attacks

  • Olga Taran
  • Shideh Rezaeifar
  • Slava Voloshynovskiy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)


In the last decade, deep learning algorithms have become very popular thanks to the performance they achieve in many machine learning and computer vision tasks. However, most deep learning architectures are vulnerable to so-called adversarial examples, which calls into question the security of deep neural networks (DNNs) in many security- and trust-sensitive domains. The majority of existing adversarial attacks exploit the differentiability of the DNN cost function. Defence strategies, in turn, are mostly based on machine learning and signal processing principles that either detect and reject or filter out the adversarial perturbations, completely neglecting the classical cryptographic component of the defence.
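
As an illustration of such gradient-based attacks (not part of this paper's contribution), the fast gradient sign method of Goodfellow et al. perturbs an input along the sign of the loss gradient. A minimal PyTorch sketch, with the model and loss as placeholders rather than the classifier used in the paper:

    # Minimal sketch of a gradient-based attack (FGSM, Goodfellow et al. 2014),
    # showing how the differentiability of the cost function is exploited.
    # "model" is a placeholder classifier, not the architecture from the paper.
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.1):
        """Return x + epsilon * sign(grad_x loss), clipped to the input range."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step in the direction that increases the classification loss.
        return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()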

In this work, we propose a new defence mechanism based on Kerckhoffs's second cryptographic principle, which states that the defence and classification algorithms are assumed to be known to the attacker, but not the secret key.

To be compliant with the assumption that the attacker does not have access to the secret key, we primarily focus on a gray-box scenario and do not address the white-box one. More particularly, we assume that the attacker does not have direct access to the secret block, but (a) completely knows the system architecture, (b) has access to the data used for training and testing, and (c) can observe the output of the classifier for each given input. We show empirically that our system is efficient against the most famous state-of-the-art attacks in black-box and gray-box scenarios.
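
The abstract does not detail the secret block. Purely as an illustration of a key-parameterized, data-independent transform placed in front of the classifier, one could use a random permutation of the input features seeded by the secret key; the class name and parameters below are hypothetical and not taken from the paper:

    # Illustrative sketch only: the secret block is not specified in this abstract.
    # A key-seeded random permutation stands in for a generic data-independent
    # transform; the attacker may know this code but not the value of the key.
    import numpy as np

    class KeyedPermutation:
        """Data-independent transform parameterized by a secret key."""

        def __init__(self, key: int, n_features: int):
            rng = np.random.default_rng(key)         # the secret key seeds the transform
            self.perm = rng.permutation(n_features)

        def __call__(self, x: np.ndarray) -> np.ndarray:
            # x: batch of flattened inputs, shape (batch, n_features)
            return x[:, self.perm]

    # Usage (hypothetical): the classifier is trained and evaluated on
    # transformed inputs, so gradients computed without the key do not align
    # with the deployed system.
    # transform = KeyedPermutation(key=42, n_features=28 * 28)
    # logits = classifier(transform(images.reshape(len(images), -1)))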


Keywords: Adversarial attacks · Defence · Data-independent transform · Secret key · Cryptography principle



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Olga Taran¹ (corresponding author)
  • Shideh Rezaeifar¹
  • Slava Voloshynovskiy¹
  1. Computer Science Department, University of Geneva, Geneva, Switzerland
