Explainable AI for Inspecting Adversarial Attacks on Deep Neural Networks

  • Conference paper
  • In: Artificial Intelligence and Soft Computing (ICAISC 2020)

Abstract

Deep Neural Networks (DNNs) are state-of-the-art algorithms for image classification. Despite significant achievements and promising perspectives, deep neural networks and the accompanying learning algorithms still face important challenges. In particular, it is relatively easy to attack and fool them with well-designed input samples called adversarial examples, whose perturbations are unnoticeable to humans. Such attacks pose a severe threat to the deployment of these systems in critical applications, such as medical or military systems. Hence, it is necessary to develop methods of counteracting these attacks. These methods, called defense strategies, aim at increasing the neural model’s robustness against adversarial attacks. In this paper, we review recent findings on adversarial attacks and defense strategies. We also analyze the effects of the applied attacks and defense strategies using local and global analysis methods from the family of explainable artificial intelligence.
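To make the two ingredients of this study concrete, the sketch below shows (i) the fast gradient sign method (FGSM), a standard way of crafting adversarial examples for an image classifier, and (ii) a vanilla gradient saliency map, one of the simplest local explanation methods. This is a minimal illustration written with TensorFlow/Keras; the function names, the model, and the epsilon value are assumptions made for illustration and are not taken from the paper itself.

import tensorflow as tf

def fgsm_adversarial_example(model, image, true_label, epsilon=0.01):
    # image: float tensor in [0, 1], shape (1, H, W, C); true_label: int tensor, shape (1,).
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)  # predicted class probabilities
        loss = tf.keras.losses.sparse_categorical_crossentropy(true_label, prediction)
    # Perturb each pixel by epsilon in the direction that increases the loss;
    # for small epsilon the change is typically imperceptible to humans.
    gradient = tape.gradient(loss, image)
    adversarial = image + epsilon * tf.sign(gradient)
    return tf.clip_by_value(adversarial, 0.0, 1.0)

def saliency_map(model, image, class_index):
    # A simple local explanation: the gradient of the class score with respect
    # to the input pixels, reduced over the channel dimension.
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        score = model(image)[:, class_index]
    return tf.reduce_max(tf.abs(tape.gradient(score, image)), axis=-1)

Comparing the saliency map of a clean image with that of its adversarially perturbed counterpart is in the spirit of the local, explainability-based inspection described in the abstract.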

Author information

Corresponding author: Michał Grochowski.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Klawikowska, Z., Mikołajczyk, A., Grochowski, M. (2020). Explainable AI for Inspecting Adversarial Attacks on Deep Neural Networks. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2020. Lecture Notes in Computer Science(), vol 12415. Springer, Cham. https://doi.org/10.1007/978-3-030-61401-0_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-61401-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61400-3

  • Online ISBN: 978-3-030-61401-0

  • eBook Packages: Computer Science, Computer Science (R0)
