Abstract
Deep Neural Networks (DNNs) are state-of-the-art algorithms for image classification. Despite significant achievements and promising prospects, deep neural networks and their accompanying learning algorithms still face important challenges. In particular, they are relatively easy to attack and fool with well-designed input samples called adversarial examples, whose perturbations are typically imperceptible to humans. Such attacks pose a severe threat to the deployment of these systems in critical applications, such as medical or military systems. Hence, it is necessary to develop methods for counteracting these attacks. These methods, called defense strategies, aim at increasing the neural model's robustness against adversarial attacks. In this paper, we review recent findings on adversarial attacks and defense strategies. We also analyze the effects of the attacks and of the applied defense strategies using local and global explanation methods from the family of explainable artificial intelligence (XAI).
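To make the notion of an adversarial example concrete, the sketch below illustrates the Fast Gradient Sign Method (FGSM) for a pre-trained Keras image classifier. This is a minimal illustration under our own assumptions, not the paper's exact experimental code; the names `model`, `image`, `label`, and `epsilon` are illustrative placeholders.

```python
# Minimal FGSM sketch: perturb an input along the sign of the loss gradient.
# Assumes `model` is a pre-trained Keras classifier with softmax outputs and
# `image` is a float array scaled to [0, 1] (hypothetical setup).
import tensorflow as tf

def fgsm_attack(model, image, label, epsilon=0.01):
    """Craft an adversarial example bounded by `epsilon` in the L-infinity norm."""
    image = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)
    label = tf.convert_to_tensor([label])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = loss_fn(label, prediction)

    # Gradient of the loss with respect to the input pixels
    gradient = tape.gradient(loss, image)
    # Step in the direction that maximally increases the loss
    adversarial = image + epsilon * tf.sign(gradient)
    # Keep pixel values in the valid range
    return tf.clip_by_value(adversarial, 0.0, 1.0)[0]
```

Even with a small `epsilon`, the resulting image is usually indistinguishable from the original to a human observer, yet it can change the classifier's prediction; comparing saliency or relevance maps of the clean and perturbed inputs is one way such attacks can be inspected with XAI methods.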