Abstract
The noteworthy achievements of deep learning models have enabled transformative applications aimed at reducing costs and improving quality of life. Nevertheless, recent work on testing classifiers against targeted and black-box adversarial attacks has shown that deep learning models in particular are brittle and lack robustness, which ultimately undermines trust in them. In this context, a natural question is how sensitive certain regions of a classification model's input space are to adversarial perturbations. This paper studies this problem through a Sensitivity-inspired Constrained Evaluation Method (SICEM) that deterministically evaluates how vulnerable a region of the input space is to adversarial perturbations relative to other regions and to the input space as a whole. Our experiments suggest that SICEM accurately quantifies region vulnerabilities on the MNIST and CIFAR-10 datasets.
Availability of data and material
For our experiments, we use the MNIST and CIFAR-10 datasets, which are publicly available.
Code availability
Code is made available as explained in Appendix A.
Funding
This material is based upon work supported by the National Science Foundation under Grants CHE-1905043 and CNS-2136961.
Author information
Contributions
K. Sooksatra executed the research. P. Rivas directed the research.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Code to reproduce experiments
The code to reproduce all the experiments in this paper can be found in the attached supplementary zip file entitled:
sicem.zip
This .zip file includes the weights of the trained MNIST and CIFAR-10 classifiers; please change the path variable in the code to your own workspace to avoid overwriting the pre-trained classifiers, as illustrated in the sketch below. Also, be aware that training the deep convolutional network, calculating the success rate, and running the individual-image agreement section consume a significant amount of time, so please make sure you have sufficient time. However, all the results are already shown in the Python notebooks, and you do not need to re-run all the experiments. Nonetheless, if you do want to re-run everything, please follow the advice above (Figs. 12, 13).
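The snippet below is a hypothetical illustration of the path adjustment described above; the variable name, directory layout, and checkpoint file names are assumptions for illustration and are not the actual identifiers used in sicem.zip.

```python
# Hypothetical sketch only: the variable and file names below are
# illustrative assumptions, not the actual names in the supplementary code.
import os

PATH = "/path/to/your/workspace"  # point this at your own working directory

# Keep the pre-trained checkpoints shipped with the archive separate from
# any newly trained ones so the originals are never overwritten.
pretrained_dir = os.path.join(PATH, "pretrained")   # weights from sicem.zip
retrained_dir = os.path.join(PATH, "retrained")     # weights you train yourself
os.makedirs(retrained_dir, exist_ok=True)

mnist_weights = os.path.join(pretrained_dir, "mnist_classifier.h5")
cifar_weights = os.path.join(pretrained_dir, "cifar10_classifier.h5")
```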
B Architectures of MNIST and CIFAR-10 classifiers
C Success rate and average required \(\epsilon \)
Tables 3 and 4 show the success rates and the average required \(\epsilon \), respectively, for finding adversarial examples after performing the adversarial attacks on 200 images of the MNIST dataset, where t-attack denotes that the adversary attacks with mask \(M_t\) and b-attack denotes that the adversary attacks with mask \(M_b\). Similarly, Tables 5 and 6 show the success rates and the average required \(\epsilon \), respectively, after performing the attacks on 200 images of the CIFAR-10 dataset. A sketch of how such mask-constrained attacks and both metrics could be computed is given below.
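The following is a minimal sketch of one way a mask-constrained attack and the two reported metrics (success rate and average required \(\epsilon \)) could be computed; it is not the authors' implementation. The toy linear model, the gradient surrogate, the step size, the \(\epsilon \) budget, and the random stand-in for the mask \(M_t\) are all illustrative assumptions.

```python
# Toy sketch (not the paper's code): a mask-constrained attack that searches
# for the smallest epsilon flipping the prediction, plus the two metrics.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))            # toy linear classifier: logits = W @ x

def predict(x):
    return np.argmax(W @ x)

def masked_attack(x, y, mask, step=0.01, eps_max=1.0):
    """Grow epsilon until the predicted class flips, perturbing only where mask == 1."""
    grad_sign = np.sign(W[y])             # crude surrogate for the loss gradient w.r.t. x
    eps = step
    while eps <= eps_max:
        x_adv = np.clip(x - eps * grad_sign * mask, 0.0, 1.0)
        if predict(x_adv) != y:
            return True, eps              # attack succeeded at this epsilon
        eps += step
    return False, None                    # no adversarial example within the budget

# Success rate and average required epsilon over a batch of images.
images = rng.uniform(size=(200, 784))
labels = np.array([predict(x) for x in images])   # attack the predicted labels
mask_t = rng.integers(0, 2, size=784)             # random stand-in for the mask M_t
results = [masked_attack(x, y, mask_t) for x, y in zip(images, labels)]
required_eps = [eps for ok, eps in results if ok]
print("success rate:", len(required_eps) / len(results))
print("average required epsilon:", np.mean(required_eps) if required_eps else float("nan"))
```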
About this article
Cite this article
Sooksatra, K., Rivas, P. Evaluation of adversarial attacks sensitivity of classifiers with occluded input data. Neural Comput & Applic 34, 17615–17632 (2022). https://doi.org/10.1007/s00521-022-07387-y