
Adversarial example detection for DNN models: a review and experimental comparison

Published in: Artificial Intelligence Review

Abstract

Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer vision based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications can only be deployed successfully once they are able to overcome safety-critical challenges. Among these challenges are the defense against, and/or the detection of, adversarial examples (AEs). Adversaries can carefully craft small, often imperceptible, noise, called a perturbation, that is added to a clean image to generate an AE. The aim of an AE is to fool the DL model, which makes it a potential risk for DL applications. Many test-time evasion attacks and countermeasures, i.e., defense or detection methods, have been proposed in the literature. Moreover, the few reviews and surveys that have been published present the taxonomy of threats and countermeasure methods mainly from a theoretical point of view, with little focus on AE detection methods. In this paper, we focus on the image classification task and provide a survey of detection methods for test-time evasion attacks on neural network classifiers. A detailed discussion of these methods is provided, along with experimental results for eight state-of-the-art detectors under different scenarios on four datasets. We also discuss potential challenges and future perspectives for this research direction.
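To make the perturbation described above concrete, the following minimal sketch (our own illustration in PyTorch, not code from the paper or its benchmark) crafts an AE with the fast gradient sign method (FGSM); the classifier, the pixel range [0, 1] and the budget epsilon are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Craft adversarial examples from clean images x with true labels y (illustrative sketch)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)         # loss of the classifier on the clean input
    loss.backward()                             # gradient of the loss w.r.t. the input pixels
    perturbation = epsilon * x.grad.sign()      # small, L-infinity bounded noise
    x_adv = (x + perturbation).clamp(0.0, 1.0)  # assume pixels live in [0, 1]
    return x_adv.detach()
```

The attack only counts as successful if the model's prediction on the perturbed image differs from the correct label, which is the distinction drawn in note 2 below.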


Notes

  1. Benchmark website: https://aldahdooh.github.io/detectors_review/.

  2. Successful AEs are attacked samples that are able to fool the learning model, while failed AEs are attacked samples that do not fool it (see the sketch after these notes).

  3. The code is available at: https://github.com/aldahdooh/detectors_review.
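The sketch below (a hypothetical PyTorch helper of ours, not part of the released benchmark code) makes the split in note 2 explicit, assuming a classifier that returns logits and the correct labels of the attacked samples.

```python
import torch

@torch.no_grad()
def split_successful_failed(model, x_adv, y_true):
    """Separate attacked samples into successful and failed AEs (illustrative sketch)."""
    preds = model(x_adv).argmax(dim=1)    # predicted classes on the attacked samples
    fooled = preds != y_true              # mask of samples whose prediction no longer matches the label
    return x_adv[fooled], x_adv[~fooled]  # (successful AEs, failed AEs)
```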


Acknowledgements

This project is funded by both the Région Bretagne (Brittany region), France, and the Direction générale de l'armement (DGA).

Author information


Corresponding author

Correspondence to Ahmed Aldahdooh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Aldahdooh, A., Hamidouche, W., Fezza, S.A. et al. Adversarial example detection for DNN models: a review and experimental comparison. Artif Intell Rev 55, 4403–4462 (2022). https://doi.org/10.1007/s10462-021-10125-w
