Adversarial image detection in deep neural networks


Deep neural networks are increasingly pervasive in computer vision applications, and in image classification in particular. Nevertheless, recent works have demonstrated that it is quite easy to create adversarial examples, i.e., images maliciously modified to cause deep neural networks to fail. Such images contain changes that are unnoticeable to the human eye but sufficient to mislead the network. This represents a serious threat to machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network by analyzing the activations of its hidden layers. Specifically, we test scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. These scores are obtained by searching only among the same images that were used to train the network. The results show that hidden-layer activations can be used to reveal incorrect classifications caused by adversarial attacks.
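
The idea of scoring a prediction against the nearest training images can be illustrated with a minimal sketch. It is not the scoring scheme evaluated in the paper: it assumes that hidden-layer activations have already been extracted into plain NumPy arrays, uses Euclidean distance and an unweighted vote, and the names `knn_score`, `is_suspicious`, and the threshold value are hypothetical choices made for illustration only.

```python
import numpy as np


def knn_score(query, train_feats, train_labels, predicted_label, k=5):
    """Fraction of the k nearest training features that share the predicted label.

    query:           1-D array, hidden-layer activation of the image under test
    train_feats:     2-D array, one activation vector per training image
    train_labels:    1-D array of integer class labels for train_feats
    predicted_label: class assigned to the query by the network
    """
    # Euclidean distance from the query to every training feature.
    dists = np.linalg.norm(train_feats - query, axis=1)
    # Indices of the k closest training images.
    nearest = np.argsort(dists)[:k]
    # Agreement between the neighbours' labels and the network's prediction.
    return float(np.mean(train_labels[nearest] == predicted_label))


def is_suspicious(score, threshold=0.5):
    """Flag a prediction whose neighbours mostly disagree with it.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return score < threshold


# Toy data: two well-separated clusters standing in for deep features.
train_feats = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
)
train_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
query = np.array([0.5, 0.5])  # lies inside the class-0 cluster

consistent = knn_score(query, train_feats, train_labels, predicted_label=0, k=3)
fooled = knn_score(query, train_feats, train_labels, predicted_label=1, k=3)
print(consistent, is_suspicious(consistent))  # 1.0 False
print(fooled, is_suspicious(fooled))          # 0.0 True
```

In this toy setting, a prediction that agrees with the neighbouring training images receives a high score, while a prediction that contradicts them receives a low score and is flagged. A real detector would need features from an actual network layer and a threshold tuned on held-out data.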



This work was partially supported by Smart News (Social sensing for breaking news), co-funded by the Tuscany region under the FAR-FAS 2014 program, CUP CIPE D58C15000270008, and by the project ESPRESS (Smartphone identification based on on-board sensors for security applications), co-funded by Fondazione Cassa di Risparmio di Firenze (Italy) within the Scientific Research and Technological Innovation framework. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

Author information



Corresponding author

Correspondence to Fabio Carrara.

Cite this article

Carrara, F., Falchi, F., Caldelli, R. et al. Adversarial image detection in deep neural networks. Multimed Tools Appl 78, 2815–2835 (2019).


  • Adversarial images detection
  • Deep convolutional neural network
  • Machine learning security