International Journal of Computer Vision, Volume 120, Issue 3, pp 233–255

Visualizing Deep Convolutional Neural Networks Using Natural Pre-images

  • Aravindh Mahendran
  • Andrea Vedaldi


Image representations, from SIFT and bag of visual words to convolutional neural networks (CNNs), are a crucial component of almost all computer vision systems. However, our understanding of them remains limited. In this paper we study several landmark representations, both shallow and deep, using a number of complementary visualization techniques. These visualizations are based on the concept of the “natural pre-image”, namely a natural-looking image whose representation has some notable property. We study in particular three such visualizations: inversion, in which the aim is to reconstruct an image from its representation; activation maximization, in which we search for patterns that maximally stimulate a representation component; and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We pose these as a regularized energy-minimization framework and demonstrate its generality and effectiveness. In particular, we show that this method can invert representations such as HOG more accurately than recent alternatives while remaining applicable to CNNs. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.
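The regularized energy-minimization idea underlying these visualizations can be illustrated with a toy example: given a target code Φ(x₀), search for an image x minimizing ‖Φ(x) − Φ(x₀)‖² + λR(x). The sketch below is not the paper's method: it substitutes a random linear map for a real feature extractor (HOG or a CNN) and a simple squared-norm regularizer for the paper's natural-image priors, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "representation" Phi(x) = A @ x: a fixed random linear map standing in
# for a real feature extractor such as HOG or a CNN layer.
d_img, d_feat = 64, 32
A = rng.standard_normal((d_feat, d_img)) / np.sqrt(d_img)

def phi(x):
    return A @ x

# Target code computed from a reference "image" x_true.
x_true = rng.standard_normal(d_img)
target = phi(x_true)

lam = 1e-3  # weight of the regularizer (a simple L2 prior here)

def grad(x):
    # Gradient of 0.5 * ||phi(x) - target||^2 + 0.5 * lam * ||x||^2.
    return A.T @ (phi(x) - target) + lam * x

# Plain gradient descent on the regularized energy (the paper's framework
# would use a CNN's backpropagated gradients in place of A.T here).
x = np.zeros(d_img)
lr = 0.5
for _ in range(500):
    x = x - lr * grad(x)

err = np.linalg.norm(phi(x) - target) / np.linalg.norm(target)
print(f"relative error in feature space: {err:.4f}")
```

Because the map discards information (32 features from 64 dimensions), many pre-images match the target code; the regularizer selects among them, which is exactly the role the natural-image priors play in the full framework.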


Keywords: Visualization · Convolutional neural networks · Pre-image problem



We gratefully acknowledge the support of the ERC StG IDIU for Andrea Vedaldi and of BP for Aravindh Mahendran.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. University of Oxford, Oxford, UK
