Making Sense of CNNs: Interpreting Deep Representations and Their Invariances with INNs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


To tackle increasingly complex tasks, neural networks must learn abstract representations. These task-specific representations and, in particular, the invariances they capture turn neural networks into black-box models that lack interpretability. Opening such a black box therefore requires uncovering both the semantic concepts a model has learned and those it has learned to be invariant to. We present an approach based on invertible neural networks (INNs) that (i) recovers the task-specific, learned invariances by disentangling the remaining factors of variation in the data and (ii) invertibly transforms these recovered invariances, combined with the model representation, into an equally expressive representation with accessible semantic concepts. As a consequence, neural network representations become understandable: the approach provides the means to (i) expose their semantic meaning, (ii) semantically modify a representation, and (iii) visualize individual learned semantic concepts and invariances. Because it is invertible, our approach significantly extends the ability to understand black-box models by enabling post-hoc interpretation of state-of-the-art networks without compromising their performance.
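The key ingredient of the abstract is bijectivity: an INN can map a representation (together with the recovered invariances) into a semantically accessible space without losing information, because the transform can be inverted exactly. As a minimal sketch of this building block, the toy affine coupling layer below (the standard component of INNs such as RealNVP) uses fixed random weights in place of trained subnetworks; all dimensions and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: x is the concatenation of a model representation and the
# recovered invariances. D and the random "subnetwork" weights W_s, W_t
# are hypothetical stand-ins for trained networks.
D = 8
W_s = rng.normal(scale=0.1, size=(D // 2, D // 2))
W_t = rng.normal(scale=0.1, size=(D // 2, D // 2))

def coupling_forward(x):
    """Affine coupling: transform the second half conditioned on the first."""
    x1, x2 = x[: D // 2], x[D // 2:]
    s, t = np.tanh(W_s @ x1), W_t @ x1   # scale and shift from the "subnetworks"
    return np.concatenate([x1, x2 * np.exp(s) + t])

def coupling_inverse(y):
    """Exact inverse of coupling_forward: no information is discarded."""
    y1, y2 = y[: D // 2], y[D // 2:]
    s, t = np.tanh(W_s @ y1), W_t @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

x = rng.normal(size=D)
x_rec = coupling_inverse(coupling_forward(x))
print(np.allclose(x, x_rec))  # True: the mapping is bijective
```

Stacking such layers (with the halves permuted between layers) yields an expressive yet exactly invertible map, which is why the interpretation can be applied post hoc without altering the analyzed network.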



This work has been supported in part by the German Research Foundation (DFG) projects 371923335, 421703927, and EXC 2181/1 - 390900948, and by the German Federal Ministry for Economic Affairs and Energy (BMWi) within the project “KI Absicherung”.

Supplementary material

Supplementary material 1 (PDF, 23.1 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Interdisciplinary Center for Scientific Computing, HCI, Heidelberg University, Heidelberg, Germany
