Explanations for Attributing Deep Neural Network Predictions

  • Ruth FongEmail author
  • Andrea Vedaldi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11700)


Given the recent success of deep neural networks and their applications to more high impact and high risk applications, like autonomous driving and healthcare decision-making, there is a great need for faithful and interpretable explanations of “why” an algorithm is making a certain prediction. In this chapter, we introduce 1. Meta-Predictors as Explanations, a principled framework for learning explanations for any black box algorithm, and 2. Meaningful Perturbations, an instantiation of our paradigm applied to the problem of attribution, which is concerned with attributing what features of an input (i.e., regions of an input image) are responsible for a model’s output (i.e., a CNN classifier’s object class prediction). We first introduced these contributions in [8]. We also briefly survey existing visual attribution methods and highlight how they faith to be both faithful and interpretable.


Explainable artificial intelligence Computer vision Machine learning 


  1. 1.
    Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. In: NeurIPS, pp. 9525–9536 (2018)Google Scholar
  2. 2.
    Ancona, M., Ceolini, E., Öztireli, C., Gross, M.: Towards better understanding of gradient-based attribution methods for deep neural networks. In: ICLR (2018)Google Scholar
  3. 3.
    Arras, L., Horn, F., Montavon, G., Müller, K.R., Samek, W.: “What is relevant in a text document?”: an interpretable machine learning approach. PLoS ONE 12(8), e0181142 (2017)CrossRefGoogle Scholar
  4. 4.
    Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)CrossRefGoogle Scholar
  5. 5.
    Cao, C., et al.: Look and think twice: capturing top-down visual attention with feedback convolutional neural networks. In: ICCV, pp. 2956–2964 (2015)Google Scholar
  6. 6.
    Dabkowski, P., Gal, Y.: Real time image saliency for black box classifiers. In: NIPS, pp. 6967–6976 (2017)Google Scholar
  7. 7.
    Fong, R., Vedaldi, A.: Net2vec: quantifying and explaining how concepts are encoded by filters in deep neural networks. In: CVPR, pp. 8730–8738 (2018)Google Scholar
  8. 8.
    Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: ICCV, pp. 3429–3437 (2017)Google Scholar
  9. 9.
    Greydanus, S., Koul, A., Dodge, J., Fern, A.: Visualizing and understanding atari agents. arXiv preprint arXiv:1711.00138 (2017)
  10. 10.
    Jia, Y., et al.: Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
  11. 11.
    Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  12. 12.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)Google Scholar
  13. 13.
    Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
  14. 14.
    Lenc, K., Vedaldi, A.: Understanding image representations by measuring their equivariance and equivalence. In: CVPR, pp. 991–999 (2015)Google Scholar
  15. 15.
    Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR, pp. 5188–5196 (2015)Google Scholar
  16. 16.
    Mahendran, A., Vedaldi, A.: Salient deconvolutional networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 120–135. Springer, Cham (2016). Scholar
  17. 17.
    Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digital Sig. Process. 73, 1–15 (2018)MathSciNetCrossRefGoogle Scholar
  18. 18.
    Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: CVPR, pp. 427–436 (2015)Google Scholar
  19. 19.
    Petsiuk, V., Das, A., Saenko, K.: Rise: randomized input sampling for explanation of black-box models. In: BMVC (2018)Google Scholar
  20. 20.
    Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM (2016)Google Scholar
  21. 21.
    Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.R.: Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28(11), 2660–2673 (2017)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391 (2016)
  23. 23.
    Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. arXiv preprint arXiv:1704.02685 (2017)
  24. 24.
    Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR (2014)Google Scholar
  25. 25.
    Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: Smoothgrad: removing noise by adding noise. arXiv preprint arxiv:1706.03825 (2017)
  26. 26.
    Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
  27. 27.
    Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, pp. 3319–3328 (2017)Google Scholar
  28. 28.
    Szegedy, C., et al.: Going deeper with convolutions. In: CVPR, pp. 1–9 (2015)Google Scholar
  29. 29.
    Turner, R.: A model explanation system. In: IEEE MLSP, pp. 1–6 (2016)Google Scholar
  30. 30.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). Scholar
  31. 31.
    Zhang, J., Lin, Z., Brandt, J., Shen, X., Sclaroff, S.: Top-down neural attention by excitation backprop. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 543–559. Springer, Cham (2016). Scholar
  32. 32.
    Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856 (2014)
  33. 33.
    Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR, pp. 2921–2929 (2016)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.University of OxfordOxfordUK

Personalised recommendations