Abstract
Recent advances in deep learning have driven the adoption of neural network models for tasks ranging from resource utilization to autonomous driving. Most deep learning models are opaque black boxes that are not easily explained: unlike the weights of a linear model, the weights of a neural network are not inherently interpretable to humans. The need for explainable deep learning has led to the development of a variety of methods that help us better understand both the decisions and the decision-making process of neural network models. Many of the general post-hoc, model-agnostic methods presented in Chap. 5 also apply to deep learning models; this chapter presents a collection of explanation approaches developed specifically for neural networks by leveraging their architecture or learning method.
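The contrast between linear models and neural networks can be made concrete with a minimal NumPy sketch (all weights below are hypothetical, chosen only for illustration): in a linear model, each weight is exactly the change in the prediction per unit change in its feature, while in even a tiny one-hidden-layer network no single weight carries that meaning.

```python
import numpy as np

# Linear model: each weight is directly interpretable as the change in the
# output per unit change in the corresponding input feature.
w = np.array([2.0, -1.0, 0.5])   # hypothetical learned weights
b = 0.1
x = np.array([1.0, 1.0, 1.0])

linear_pred = w @ x + b
e0 = np.array([1.0, 0.0, 0.0])
# Increasing feature 0 by 1 changes the prediction by exactly w[0].
assert np.isclose((w @ (x + e0) + b) - linear_pred, w[0])

# Tiny neural network: the first-layer weights lose this one-to-one meaning,
# because each input is mixed with the others and passed through a
# nonlinearity before reaching the output.
W1 = np.array([[1.0, -2.0, 0.3],
               [0.5,  1.5, -1.0]])  # hypothetical hidden-layer weights
w2 = np.array([1.0, -1.0])          # hypothetical output weights

def mlp(x):
    h = np.maximum(0.0, W1 @ x)     # ReLU hidden layer
    return w2 @ h

delta = mlp(x + e0) - mlp(x)
# The effect of feature 0 on the output is not any single entry of W1 or w2;
# it depends on which hidden units are active at this particular input x.
```

This input-dependence of feature effects is precisely what the attribution methods surveyed in this chapter (e.g., gradient-based saliency or layer-wise relevance propagation) are designed to disentangle.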
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Kamath, U., Liu, J. (2021). Explainable Deep Learning. In: Explainable Artificial Intelligence: An Introduction to Interpretable Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-83356-5_6
Print ISBN: 978-3-030-83355-8
Online ISBN: 978-3-030-83356-5