Structuring Neural Networks for More Explainable Predictions

  • Laura Rieger
  • Pattarawat Chormai
  • Grégoire Montavon
  • Lars Kai Hansen
  • Klaus-Robert Müller
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Machine learning algorithms such as neural networks are more useful when their predictions can be explained, e.g. in terms of the input variables. Often, simpler models are more interpretable than complex, higher-performing ones. In practice, one can either choose a readily interpretable (but possibly less predictive) model, or directly explain the original, highly predictive model. In this chapter, we present a middle-ground approach in which the original neural network architecture is modified parsimoniously in order to reduce common biases observed in the explanations. Our approach leads to explanations that better separate classes in feed-forward networks, and that better identify relevant time steps in recurrent neural networks.
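To make the notion of "explaining a prediction in terms of input variables" concrete, the following sketch computes a simple gradient × input attribution for a tiny two-layer ReLU network. This is only a minimal illustration of input-variable explanations, not the architecture-modification approach the chapter proposes; the weights are random stand-ins for a trained model.

```python
import numpy as np

# Toy two-layer ReLU network with random stand-in weights
# (in practice these would come from a trained model).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # input dim 4 -> hidden dim 3
W2 = rng.standard_normal((3, 2))   # hidden dim 3 -> 2 class logits

def explain(x, target):
    """Gradient x input relevance of each input variable for one class logit."""
    h_pre = x @ W1
    h = np.maximum(h_pre, 0.0)          # ReLU forward pass
    # Backward pass: gradient of the target logit w.r.t. the input.
    grad_h = W2[:, target]              # d(logit)/d(hidden)
    grad_pre = grad_h * (h_pre > 0)     # gate by the ReLU activation pattern
    grad_x = W1 @ grad_pre              # d(logit)/d(input)
    return grad_x * x                   # gradient x input attribution

x = np.array([1.0, -0.5, 2.0, 0.0])
relevance = explain(x, target=1)        # one relevance score per input variable
```

Note that an input variable equal to zero receives zero relevance under this rule; more refined propagation schemes such as layer-wise relevance propagation (reference [3] below) address some of the biases of plain gradient-based explanations.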


Interpretable machine learning · Convolutional neural networks · Recurrent neural networks



This work was supported by the Brain Korea 21 Plus Program through the National Research Foundation of Korea; the Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government [No. 2017-0-01779]; the Deutsche Forschungsgemeinschaft (DFG) [grant MU 987/17-1]; the German Ministry for Education and Research as Berlin Big Data Center (BBDC) [01IS14013A]; and the Innovation Foundation Denmark through the DABAI project. This publication only reflects the authors' views. Funding agencies are not liable for any use that may be made of the information contained herein.


  1. Angermueller C, Pärnamaa T, Parts L, Stegle O (2016) Deep learning for computational biology. Molecular Systems Biology 12(7)
  2. Arras L, Montavon G, Müller K, Samek W (2017) Explaining recurrent neural network predictions in sentiment analysis. In: Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017, Copenhagen, Denmark, September 8, 2017, pp 159–168
  3. Bach S, Binder A, Montavon G, Klauschen F, Müller KR, Samek W (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE 10(7):e0130140
  4. Baehrens D, Schroeter T, Harmeling S, Kawanabe M, Hansen K, Müller K (2010) How to explain individual classification decisions. Journal of Machine Learning Research 11:1803–1831
  5. Balduzzi D, Frean M, Leary L, Lewis JP, Ma KW, McWilliams B (2017) The shattered gradients problem: If resnets are the answer, then what is the question? In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, pp 342–350
  6. Bau D, Zhou B, Khosla A, Oliva A, Torralba A (2017) Network dissection: Quantifying interpretability of deep visual representations. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp 3319–3327
  7. Bengio Y, Simard PY, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2):157–166
  8. Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N (2015) Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10–13, 2015, pp 1721–1730
  9. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa PP (2011) Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537
  10. Gevrey M, Dimopoulos I, Lek S (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling 160(3):249–264
  11. Hihi SE, Bengio Y (1995) Hierarchical recurrent neural networks for long-term dependencies. In: Advances in Neural Information Processing Systems 8, NIPS, Denver, CO, November 27–30, 1995, pp 493–499
  12. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9(8):1735–1780
  13. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27 – October 4, 2009, pp 2146–2153
  14. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. CoRR abs/1412.6980
  15. Krizhevsky A (2009) Learning multiple layers of features from tiny images. Tech. rep., University of Toronto
  16. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, December 3–6, 2012, Lake Tahoe, Nevada, United States, pp 1106–1114
  17. Landecker W, Thomure MD, Bettencourt LMA, Mitchell M, Kenyon GT, Brumby SP (2013) Interpreting individual classifications of hierarchical networks. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2013, Singapore, 16–19 April, 2013, pp 32–38
  18. LeCun Y (1989) Generalization and network design strategies. In: Pfeifer R, Schreter Z, Fogelman F, Steels L (eds) Connectionism in Perspective, Elsevier
  19. LeCun Y, Cortes C (2010) MNIST handwritten digit database
  20. LeCun Y, Bottou L, Orr GB, Müller KR (2012) Efficient backprop. In: Neural Networks: Tricks of the Trade, Springer, pp 9–50
  21. Montavon G, Lapuschkin S, Binder A, Samek W, Müller K (2017) Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65:211–222
  22. Montavon G, Samek W, Müller K (2018) Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73:1–15
  23. Montúfar GF, Pascanu R, Cho K, Bengio Y (2014) On the number of linear regions of deep neural networks. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13, 2014, Montreal, Quebec, Canada, pp 2924–2932
  24. Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013, pp 1310–1318
  25. Poulin B, Eisner R, Szafron D, Lu P, Greiner R, Wishart DS, Fyshe A, Pearcy B, Macdonell C, Anvik J (2006) Visual explanation of evidence with additive classifiers. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16–20, 2006, Boston, Massachusetts, USA, pp 1822–1829
  26. Rasmussen PM, Hansen LK, Madsen KH, Churchill NW, Strother SC (2012) Model sparsity and brain pattern interpretation of classification models in neuroimaging. Pattern Recognition 45(6):2085–2100
  27. Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, pp 1135–1144
  28. Samek W, Binder A, Montavon G, Lapuschkin S, Müller KR (2017) Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems 28(11):2660–2673
  29. Schütt KT, Arbabzadah F, Chmiela S, Müller KR, Tkatchenko A (2017) Quantum-chemical insights from deep tensor neural networks. Nature Communications 8:13890
  30. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, pp 3145–3153
  31. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR abs/1312.6034
  32. Snyder JC, Rupp M, Hansen K, Müller KR, Burke K (2012) Finding density functionals with machine learning. Physical Review Letters 108(25)
  33. Snyder JC, Rupp M, Müller KR, Burke K (2015) Nonlinear gradient denoising: Finding accurate extrema from inaccurate functional derivatives. International Journal of Quantum Chemistry 115(16):1102–1114
  34. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller MA (2014) Striving for simplicity: The all convolutional net. CoRR abs/1412.6806
  35. Sutskever I, Martens J, Dahl GE, Hinton GE (2013) On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013, pp 1139–1147
  36. Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pp 1–9
  37. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. CoRR abs/1708.07747
  38. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, pp 818–833
  39. Zhang J, Lin ZL, Brandt J, Shen X, Sclaroff S (2016) Top-down neural attention by excitation backprop. In: Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV, pp 543–559
  40. Zhou B, Khosla A, Lapedriza À, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, pp 2921–2929

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Laura Rieger (1)
  • Pattarawat Chormai (2)
  • Grégoire Montavon (2)
  • Lars Kai Hansen (1)
  • Klaus-Robert Müller (2, 3, 4)

  1. DTU Compute, Technical University of Denmark, Kongens Lyngby, Denmark
  2. Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  3. Department of Brain and Cognitive Engineering, Korea University, Seongbuk-gu, Seoul, South Korea
  4. Max Planck Institute for Informatics, Saarbrücken, Germany
