Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges

  • Gabriëlle Ras
  • Marcel van Gerven
  • Pim Haselager
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Issues regarding explainable AI involve four components: users, laws and regulations, explanations, and algorithms. Together these components provide a context in which explanation methods can be evaluated for their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for the classification of existing explanation methods is introduced, and finally the various classes of explanation methods are analyzed to verify whether user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, explanation methods and interfaces aimed at lay users are still missing, and we speculate on the criteria such methods and interfaces should satisfy. Finally, it is noted that two important concerns are difficult to address with explanation methods: the concern that bias in datasets leads to biased DNNs, and the suspicion of unfair outcomes.


Keywords: Explanation methods · Explainable AI · Interpretability · Deep neural networks · Artificial intelligence



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Gabriëlle Ras (1)
  • Marcel van Gerven (1)
  • Pim Haselager (1)

  1. Radboud University, Nijmegen, The Netherlands