Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence


Many of the computing systems programmed using Machine Learning are opaque: it is difficult to know why they do what they do or how they work. Explainable Artificial Intelligence aims to develop analytic techniques that render opaque computing systems transparent, but lacks a normative framework with which to evaluate these techniques’ explanatory successes. The aim of the present discussion is to develop such a framework, paying particular attention to different stakeholders’ distinct explanatory requirements. Building on an analysis of “opacity” from philosophy of science, this framework is modeled after accounts of explanation in cognitive science. The framework distinguishes between the explanation-seeking questions that are likely to be asked by different stakeholders, and specifies the general ways in which these questions should be answered so as to allow these stakeholders to perform their roles in the Machine Learning ecosystem. By applying the normative framework to recently developed techniques such as input heatmapping, feature-detector visualization, and diagnostic classification, it is possible to determine whether and to what extent techniques from Explainable Artificial Intelligence can be used to render opaque computing systems transparent and, thus, whether they can be used to solve the Black Box Problem.


  1.

    It is worth distinguishing two distinct streams within the Explainable AI research program. The present discussion focuses on attempts to solve the Black Box Problem by analyzing computing systems so as to render them transparent post hoc, i.e., after they have been developed. In contrast, the discussion will not consider efforts to avoid the Black Box Problem altogether, by modifying the relevant ML methods so that the computers being programmed do not become opaque in the first place (for discussion see, e.g., Doran et al. 2017).

  2.

    This list is by no means comprehensive. Indeed, XAI is an incredibly dynamic field in which original and increasingly powerful techniques are being developed almost daily. Although not all relevant XAI techniques can be considered, the normative framework being developed here is meant to apply generally, including to unconsidered techniques and to techniques that do not yet exist.

  3.

    Their ability to produce such solutions is perhaps the main reason for using ML methods in many different problem domains. That said, it is important to recognize that ML methods are by no means all-conquering. Many important AI problems remain unsolved, and in many cases, it is unclear whether, and if so how, ML methods could ever be used to solve them. Indeed, in many problem domains, traditional AI methods remain far more effective (for discussion see, e.g., Lake et al. 2017; Marcus 2018).

  4.

    Indeed, although ML developers typically have access to a system’s learnable parameter values, they tend to be the first to call for more illuminating explanations that allow them to, for example, demonstrate that the system does in fact do what it is supposed to do (see Sect. 4 below) or to improve its behavior when it does not (see Sect. 5 below; Hohman et al. 2018).

  5.

    Humphreys subsequently introduces the notion of “essential epistemic opacity,” which applies to systems whose epistemically relevant elements are not only unknown to the agent but are in fact impossible for that agent to know (Humphreys 2009, p. 618). Notably, Humphreys’ notion of possibility is not logical but practical: it depends on an agent’s limited time, money, and computational resources, among other things (Durán and Formanek 2018). For this reason, analytic techniques from XAI that allow an agent to use these resources more efficiently are poised to broaden the scope of the possible, while allowing that some systems may remain opaque even after the most powerful techniques have been applied. That said, the present discussion need not speculate about the scope of the possible. Rather, its aim is to show how systems that can be rendered transparent should be rendered transparent.

  6.

    Applications that depend on ML-specific hardware components are considered in Sect. 5.

  7.

    Different explanatory requirements might also stem from other differences, such as differences in background knowledge, abilities, and skills. Although these additional differences are undeniably interesting and relevant, the present discussion will not consider them further and will instead only focus on the differences that arise from stakeholders’ distinct roles in the ML ecosystem.

  8.

    The present account is agnostic with respect to which kind of regularity or correlation actually obtains between inputs and outputs. In particular, whereas in some cases f' might be a mere statistical correlation, in others f' might pertain to a bona fide causal regularity. Although recent work has emphasized the advantages of ML-programmed systems that learn to track genuinely causal relationships (see, e.g., Wachter et al. 2018), much interesting work is also based on (possibly quite surprising) statistical correlations.

  9.

    Although executors and examiners are equally motivated by answering why-questions, they may be concerned with why-questions of different scope. Whereas executors must decide on an individual case (“the applicant’s income is too low”), an examiner is more likely to be interested in a general tendency (“applicants with low income are rejected systematically”). Although both may be considered answers to why-questions insofar as the relevant EREs are features of the environment, the former is narrower in scope than the latter.

  10.

    It is a point of contention to what extent the GDPR’s right to explanation constitutes a right at all and, if so, what it actually guarantees (Wachter et al. 2017). Although the former is a legal question that goes beyond the scope of the present discussion, the latter is a normative question that may benefit from the present discussion. In particular, Goodman and Flaxman (2016) argue that the GDPR grants data subjects (and decision subjects) the right to acquire “meaningful information about the logic involved.” However, what exactly is meant by “logic” in this context remains unclear. The present discussion implies that the relevant stakeholders should be primarily concerned with why-questions and that in this sense, the “logic” should be specified in terms of learned regularities between environmental factors. Moreover, the present discussion suggests that AI service providers may, for legal reasons, be compelled to deploy XAI techniques capable of answering why-questions.

  11.

    Although machine vision may be the domain in which input heatmaps are most intuitive, they may also be used in other domains. For example, input heatmaps may be constructed for audio inputs, highlighting the moments within an audio recording that are most responsible for classifying that recording according to musical genre. Moreover, although LRP is specifically designed to work with artificial neural networks (i.e., it is model-specific), other methods can be used to generate input heatmaps for other kinds of systems. Thus, input heatmapping in general can be viewed as a general-purpose (i.e., model-agnostic) XAI technique for answering what- and why-questions.

  12.

    Gwern Branwen maintains a helpful online resource on this particular example, listing different versions and assessing their probable veracity: (retrieved January 25, 2019).

  13.

    The program that mediates between “input” and “output”—the learned program—must not be confused with the learning algorithm that is used to develop (i.e., to program) the system in the first place.

  14.

    Intuitively, answers to how- and/or where-questions specify the EREs that are causally relevant to the behavior that is being explained. Although there are longstanding philosophical questions about the particular kinds of elements that may be considered causally relevant, the present focus on intervention suggests a maximally inclusive approach (see also Woodward 2003).

  15.

    Although there is a clear sense in which interventions can also be achieved by modifying a system’s inputs (a different s_in will typically lead to a different s_out), interventions on the mediating states, transitions, or realizers are likely to be far more wide-ranging and systematic.

  16.

    Curiously, in such scenarios, a computing system’s hardware components become analogous to the “Black Box” voice recorders used on commercial airliners.

  17.

    Strictly speaking, because the aim of the GANs in this study is not detection but generation, the relevant units might more appropriately be called feature generators.
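
The model-agnostic form of input heatmapping mentioned in note 11 can be illustrated with a minimal sketch. The sketch below uses occlusion rather than LRP: each input element is replaced by a baseline value, and the resulting change in the system's output is recorded as that element's relevance. The function names `occlusion_heatmap` and `toy_genre_score`, the baseline of 0.0, and the four-frame "audio" input are hypothetical choices made for illustration only; they are not drawn from the article or from any particular library.

```python
from typing import Callable, List, Sequence


def occlusion_heatmap(
    predict: Callable[[List[float]], float],
    inputs: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Score each input element by how much the output drops when it is occluded.

    The system is treated as a black box: only its input-output behavior
    (the `predict` callable) is used, never its internal parameters.
    """
    reference = predict(list(inputs))
    heatmap = []
    for i in range(len(inputs)):
        occluded = list(inputs)
        occluded[i] = baseline  # replace one element with the baseline value
        heatmap.append(reference - predict(occluded))
    return heatmap


def toy_genre_score(frames: List[float]) -> float:
    """Hypothetical stand-in for an opaque classifier's score for one genre."""
    weights = [0.0, 2.0, 0.0, 1.0]  # assumed for illustration, unknown to the analyst
    return sum(w * x for w, x in zip(weights, frames))


# Four "audio frames" of equal amplitude; the heatmap shows which moments
# contribute most to the classification score.
heatmap = occlusion_heatmap(toy_genre_score, [1.0, 1.0, 1.0, 1.0])
print(heatmap)
```

Because the procedure only queries the system's outputs, it applies to any system that maps inputs to a numeric score, which is the sense in which such a heatmap is model-agnostic. Whether the resulting scores answer a what- or a why-question still depends on how the highlighted input elements are interpreted.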


  1. Bau, D., Zhu, J.-Y., Strobelt, H., Zhou, B., Tenenbaum, J. B., Freeman, W. T., & Torralba, A. (2018). GAN dissection: visualizing and understanding generative adversarial networks. arXiv:1811.10597.

  2. Bechtel, W., & Richardson, R. C. (1993). Discovering complexity: decomposition and localization as strategies in scientific research (MIT Press ed.). Cambridge, Mass: MIT Press.

  3. Bickle, J. (2006). Reducing mind to molecular pathways: explicating the reductionism implicit in current cellular and molecular neuroscience. Synthese, 151(3), 411–434.

  4. Buckner, C. (2018). Empiricism without magic: transformational abstraction in deep convolutional neural networks. Synthese, 195(12), 5339–5372.

  5. Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 205395171562251.

  6. Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. Sage.

  7. Chemero, A. (2000). Anti-representationalism and the dynamical stance. Philosophy of Science.

  8. Churchland, P. M. (1981). Eliminative Materialism and the Propositional Attitudes. The Journal of Philosophy, 78(2), 67–90.

  9. Clark, A. (1993). Associative engines: connectionism, concepts, and representational change. MIT Press.

  10. Dennett, D. C. (1987). The Intentional Stance. Cambridge, MA: MIT Press.

  11. Doran, D., Schulz, S., & Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794.

  12. Durán, J. M., & Formanek, N. (2018). Grounds for trust: essential epistemic opacity and computational reliabilism. Minds and Machines, 28(4), 645–666.

  13. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

  14. European Commission. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).

  15. Fodor, J. A. (1987). Psychosemantics. Cambridge, MA: MIT Press.

  16. Goodman, B., & Flaxman, S. (2016). European Union regulations on algorithmic decision-making and a “right to explanation”. arXiv:1606.08813.

  17. Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., & Giannotti, F. (2018). Local rule-based explanations of black box decision systems. arXiv:1805.10820.

  18. Hohman, F. M., Kahng, M., Pienta, R., & Chau, D. H. (2018). Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics.

  19. Humphreys, P. (2009). The philosophical novelty of computer simulation methods. Synthese, 169(3), 615–626.

  20. Hupkes, D., Veldhoen, S., & Zuidema, W. (2018). Visualisation and ‘diagnostic classifiers’ reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61, 907–926.

  21. Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. The Behavioral and Brain Sciences.

  22. Lipton, Z. C. (2016). The mythos of model interpretability. arXiv:1606.03490.

  23. Marcus, G. (2018). Deep learning: a critical appraisal. arXiv:1801.00631.

  24. Marr, D. (1982). Vision: a computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT Press.

  25. McClamrock, R. (1991). Marr’s three levels: a re-evaluation. Minds and Machines, 1(2), 185–196.

  26. Minsky, M. (ed) (1968). Semantic Information Processing. Cambridge, MA: MIT Press.

  27. Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15.

  28. Pfeiffer, M., & Pfeil, T. (2018). Deep learning with spiking neurons: opportunities and challenges. Frontiers in Neuroscience, 12.

  29. Piccinini, G., & Craver, C. F. (2011). Integrating psychology and neuroscience: functional analyses as mechanism sketches. Synthese, 183(3), 283–311.

  30. Pylyshyn, Z. W. (1984). Computation and cognition. Cambridge, MA: MIT Press.

  31. Ramsey, W. (1997). Do connectionist representations earn their explanatory keep? Mind & Language, 12(1), 34–66.

  32. Ras, G., van Gerven, M., & Haselager, P. (2018). Explanation methods in deep learning: users, values, concerns and challenges. In Explainable and Interpretable Models in Computer Vision and Machine Learning (pp. 19–36). Springer.

  33. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I trust you?: explaining the predictions of any classifier. arXiv:1602.04938v3.

  34. Rieder, G., & Simon, J. (2017). Big data: a new empiricism and its epistemic and socio-political consequences. In Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data (pp. 85–105). Wiesbaden: Springer VS.

  35. Russell, S.J., Norvig, P. & Davis, E. (2010). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

  36. Shagrir, O. (2010). Marr on computational-level theories. Philosophy of Science, 77(4), 477–500.

  37. Shallice, T., & Cooper, R. P. (2011). The organisation of mind. Oxford: Oxford University Press.

  38. Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11(1), 1–23.

  39. Stinson, C. (2016). Mechanisms in psychology: ripping nature at its seams. Synthese, 193(5), 1585–1614.

  40. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv:1312.6199.

  41. Tomsett, R., Braines, D., Harborne, D., Preece, A., & Chakraborty, S. (2018). Interpretable to whom? A role-based model for analyzing interpretable machine learning systems. arXiv:1806.07552.

  42. Wachter, S., Mittelstadt, B., & Floridi, L. (2017). Why a right to explanation of automated decision-making does not exist in the general data protection regulation. International Data Privacy Law, 2017.

  43. Zednik, C. (2017). Mechanisms in cognitive science. In S. Glennan & P. Illari (Eds.), The Routledge Handbook of Mechanisms and Mechanical Philosophy (pp. 389–400). London: Routledge.

  44. Zednik, C. (2018). Will machine learning yield machine intelligence? In Philosophy and Theory of Artificial Intelligence 2017.

  45. Zerilli, J., Knott, A., Maclaurin, J., & Gavaghan, C. (2018). Transparency in algorithmic and human decision-making: is there a double standard? Philosophy & Technology.



This work was supported by the German Research Foundation (DFG project ZE 1062/4-1). The author would also like to thank Cameron Buckner and Christian Heine for written comments on earlier drafts. The initial impulse for this work came during discussions of the consortium on "Artificial Intelligence - Life Cycle Processes and Quality Requirements" at the German Institute for Standardization (DIN SPEC 92001). However, the final product is the work of the author.


Correspondence to Carlos Zednik.


Zednik, C. Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence. Philos. Technol. (2019).


  • Artificial intelligence
  • Black box problem
  • Epistemic opacity
  • Explainable artificial intelligence
  • Levels of analysis
  • Machine learning
  • Scientific explanation