1 Introduction

The last decade saw a sharp increase in research papers concerning interpretability for Artificial Intelligence (AI), also referred to as eXplainable AI (XAI). In 2020, the number of papers containing “interpretable AI", “explainable AI", “XAI", “explainability", or “interpretability" was more than three times that of 2010, following the trend shown in Fig. 1.

Fig. 1 Trends of the publications containing “interpretable AI" or “explainable AI" as keywords

Being applied to an increasingly large number of applications and domains, AI solutions mostly divide into the two approaches illustrated in Fig. 2. On the one side, we have Symbolic AI, where symbolic reasoning on knowledge bases is an important element of automated intelligent agents, which reflect humans’ social constructs in the virtual world (Russell and Norvig 2002). To communicate intuitions and results, humans (henceforth agents) tend to construct and share rational explanations, which are means to match intuitive and analytical cognition (Omicini 2020). On the other side, Machine Learning (ML) and Deep Learning (DL) models reach high performance by learning from the data and through experience. The complexity of the tasks in both approaches has increased over time, together with the complexity of the models being used and their opacity. A rising interest in interpretability came with the increasing opacity of the systems and with the frequent adoption of "black-box" methods such as DL, as documented by multiple studies (Miller 2019; Lipton 2018; Tjoa and Guan 2020; Murdoch et al. 2019; Chromik and Schuessler 2020; Arrieta et al. 2020; Adadi and Berrada 2018; Rudin 2019; Arya et al. 2019; Mittelstadt et al. 2019).

Fig. 2 Graphical representation of Artificial Intelligence, Machine Learning, and Deep Learning adapted from https://www.intel.com

A strong condition for ensuring the reliable use of AI is improving the understanding of its internal mechanics, particularly when complex DL models are deployed. As the previous studies on interpretability point out, understanding the decision-making of an AI system is a non-trivial task that spans three areas, namely understanding the task, the performance metric used by the model and the type of experience being used. With the intent of improving the interpretability within these three areas, a large number of requirements, tools and techniques have been developed in different application fields, leading to inconsistent use of the terminology. Interpretability is often confused with more abstract notions of fairness, privacy and transparency (Weller 2019). These terms do not have a clear and unique definition, and their understanding may differ depending on the domain and context. Similarly, the words interpretable and explainable have been used interchangeably in some articles (Miller 2019; Lipton 2018), while others draw a strong distinction between the two terms (Rudin 2019). Undoubtedly, there is a link between the act of interpreting and that of explaining, as shown by the etymology of the words themselves (which we report in Table 3). Interpretability has been presented as “explaining or presenting in understandable terms to a human", “providing explanations" to humans (Miller 2019) and “assigning meaning to an explanation" (Palacio et al. 2021). For Rudin (2019), however, there is a strong distinction between interpreting and explaining, since models may be developed to directly encompass the ability to explain their decision-making. In this case, interpretability refers to meeting the transparency requirement at the task definition level, whereas explanation refers to a post-hoc (after training) evaluation of the model’s understandability.

The different perspectives on the technical terminology are discussed in several papers within the specific context of explainable AI and ML design, but they integrate poorly with the other domains that are driving and shaping AI development. Policies for funding and regulating AI research also refer to concepts such as transparency, explicability, reliability, informed consent, accountability, and auditability of the systems (Bibal et al. 2020, 2020; Edwards and Veale 2017). Clarifying what these terms refer to and unifying the social and technical perspectives on these aspects is fundamental to determine directions for progress and to encourage cross-disciplinary discussion and interaction on AI developments. Fields that have analyzed the impact of technologies over the centuries, such as cognitive sciences, sociology, philosophy and ethics, constitute invaluable resources of knowledge from which it is possible to evaluate and understand how human trust evolves over time and how it can be built to motivate the adoption of new technologies. If a global terminology is adopted across these disciplines, a broader range of possibilities can open up, encouraging the design of interpretability tools that are useful and understandable not only to ML developers but to a wider audience ranging from the final decision-maker to anyone affected by the decision (Tonekaboni et al. 2019).

The contributions of this paper are the following: (i) we collect the perspectives on the interpretable AI terminology from a large number of experts, reporting the results of an interdisciplinary collaboration spanning 8 disciplines in the social and technical sciences; (ii) we propose a taxonomy and interdisciplinary definitions for interpretability and interpretable AI that can be used in multiple contexts; (iii) we study a use case in the medical field to demonstrate the relevance of unifying perspectives and adopting a common terminology.

2 Related work

Several papers in the literature proposed a taxonomy of interpretable AI. Table 1 reviews in chronological order the numerous definitions that were given in the ML literature for interpretable, explainable, transparent, decomposable and intelligible. While we tried to be as complete as possible, this table is not exhaustive. We excluded from this review the papers that defined the taxonomy for developing a single technique. Discordance can be noticed in the meaning assigned to the terms by the papers in this collection, with major dividing points emerging on the words: (i) interpretable and explainable; (ii) transparency and decomposability; and (iii) intelligible and interpretable.

The terms interpretable and explainable are equated, for example, by several researchers (Miller 2019; Adadi and Berrada 2018; Arya et al. 2019; Clinciu and Hastie 2019; Murdoch et al. 2019; Vered et al. 2020). An even broader number of papers describes a clear distinction between these two terms (Rudin 2019; Lipton 2018; Biran and Cotton 2017; Montavon et al. 2018; Mittelstadt et al. 2019; Chromik and Schuessler 2020; Arrieta et al. 2020; Palacio et al. 2021), suggesting that a distinction between these two terms is more popular among researchers. As for interpretability, multiple definitions exist also within the context of explainability, for which we refer the reader to the systematic review by Vilone and Longo (2020). The work by Arrieta et al. (2020), for instance, distinguishes interpretability from explainability, which is defined as a human-understandable interface that exists between the user and the system. Transparency is used in multiple papers with the meaning described by Lipton (2018) of model decomposability (Lipton 2018; Clinciu and Hastie 2019; Chromik and Schuessler 2020). In other papers, this term is used as a synonym for interpretability (Murdoch et al. 2019; Arrieta et al. 2020) or for functional understanding of the model (Mittelstadt et al. 2019). Rudin et al. (2019) define transparency in terms of models with particular properties, such as monotonicity, since these models are transparent in the way their variables are jointly related. Finally, the concept of an intelligible model is equated to that of an inherently interpretable model in Arya et al. (2019), while it is used to mean the introduction of interpretability constraints in the model design in Clinciu and Hastie (2019) and Montavon et al. (2018).

None of the papers in Table 1 considers the taxonomy used by policymakers, regulators, philosophers and sociologists discussing the impact of AI on society and on the research community. The perspectives in these papers are those of experts in AI development with familiarity with ML. As a consequence, different definitions are used in the social sciences. This paper reviews the existing definitions and gathers the perspectives of a multidisciplinary pool of experts to provide a single taxonomy that can be used across multiple domains and adapts to both the social and the technical sciences.

Table 1 Multiple taxonomies-part 1
Table 2 Multiple taxonomies-part 2

3 Methods

A round table public meeting was held online on April 29th, 2021 on “A Global Taxonomy for Interpretable AI". Endorsed by the AI4Media project within the European Union’s Horizon 2020 research and innovation programme, this event was organized to bring together researchers from multidisciplinary backgrounds to collaborate on a global definition of interpretability that may be used with high versatility in the documentation of social, cognitive, philosophical, ethical and legal concerns about AI. A total of 18 experts were invited to participate in the event. The selection of the experts was tailored to obtain the most representative consortium of the fields currently dealing with interpretable AI. The final pool of experts involved in this work also depended on the experts’ interests and their availability, but the selection was by no means made in such a way as to steer the discussion towards a pre-agreed consensus. The experts were both internal members of the AI4Media project and external non-affiliated members. The external experts were invited so as to obtain a balanced perspective on the topic that went beyond the purpose of the project itself. For each of the discussed disciplines, at least one external expert was included in the discussion. The selection was based on previous publication records on interpretable AI and on the reported interest and availability to participate in the study. In addition, attention was paid to the inclusiveness of the pool of experts in terms of gender and ethnicity. The experts represent institutions from eight different countries (of which two are non-European) and span academia, industry and healthcare.

The workshop was organized in two sessions, consisting of a round table discussion and a panel session with a question and answer format. The first session consisted of seven short talks of 12 minutes, each followed by 3 minutes for questions. The second session involved a panel of five experts discussing questions from the audience concerning the role and implications of AI and transparency. The workshop was streamed on YouTube and spectators were able to interact through a live chat.

The round table provided a solid basis for the work reported in this paper, steered further discussion and suggested future research directions. We hope that this work may constitute a first solid step towards finding a global consensus on the taxonomy for interpretable AI for both the social and the technical sciences.

4 Results

4.1 Etymology and existing definitions

Table 3 analyzes the etymology of frequently used words in the context of interpretable AI. Looking at the historical formation and the original meaning of a word can shed light on its roots and history, deepening the understanding of its meaning and the context in which it should be used. The word clue, for example, gains meaning from its reference to Greek mythology. It originates from the Germanic word clew, which indicates a ball of thread or yarn. Theseus used a clue of thread to find the exit of the Labyrinth. When people say “give me a clue", they refer to some helpful information and not the ball of yarn itself. Understanding the etymology of the words in the AI interpretability terminology can help in a similar way to better understand the meaning of each term and why one word is more appropriate than another in specific contexts.

Table 3 Analysis of the etymology of the terms related to interpretability

Figure 3 illustrates how some of the terms defined in Table 3 (such as intelligible, transparent, explainable, accountable, auditable and reliable) slightly change their meaning depending on the context, acquiring multiple shades and connotations as they interact with the different domains. This analysis, based on the cross-disciplinary knowledge of the people participating in the initiative, gives insights into how each domain envisions these concepts. Some conflicts emerge in the definitions as the words are used in one discipline or another. The attention towards one or more concepts is mostly heterogeneous, with some disciplines focusing more on one aspect than others. While heterogeneity in the attention to the words is legitimate and given by the intrinsic nature of each discipline, the strong changes in the meaning assigned to the same word by different disciplines may inhibit understanding and collaboration among different fields. The word transparent has been interpreted as “providing meaningful information about the underlying logic" in the EU legislation, whereas technical developers often understand it as a certain degree of understanding of the system mechanics, decomposability and simulability. In other words, if technicians and legislators were to think of the degrees of transparency of a vehicle, they would see different aspects. The former would think of pistons, fuses and how these elements combine into the final engine. The latter would think of the degree of information available to the user about the working principles of the vehicle: starting the engine, stopping it from running, changing the direction and so on.

Fig. 3 Differences of definitions in domains other than ML development. In this diagram, interpretable is equated to explainable since most of the social domains equate the two terms for simplicity

4.2 A global definition of interpretable AI

As an important contribution of this work, we derive a multidisciplinary definition of interpretable AI that may be adopted in both the social and the technical sciences.

In daily language, an instance, or an object of interest, is defined as interpretable if it is possible to find its interpretation, hence if we can find its meaning (Simpson 2009). Interpretability can thus be conceived as the capability to characterize something as interpretable. A formal definition of interpretability exists in the field of mathematical logic, and it can be summarized as the possibility of interpreting, or translating, one formal theory into another while preserving the validity of each theorem of the original theory during the translation (Tarski et al. 1953). The translated theory thus assigns meaning to the original theory and is an interpretation of it. The translation may be needed, for instance, to move into a simplified space where the original theory is easier to understand and can be presented in a different language.
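For readers less familiar with the logical notion, the following is a minimal schematic rendering, in our own notation rather than Tarski's original formalism, of what it means for a theory T to be interpretable in a theory S: a structure-respecting translation t of the language of T into the language of S must preserve provability.

```latex
% Schematic sketch (our notation, not Tarski's original formulation):
% T is interpretable in S if there is a translation t(\cdot) of the language of T
% into the language of S, respecting logical structure, such that
T \vdash \varphi \;\Longrightarrow\; S \vdash t(\varphi) \qquad \text{for every sentence } \varphi \text{ of } T.
```

The translation t plays exactly the role described above: it moves statements into another language while leaving their validity untouched.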

From these explicit definitions, we can derive a multidisciplinary definition of interpretability that embraces both technical and social aspects: “Interpretability is the capability of assigning meaning to an instance by a translation that does not change its original validity”. The definition of interpretable AI can then be derived by clarifying what should be translated: “An AI system is interpretable if it is possible to translate its working principles and outcomes in human-understandable language without affecting the validity of the system”. This definition represents the shared goal that several technical approaches aim to achieve when applied to AI. In some cases, as we discuss in Sect. 4.4, the definition is relaxed to include approximations of the AI system that maintain its validity as much as possible. Interpretability is needed to make the output generation process of an AI system explainable and understandable to humans and it is often obtained through a translation process. Such a process may be introduced directly at the design stage as an additional task of the system. If not available by design, interpretability may be obtained by post-hoc explanations that aim at improving the understandability of how the outcome was generated. Interpretability can thus be sought through iterations and in multiple forms (e.g. graphical visualizations, natural language, or tabular data) which can be adapted to the receiver. This fosters the auditability and accountability of the system.

4.3 A global taxonomy

In what follows we present a global taxonomy for interpretable AI and summarize the multiple viewpoints and perspectives gathered in this work. Table 4 presents the taxonomy with further detail on the domain-specific definitions used in each of the eight fields studied in this work, namely law, ethics, cognitive psychology, machine learning, symbolic AI, sociology, labour rights, and healthcare research. Brackets specify the domain in which each definition applies. If a term applies to both social and technical experts, its definition is provided first and marked by the (global) identifier. Otherwise it is marked with the domain-specific identifier, e.g. EU law, sociology, etc. Practitioners in any of the above-mentioned fields may refer to this table to obtain a common definition for each term in the taxonomy and to inspect all the exceptions and variations of the same term in the literature. Our objective is not to impose one taxonomy over another, but rather to raise awareness of the multiple definitions of each word in each domain, and to create a common terminology that researchers may refer to in order to reduce misinterpretations.

Table 4 Taxonomy of Interpretable AI for the social and technical sciences

The following subsections explain how the proposed taxonomy adapts to the fields with their respective needs, challenges and goals in terms of ML interpretability.

4.4 Use of the proposed terminology to classify interpretability techniques

In this section, we show how the terminology in Table 3 can be used to classify ML interpretability techniques. To do so, we group popular interpretability techniques into the families shown in Table 5. On the basis of this, Table 6 summarizes how each family of techniques can provide the properties described in Table 3. In the following, we give more insights concerning the classifications provided in Tables 5 and 6.

Due to their low complexity, models such as decision trees and sparse linear models have inherent interpretability, meaning they can be interpreted without the use of additional interpretability techniques (Molnar 2019). These methods are intelligible, according to the definition in Table 3 ID 4. Black-box models, such as deep learning models, have surpassed the performance of traditional systems on complex problems such as image classification. However, due to their high complexity, they require techniques to interpret their decisions and behavior. These techniques often involve considering a close approximation of the model behavior that may hold in the locality of an instance (i.e. local interpretability) or over the entire set of inputs (i.e. global interpretability). They can be grouped according to the following criteria: (1) scope, (2) model specificity (model-agnostic vs. model-specific), and (3) the result of the explanation.
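As an illustration of this difference, the following minimal sketch (a toy example of our own in Python with scikit-learn, not taken from any of the cited works; dataset and hyper-parameters are arbitrary) contrasts an inherently interpretable sparse linear model, whose non-zero coefficients can be read directly, with a black-box ensemble whose decision logic requires post-hoc techniques.

```python
# Toy sketch (illustrative assumptions): an inherently interpretable sparse linear
# model versus a black-box ensemble trained on the same task.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Glass box: an L1-penalised logistic regression. The few non-zero coefficients
# *are* the model's own explanation of its decisions (Table 3, ID 4: intelligible).
glass_box = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
glass_box.fit(X_tr, y_tr)
for name, coef in zip(X.columns, glass_box[-1].coef_[0]):
    if coef != 0:
        print(f"{name:>25s}: {coef:+.3f}")

# Black box: a gradient-boosted ensemble. Often more accurate, but its decision
# logic is not directly readable and calls for the techniques discussed below.
black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("glass-box accuracy:", round(glass_box.score(X_te, y_te), 3))
print("black-box accuracy:", round(black_box.score(X_te, y_te), 3))
```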

The scope of a technique indicates the granularity of the decisions being explained, either global or local. Global interpretability techniques explain the behavior of the system as a whole, answering the question “How does the model make predictions?”, while local interpretability techniques explain an individual prediction or a group of predictions, answering the question “How did the model make a certain prediction or group of predictions?” (Lipton 2018).
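A from-scratch sketch of a local explanation in the spirit of LIME (Ribeiro et al. 2016) is given below; the dataset, the perturbation scale and the proximity kernel are illustrative assumptions of ours rather than the original authors' choices. A single prediction of a black-box model is approximated by a proximity-weighted linear surrogate fitted on perturbations drawn around the instance, and the surrogate's coefficients act as the local explanation.

```python
# Minimal LIME-style local surrogate (toy assumptions, not the original implementation).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                    # the single instance to be explained
rng = np.random.default_rng(0)
scale = X.std(axis=0) * 0.3                  # illustrative perturbation scale
Z = x0 + rng.normal(scale=scale, size=(500, X.shape[1]))
pz = black_box.predict_proba(Z)[:, 1]        # black-box outputs on the perturbations

# Proximity kernel: perturbations closer to x0 weigh more in the surrogate fit.
d = np.linalg.norm((Z - x0) / X.std(axis=0), axis=1)
w = np.exp(-(d ** 2) / (2 * d.mean() ** 2))

surrogate = Ridge(alpha=1.0).fit(Z, pz, sample_weight=w)
top = np.argsort(np.abs(surrogate.coef_))[::-1][:5]
print("locally most influential features:", top)
print("local coefficients:", surrogate.coef_[top])
```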

Model-agnostic techniques can be applied to any model class to extract explanations, unlike model-specific techniques that are restricted to a specific model class. Interpretability techniques can also be roughly divided by their result, i.e. the type of explanation they produce, creating multiple families of techniques. It is important to note that some types of explanations are strongly preferred: half the studies using interpretability techniques in the oncological field use either saliency maps or feature importance (Amorim et al. 2021). These techniques can produce data points that explain the behavior of the model (Kim et al. 2016; Lapuschkin et al. 2015), visualizations of internal features (Olah et al. 2017) or simpler models that approximate the original model (Ribeiro et al. 2016; Lakkaraju et al. 2016; Lundberg and Lee 2017). It is important to choose the right technique based on its scope and family to reach the desired objective. Table 5 presents the families of techniques, their definitions and important references (Molnar 2019).
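The family of simpler approximating models can likewise be illustrated at global scope. In the following sketch (again a toy example under our own assumptions), a shallow decision tree is trained to mimic a black-box model; the fraction of held-out inputs on which the two agree anticipates the fidelity measure discussed at the end of this section.

```python
# Toy global surrogate (illustrative assumptions): a depth-3 tree mimics a black box.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# The surrogate is fitted on the black-box *predictions*, not on the true labels,
# so that it imitates the model rather than the data.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_tr, black_box.predict(X_tr))

# Fidelity: fraction of held-out inputs on which surrogate and black box agree.
fidelity = (surrogate.predict(X_te) == black_box.predict(X_te)).mean()
print(f"fidelity on held-out data: {fidelity:.2f}")
print(export_text(surrogate, feature_names=list(X.columns)))
```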

Based on Tables 1, 2 and 4 we present Table 6, where we group families of interpretability techniques based on their scope and classify them based on their suitability to achieve each of the objectives mentioned in Tables 1 and 2. To achieve interpretability as intended in Table 3 (ID 1), local techniques are preferable since they allow users to interpret the outcomes of a system and thus increase its interpretability. Global techniques can be rather inaccurate at a local level, although they are better suited to expose the mechanisms of a system in general. The decision-making process can become more transparent (ID 3) at the local or global level, depending on the scope of the interpretability techniques. Intelligibility (ID 4) is a characteristic of inherently interpretable models. It can be achieved for more complex models by approximating the decision function either locally or globally with an inherently interpretable model. It is also important to point out that even when a model is inherently interpretable, the features used to train it can sometimes be hard to understand, particularly for non-experts in feature engineering.

As for accountability, systems would need to justify their outcomes and behavior to be accountable, and thus any technique that offers interpretability or explainability can help to achieve this. Similarly, these techniques can also be used to examine the global behavior or the reasoning behind local decisions and thus provide auditability (ID 7). Finally, robustness (ID 9) is not achievable by only understanding the behavior of the model. It rather requires finding or producing instances that make the model misbehave, identifying limitations of the model, or detecting data points that lie outside the training data distribution.

Table 5 Definitions of families of interpretability techniques
Table 6 Classification of families of interpretability techniques

At this point, we remark that interpretability techniques come with inherent risks. A desired property of interpretability is to help the end-user create the right mental model of an AI system. However, if one considers AI models to be lossy compressions of data, then interpretability outcomes are lossy compressions of the model and are severely underspecified. In other words, it is possible to generate several different interpretations for the same observations. If used improperly, interpretability techniques can open new sources of risk. In some settings, interpretability outcomes can be arbitrarily changed. For example, Aïvodji et al. (2019) demonstrate a case of “fairwashing", where seemingly fair rules can be obtained to represent an underlying unfair model. It is also possible for an AI system that predicts grades to be gamed if the underlying logic is fully transparent. Model explanations can reveal that an AI model’s decision criteria are illegal or provide grounds for appeals (Weller 2019). Finally, transparency also makes explicit the trade-offs involved in decisions that may otherwise remain hidden (Coyle and Weller 2020).

From these considerations, it follows that interpretability requires a context-based scientific evaluation. Two standard approaches for such evaluations are (a) to establish baselines based on domain insights to evaluate the quality of explanations, and (b) to leverage end-user studies to determine effectiveness. For instance, user experiments have been used for trust calibration (knowing when and when not to trust AI outputs) in joint decision-making (Zhang et al. 2020). In another interesting approach, Lakkaraju et al. (2016) measured the teaching performance of end-users to establish how effective explanations are in communicating model behavior, with good teaching performance indicating better model understanding.

Several quantitative measures to assess explanation risks have also been proposed in the literature. A common approach using surrogates involves approximating a complex model with a simpler interpretable one. Properties of the simpler model can then help address questions on the extent of interpretability of the original model. Common measures include fidelity, the fraction of instances on which the simpler model agrees with the complex one, and complexity, the number of elements of the simpler model a user needs to parse to understand an outcome. Faithfulness metrics measure the correlation between feature importance as deemed by the AI model and as deemed by the explanation. Sensitivity (Yeh et al. 2019) measures the degree to which explanations are affected by non-trivial perturbations of the input.
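One common way to formalize these measures, under notation we introduce here purely for illustration (f is the complex model, g the surrogate, X a reference set of inputs, Φ an attribution method and a_i the importance assigned to feature i; exact definitions vary across papers), is the following.

```latex
% Illustrative formalizations (our notation); definitions vary in the literature.
\mathrm{Fidelity}(g, f) = \frac{1}{|X|} \sum_{x \in X} \mathbb{1}\big[\, g(x) = f(x) \,\big]
\qquad
\mathrm{Faithfulness}(\Phi, f, x) = \operatorname{corr}_{i}\big( a_i(x),\; f(x) - f(x_{\setminus i}) \big)
\qquad
\mathrm{Sens}_{\max}(\Phi, f, x, r) = \max_{\|\delta\| \le r} \big\| \Phi(f, x + \delta) - \Phi(f, x) \big\|
```

Here x_{\setminus i} denotes the input with feature i removed or masked, and the last expression corresponds to the max-sensitivity of Yeh et al. (2019).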

4.5 Terminology in the cognitive sciences

From the point of view of the cognitive sciences, interpretability (as defined in line 1 of Table 3) is considered part of the social interaction between an AI system and a user (Hilton 1990). As the definition underlines, the concept of interpretability is strictly connected to the human ability of understanding information. The process of understanding is defined in cognitive psychology as the ability of the human brain to infer or make predictions in the semantic memory. The semantic memory is wired by connections of neurons that are created and consolidated by positive reinforcement. A high-level model of such neural connections identifies areas that are specialized in reacting to specific stimuli (e.g. numbers, words, shapes, colors, actions, sounds). Depending on what kind of information is being understood, these areas may be used individually or share functions (Ward 2019). The understandability of something is thus the property of an object, be it a model or the outcome of an interpretability method, to be understood by a human. Because the wiring of the neurons constituting the areas in the semantic memory is a result of individual experiences, understandability incorporates some degree of subjectivity and variability, e.g. what is understandable to someone may not be understandable to someone else. Users may vary greatly, and so may their background and understanding of explanations. Thus, to be widely applicable and useful to a variety of users, understandability should not require any prior training of the addressees concerning feature extraction, hyper-parameter selection or the training of AI systems.

Some aspects of human explanation generation (i.e. explainability as in ID 2 of Table 3) do not coincide directly with what is intuitively thought of as transparency (ID 3 in Table 3). The first difference is that explanations are selected by humans. The selection is generally biased to reflect the mental model of the explainee. Even when a complete set of causal relations is available, people are more likely to rely on a few causes that explain certain key aspects of the event (Hilton 2017). Explainability should thus be understood differently from transparency, which is rather the unbiased provision of insights about the internal mechanics of an AI system.

4.6 Social and working environment

To develop a social relationship between humans and machines, interpretability needs to act as a social contract of trust between these two parties. Trust in the system leads to reliability (as intended in ID 6 of Table 3) and this can only be built through sustained understanding. Using understanding to build trust is a well-studied social science research problem, complicated by the fact that humans accept explanations first and foremost in a highly biased manner (Lombrozo 2006). The fact that bias is part of every human understanding, however, should not limit the potential success of explainable AI. For this reason, AI explainability (ID 2 in Table 3) should be seen as a social translation, as investigated in recent HCI studies such as Kaur et al. (2020). If only computer scientists are considered within the project ideation and development, however, there is the risk, discussed by Miller et al. (2017), of having the helpless being led by the clueless, namely having ML engineers building explainability mostly for other ML engineers. Social scientists and workers should be involved in the analyses proposed by ML researchers, as the actual addressees and users of the algorithms. Collaborations should be built to develop types of human-computer interaction in ML that are more understandable to non-ML experts. If interpretability is not developed with the help of the social sciences, the risk of creating AI systems mainly for other researchers is high, and it would undermine the efforts in building reliable and trustworthy automated systems.

AI should not be developed with the sole intent of reducing human input, as this may lead to the perception of AI as an “inhuman” intelligence (Dick 2019). New algorithms should prioritize the creation of a relationship of trust above the desire to automate and reduce human input.

Within the realm of employment relations, work and labor markets, the concept of "democracy at work" is entering the discussion of the criteria for AI transparency (as defined in Table 3 ID 3). Of particular importance are the employees’ rights of participation and consultation when AI algorithms are employed to make decisions at the workplace. Employees should be guaranteed the possibility to get involved in management decisions about the organization of work and working conditions. Democracy is thus essential to let employees create optimal conditions for work, and it translates into the need for transparency when AI systems are used to manage personnel. In particular, workers’ autonomy (the right of a worker to intervene), skill grading and the governance of organization and production processes should be regulated by transparent AI decisions. Transparency is thus desired to determine whether an algorithm is engaging in non-democratic practices, such as discrimination, and is intended as a means to improve workers’ satisfaction and safety at work (see Fig. 3). Even further, it may help to identify the workplace conditions enabling discrimination in the first place.

4.7 The EU law on interpretability

In law, there is no precise definition of AI explainability. The High-Level Expert Group on AI (AI HLEG) set up by the European Commission lists explicability as one of the ethical principles that must be respected in order to ensure that AI systems are developed, deployed and used in a trustworthy manner. The principle of explicability encompasses both the terms transparency and explainability as defined in Table 3. From a legal point of view, explainability is seen as collecting meaningful insights on how a particular decision is made (Bibal et al. 2020). According to Bibal et al. (2020), it does not set the requirement for an interpretable representation of a mathematical model. Most important is that the explanation should assign meaning to the decision, i.e. that it improves the explainee’s understanding of the decision generation process. It follows from the AI HLEG Guidelines that explainability should be adapted to the level of expertise and understanding of the individual concerned. Bibal et al. (2020) argue that in private decision-making, the legal requirements relate to the following four levels of ML explainability concepts: (i) providing the main features used for a decision, (ii) providing all features used for a decision, (iii) providing an explanation of the way the features are combined to make the decision, and (iv) providing an understandable representation of the whole model. Wachter et al. (2017) propose the following categorization of what one may mean by an explanation of automated decision-making. Two kinds of explanations are possible, depending on whether one refers to: system functionality, i.e. the logic, significance, envisaged consequences, and general functionality of an automated decision-making system, e.g. the system’s requirements specification, decision trees, pre-defined models, criteria, and classification structures; or to specific decisions, i.e. the rationale, reasons, and individual circumstances of a specific automated decision, e.g. the weighting of features, machine-defined case-specific decision rules, information about reference or profile groups. Furthermore, one can also distinguish between an ex-ante explanation (i.e. prior to the automated decision-making taking place) and an ex-post explanation (i.e. after the automated decision has taken place) (Wachter et al. 2017). The focus of many legal scholars has been on the meaning of explainability from the data protection law point of view. The core debate has primarily focused on whether or not the General Data Protection Regulation 2016/679 (GDPR) creates a right to an explanation of an algorithmic decision, as argued by Goodman and Flaxman (2016) and further discussed by Wachter et al. (2017). The latter, in particular, argue that a non-existing “right to explanation" of a specific automated decision should not be conflated with other GDPR provisions. The GDPR rather establishes a “right to be informed" through: (i) the right not to be subject to automated decision-making and the safeguards enacted thereof (Article 22 and Recital 71); (ii) the notification duties of data controllers (Articles 13-14 and Recitals 60-62); and (iii) the right of access (Article 15 and Recital 63).
Others, like Selbst and Powles (2018), point out that whether one uses the phrase “right to explanation" or not, data controllers need to provide the data subject with “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject" (Articles 13(2), 14(2) and 15(1) of the GDPR). Such information must be meaningful to an individual confronted with a decision (Selbst and Powles 2018). The test for whether the information is meaningful should therefore be functional: explanations are a means to help a data subject act rather than merely understand the mathematical processes behind decisions (Edwards and Veale 2017). This is also in line with some of the claims made in high-stakes application domains, e.g. clinical decision-making (Tonekaboni et al. 2019).

Some scholars have studied how the legal requirements on explainability could be interpreted and applied to ML (Bibal et al. 2020). Hamon et al. (2021) used a COVID-19 use case scenario to assess the feasibility of legal requirements on algorithmic explanations. They concluded that the use of complex deep learning models in AI applications is hard to reconcile with the existing EU data protection law requirements, especially with regard to the human legibility of explanations for non-expert data subjects. Similarly, Edwards and Veale (2017) note that the legal concept of explanations as “meaningful information about the logic of processing" may not be provided by the kind of ML “explanations" computer scientists have developed. This further motivates the need to resort to a common ground where the objectives regarding interpretability can be discussed among the disciplines involved, for example on the basis of the taxonomy provided in this paper. It is possible that in some cases transparency or explanation rights may be overrated or even irrelevant, a problem often referred to as the transparency fallacy. In many cases what the data subject wants is not an explanation, but rather for the disclosure, decision or action simply not to have occurred (Edwards and Veale 2017). In high-risk AI systems, however, the recently proposed draft Regulation on AI (the AI Act) envisions transparency as one of the obligations for the operators. Article 13 of the draft AI Act requires high-risk AI systems to be “designed and developed in such a way to ensure that their operation is sufficiently transparent to enable users to interpret the system’s output and use it appropriately." The obvious difference here, in comparison with the AI HLEG Guidelines, is that transparency is addressed to the users of the AI systems, who are not necessarily familiar with ML theory. This aligns with the requirement of personalized explanations discussed in Sect. 4.5 and contrasts with the current definition of transparency in the ML community, where this property is rather intended as an objective look inside the AI algorithm.

For AI systems that interact with natural persons, e.g. emotion recognition systems, biometric categorization systems and AI systems that generate deep fakes, the draft AI Act prescribes an obligation to inform natural persons that they are interacting with or exposed to such systems. It is interesting that even though the draft AI Act does use the very term transparency, it does not refer to the explainability and traceability dimensions that were part of the concept according to the AI HLEG Guidelines. This shows the inconsistency of the terminology from a legal point of view. One obvious solution would be to amend the text of the regulation; if not, it would be subject to interpretation by the Court of Justice of the European Union, which is likely to rely on other branches of science to fill the legal gaps. This shows the clear necessity of a unified taxonomy.

4.8 An ethical point of view

The requirement of interpretability is often made on the basis of an analogy with human decision-making (Coeckelbergh 2020). We expect bankers to explain why they reject a loan, physicians to explain why they discontinue treatment and politicians to explain why they want to implement a certain policy. This requirement is often based on the idea of transparency: that seeing how a phenomenon happens generates accountability and the possibility of change (Ananny and Crawford 2018). The interpretation of phenomena in this sense derives from epistemological concerns debated since antiquity in philosophy. In the historical sense (in Table 3), interpreting has to do with understanding a particular course of action or decision-making, and ethical concerns have to do with providing reasons for moral choices. Even prior to that, interpretation was primarily a religious issue, namely concerning the interpretation of the holy scripture, which was supposed to transmit the word of God in a way that preserved the true meaning of the text.

Unlike with other technologies, interpretation is one of the primary ethical concerns raised by the application of AI. While other technologies are also able to replace human functions (e.g., a walking stick takes over the function of a leg), AI is arguably the first technology that has the capacity to make decisions. This raises both the epistemological question of why certain decisions were made by an AI system, as well as the ethical question of whether good reasons can be given for these decisions, in case they are of ethical significance.

What sets the ethical discussion apart from the technical perspective in Sect. 4.4 is its primary focus on the ethical value of an explanation, rather than on its epistemic value (Robbins 2019). That is, a causal chain leading to the potential damage needs to be provided if an AI-generated decision may affect a human being.

As scholars have argued, however, human beings often do not need complete causal chains of explanation (Coeckelbergh 2020). This opens up new ethical issues and problems such as the intentional concealment of information, which may be achieved even by simply providing explanations whose understandability is limited by the requirement of prior expert knowledge (Ananny and Crawford 2018). A patient might not be helped by a full causal explanation of a diagnosis, but rather by a trustworthy account of understandable reasons expressed in clear and simple language.

From this perspective, we may raise three overarching ethical concerns of interpretable AI. First, there is the concern of “sacrifice". Because interpretation is always situated between the system and the user, it generates the inevitable risk of omission during interpretation. This can be due either to oversimplification (simplifying the model dynamics and missing out on important technical details) or to overcomplication (providing overly technical explanations that most users cannot grasp) (Nissenbaum 2011). Interpretation therefore inevitably sacrifices meaning. Second, we should be concerned about “hospitality", here intended as a common ground of understanding between strangers that aims to remedy the potential for conflict. Interpretation requires building bridges between different world visions, for instance between a physician and a patient, or a civil servant and a citizen. Third, interpretation raises the question of professional virtues. It is often part of a particular profession (a notary, a physician, a school teacher) to uphold certain standards of excellence in providing interpretability, for instance under the heading of the virtue of “fidelity". Importantly, what these standards mean in practice can differ significantly between different professional contexts.

In light of the above three (and other) ethical challenges, researchers have to consider how the ethical interpretability of AI systems should be realized in practice. Often, this requires finding ways in which humans and AI systems are able to work together in providing interpretations that are related to practices, sensitive to context, and provide good reasons for making ethical choices if required.

4.9 Not only humans: XAI in intelligent autonomous systems

Virtual agents are the most common embodiment of symbolic AI (Russell and Norvig 2002). They can operate singularly, or in a cooperative or adversarial fashion (within Multi-Agent Systems, MAS). The agents composing intelligent autonomous systems (MAS) are hardware/software-based computer systems characterized by any or all of the following: (i) autonomy (no direct intervention or human control), (ii) social ability (free to interact with other agents and humans), (iii) reactivity (perception of and reaction to their environment), and (iv) pro-activeness (being goal-directed, they can take the initiative) (Franklin and Graesser 1996). MAS have increasingly become part of modern society and as such are incorporated into an increasing number of everyday tasks (Calvaresi et al. 2017).

Beyond their symbolic nature, modern agents can also leverage sub-symbolic algorithms (i.e., ML and DL), integrating them into their reasoning processes (Schwartz 2014). While symbolic agents are explainable by design (being mainly rule-based), the behavior of sub-symbolic or hybrid agents can be opaque for both human users and other agents. Such opacity harms the reputation of the single agents and the trust in the overall intelligent system (Anjomshoae et al. 2019; Ciatto et al. 2020). In the last decades, the majority of the articles on explainable agents has focused on making intelligent systems understandable primarily to humans (Rosenfeld and Richardson 2019; Anjomshoae et al. 2019; Guidotti et al. 2018). Bridging symbolic and sub-symbolic approaches is called neuro-symbolic integration (Stammer et al. 2021; Sarker et al. 2021). For example, De Raedt et al. (2019) proposed to adopt neuro-symbolic and probabilistic approaches, Riveret et al. (2015) neuro-argumentative techniques, and Besold and Kühnberger (2015) two paths to achieve such an integration. Nevertheless, current research indicates that the forthcoming decades will focus on the full development of conversational informatics (Nishida 2014; Calvaresi et al. 2021). MAS are modeled after human societies, and within MAS agents communicate with each other, sharing syntax and ontology. They interact via the Agent Communication Languages (ACL) standard, shaped around Searle’s theory of human communication based on speech acts (Searle et al. 1969). Therefore, multi-agent interpretability and explainability require multi-disciplinary efforts to capture all the diverse dimensions and nuances of human conversational acts, transposing such skills to conversational agents (Ciatto et al. 2019, 2020). Equipping virtual entities with explanation capabilities (either directed to humans or to other virtual agents) fits into the view of socio-technical systems, where both humans and artificial components play the role of system components (Whitworth 2006). Ongoing international projects revolve around these concepts. For example, they are tackling intra- and inter-agent explainability (EXPECTATION), actualizing explainable assistive robots (COHERENT), countering information manipulation with knowledge graphs and semantics (CIMPLE), and relating action to effect via causal models of the environment (CausalXRL). Explainable agents can leverage symbolic AI techniques to provide a rational and shareable representation of their own specific cognitive processes and results. Being able to manipulate such a representation allows building one or more personalized explanations that meet the explainee’s (human or virtual) background and boost the success of the explanation process and the overall interaction.

5 A case study: the medical domain

In this section, we present a case study in a medical scenario. We show how each of the perspectives from the multiple domains (i.e. legal, cognitive, social, ethical, philosophical, labour rights, ML and symbolic AI) comes into play in a possible use case. As argued by Tonekaboni et al. (2019) and Banja et al. (2022), the application of ML to clinical settings represents a relevant use case for interpretability, motivated by the high stakes, the complexity of the modeling task and the need for reliability. From the legal perspective, clinicians are the sole people legally accountable for any diagnosis and decision-making, hence accepting ML suggestions is seen as taking an acknowledged risk that may affect the survival and life quality of the patient. As the cognitive sciences suggest, clinicians should be able to revise their mental model of the AI system to be able to understand the principles applied by the system’s decision-making, ensuring the reliability of the system. It is only through time and sustained use that a social relationship of trust between the physician and the automated system can be established. Interpretability is to be sought in the medical application not only for the sake of the philosophical and epistemic value of explanations per se, but also as an ethical requirement to provide a factual, direct and clear explanation of the decision-making process, especially in the event of unwanted consequences (Floridi et al. 2018; Robbins 2019). An AI-generated decision arguably needs to be interpretable if it can affect a human being. Given the high cost of making a mistake, the ML application cannot be allowed to take decisions independently, unlike other contexts where ML tools are used more lightly, e.g. recommendation systems. This sets a major requirement for ensuring the well-being of physicians in the workplace: their confidence with the tools should be allowed to increase over time, and they should be provided with sufficient transparency to decide whether or not to rely on the AI system. To satisfy the requirements set by this analysis from the social sciences, the ML and symbolic-AI tools deployed for clinical use should interact with the experts, and technical solutions must be developed for this purpose.

The interaction between humans and ML systems is a non-trivial task. Human reasoning is mostly based on high-level concepts that interact with each other to form a semantic representation. These interactions with semantic meaning are not necessarily represented by ML models, which mostly operate on numeric features such as input pixel values, internal activations and model weights (Kim et al. 2018). When the features used by the model are expressed in clinical terms, the interaction of the clinicians with the system is enhanced and can lead to successful cooperation. An example is the case described in Caruana et al. (2015). Despite its high performance, the model for pneumonia risk detection had a hidden flaw. Cases of pneumonia with concurrent asthma were assigned a lower risk of death than those without, despite the presence of this condition being known to worsen the severity of the cases. A correct prediction would have been the opposite, given the higher risk of death for these patients. The misleading correlation (i.e. presence of asthma, thus low risk of death from pneumonia) was rather a consequence of the effective care given to these patients by healthcare specialists, who reacted promptly to reduce the risk of death and, as a consequence, lowered the recorded risk for these patients. The misleading feature “presence of asthma" was captured by the interpretability analysis and was promptly understood by physicians since it was expressed as a clinical feature.
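To make the mechanism concrete, the following purely synthetic sketch (invented numbers and generating process, not the data or model of Caruana et al. 2015) reproduces the same kind of confound: asthmatic patients systematically receive aggressive care, their recorded mortality drops, and reading the coefficients of an interpretable model immediately exposes the counter-intuitive negative weight on asthma.

```python
# Purely synthetic, hypothetical re-creation of the confound described in the text.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
severity = rng.normal(size=n)                        # latent illness severity
asthma = rng.binomial(1, 0.15, size=n)               # comorbidity flag
aggressive_care = (asthma == 1) | (severity > 1.5)   # asthmatics always receive intensive care
death_prob = 1 / (1 + np.exp(-(severity - 2.0 * aggressive_care)))
died = rng.binomial(1, death_prob)                   # recorded outcome after treatment

X = np.column_stack([severity, asthma])
model = LogisticRegression().fit(X, died)
print("coefficients [severity, asthma]:", model.coef_[0])
# In this toy setup the asthma coefficient comes out negative: the interpretable
# read-out exposes the treatment-intensity confound, which an accuracy score alone
# would never reveal.
```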

It is now worth pointing out that, as described by Asan et al. (2020), “maximizing the user’s trust does not necessarily yield the best decisions from a human-AI collaboration" and the optimal trust level is achieved when the user knows when the model makes errors. After recalling that the role of humans in the practical applications of AI has been overlooked (Asan et al. 2020; Verma et al. 2021), they suggest that achieving such an understanding of both the strengths and weaknesses of the models requires a combination of three main elements: (i) increasing transparency, (ii) ensuring robustness (Briganti and Le Moine 2020) and (iii) encouraging fairness. Concerning (i), XAI was mentioned as the most promising approach to alleviate the black-box effects (Morin et al. 2018; Reyes et al. 2020; Verma et al. 2021). In addition, we believe that current AI model lifecycles are often too short for the user to acquire a sufficiently high confidence: novel approaches, or even retrained versions of the same algorithm, are constantly released, sometimes with only a small quantitative performance improvement. This can be compared to a situation where drivers must flawlessly master their vehicle while the latter is continuously changing shape and characteristics. One must therefore foster patience to achieve an adequate level of trust, which involves an intimate relationship between the end-user and a particular instance of the model to grasp the situations where the model works well and where it does not. This was de facto encouraged by the U.S. Food and Drug Administration (FDA), which as of June 2021 had only approved static algorithms. However, as pointed out by Pianykh et al. (2020), the performance of static AI algorithms tends to degrade over time, owing to naturally occurring changes in local data and the environment. Furthermore, access to a large collection of well-curated, expert-labeled data from a source that is highly relevant to the studied population and the question asked is also a severe barrier to widespread adoption in the clinic (Willemink et al. 2020). We can conclude that an optimal model lifecycle has yet to be found that balances model performance and robustness with adequate user trust and access to the data needed to optimally train AI models.

6 Conclusion

This work proposes an in-depth discussion of the terminology in interpretable AI, highlighting the risks of misunderstanding that exist if differing definitions are employed in the technical and social sciences. As noted by the experts, there are important gaps between how, for example, the legislation presents the notion of transparency and the meaning that is assigned to this word by ML experts and developers. While in the first case transparency is intended as a subjective property that is influenced by the receiver’s understanding and prior knowledge, in the technical sciences transparency is rather seen as an objective property that is not influenced by the receiver of the information. Similarly, the social sciences see interpretability as the creation of a social contract of trust, whereas most AI experts still too often intend it as the explanation of the automated output-generation process of the AI system.

The taxonomy proposed in this paper has the objective of harmonizing the terminology used by lawyers, philosophers, developers, physicians and sociologists, with the goal of building a solid basis for discussing the future of AI development in a multidisciplinary setting. We show how the proposed terminology is used in multiple domains and demonstrate its versatility across social and technical discussions. By discussing these points in the concrete setting of the medical domain, we show that the need for a common terminology is real and that further reflection is needed to define how effective human-machine cooperation can be established. Without the help of the social sciences, it would not be possible to obtain a sustainable human-machine partnership, and further research needs to be pursued at the frontier of the social and technical sciences. This paper may then constitute a strong foundation for scientists and humanists to collaborate and interact on such matters.