1 Introduction

Machine learning algorithms are increasingly used in a variety of high-stakes domains, from credit scoring to medical diagnosis. However, many such methods are opaque, in that humans cannot understand the reasoning behind particular predictions. This raises fundamental issues of trust, fairness, and accountability that cannot be easily resolved. Post-hoc, model-agnostic local explanation tools—algorithms designed to shed light on the individual predictions of other algorithms—are at the forefront of a fast-growing research area dedicated to addressing these concerns. Prominent examples include feature attributions, rule lists, and counterfactuals, each of which will be critically examined below. The subdiscipline of computational statistics devoted to this problem is variously referred to as interpretable machine learning or explainable artificial intelligence (XAI). For recent reviews, see Murdoch et al. (2019), Rudin et al. (2021), and Linardatos et al. (2021).

Many authors have pointed out the inconsistencies between popular XAI tools, raising questions as to which method is more reliable in particular cases (Krishna et al., 2022; Mothilal et al., 2021; Ramon et al., 2020). Theoretical foundations have proven elusive in this area, perhaps due to the perceived subjectivity inherent to notions such as “simple” and “relevant” (Watson & Floridi, 2020). Practitioners often seek refuge in the axiomatic guarantees of Shapley values, which have become the de facto standard in many XAI applications, due in no small part to their attractive theoretical properties (Bhatt et al., 2020). This method, formally defined in Sect. 4, quantifies the individual contribution of each feature toward a particular prediction. However, ambiguities regarding the underlying assumptions of existing software (Kumar et al., 2020) and the recent proliferation of mutually incompatible implementations (Merrick & Taly, 2020; Sundararajan & Najmi, 2019) have complicated this picture. Despite the abundance of alternative XAI tools (Molnar, 2019), a dearth of theory persists. This has led some to conclude that the goals of XAI are underspecified (Lipton, 2018), and even that post-hoc methods do more harm than good (Rudin, 2019).

We argue that this lacuna at the heart of XAI should be filled by a return to fundamentals—specifically, to necessity and sufficiency. As the building blocks of all successful explanations, these dual concepts deserve a privileged position in the theory and practice of XAI. In this article, an expanded version of a paper originally presented at the 37th Conference on Uncertainty in Artificial Intelligence (Watson et al., 2021), we propose new formal and computational methods to operationalize this insight. Whereas our original publication focused largely on the properties and performance of our proposed algorithm, in this work we elaborate on the conceptual content of our approach, which relies on a subtle distinction between inverse and converse probabilities, as well as a pragmatic commitment to context-dependent, agent-oriented explanations.

We make three main contributions. (1) We present a formal framework for XAI that unifies several popular approaches, including feature attributions, rule lists, and counterfactuals. Our framework is flexible and pragmatic, enabling users to incorporate domain knowledge, search various subspaces, and select a utility-maximizing explanation. (2) We introduce novel measures of necessity and sufficiency that can be computed for any feature subset. Our definitions are uniquely expressive and accord better with intuition than leading alternatives on challenging examples. (3) We present a sound and complete algorithm for identifying explanatory factors, and illustrate its performance on a range of tasks.

The remainder of this paper is structured as follows. Following a review of related work (Sect. 2), we introduce a unified framework (Sect. 3) that reveals unexpected affinities between various XAI tools and fundamental quantities in the study of causation (Sect. 4). We proceed to implement a novel procedure for computing model explanations that improves upon the state of the art in quantitative and qualitative comparisons (Sect. 5). After a brief discussion (Sect. 6), we conclude with a summary and directions for future work (Sect. 7).

2 Necessity and Sufficiency

Necessity and sufficiency have a long philosophical tradition, spanning logical, probabilistic, and causal variants. In propositional logic, we say that x is a sufficient condition for y iff \(x \rightarrow y,\) and x is a necessary condition for y iff \(y \rightarrow x.\) So stated, necessity and sufficiency are logically converse. However, by the law of contraposition, both definitions admit alternative formulations, whereby sufficiency may be rewritten as \(\lnot y \rightarrow \lnot x\) and necessity as \(\lnot x \rightarrow \lnot y.\) By pairing the original definition of sufficiency with the latter definition of necessity (and vice versa), we find that the two concepts are also logically inverse.

These formulae immediately suggest probabilistic relaxations, in which we measure the sufficiency of x for y by P(y|x) and the necessity of x for y by P(x|y). Because there is no probabilistic law of contraposition, these quantities are generally uninformative w.r.t. \(P(\lnot x|\lnot y)\) and \(P(\lnot y|\lnot x),\) which may be of independent interest. Thus, while necessity is both the converse and inverse of sufficiency in propositional logic, the two formulations come apart in probability calculus. This distinction between probabilistic conversion and inversion will be crucial to our proposal in Sect. 3, as well as our critique of alternative measures in Sect. 4. Counterintuitive implications of contrapositive relations abound, most famously in confirmation theory’s raven paradox (Good, 1960; Hempel, 1945; Mackie, 1963), but also in the literature on natural language conditionals (Crupi & Iacona, 2020; Gomes, 2019; Stalnaker, 1981). Our formal framework aims to preserve intuition while extinguishing any potential ambiguity.
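
To make the failure of probabilistic contraposition concrete, the following minimal sketch computes all four conditional probabilities from a small joint table. The counts are hypothetical, chosen purely for illustration, and the helper name `cond` is our own.

```python
from fractions import Fraction

# Hypothetical joint counts over binary events (x, y); not taken from any dataset.
counts = {(1, 1): 45, (1, 0): 5, (0, 1): 45, (0, 0): 5}

def cond(target, given):
    """P(target | given), where each argument is a predicate over (x, y) pairs."""
    num = sum(n for xy, n in counts.items() if given(xy) and target(xy))
    den = sum(n for xy, n in counts.items() if given(xy))
    return Fraction(num, den)

x = lambda xy: xy[0] == 1
y = lambda xy: xy[1] == 1
not_x = lambda xy: xy[0] == 0
not_y = lambda xy: xy[1] == 0

suff = cond(y, x)                  # P(y | x): sufficiency of x for y
suff_contra = cond(not_x, not_y)   # P(not x | not y): its contrapositive
nec = cond(x, y)                   # P(x | y): necessity of x for y
nec_contra = cond(not_y, not_x)    # P(not y | not x): its contrapositive

print(suff, suff_contra)
print(nec, nec_contra)
```

In propositional logic each conditional is equivalent to its contrapositive. With these counts, however, P(y|x) = 9/10 while P(¬x|¬y) = 1/2, and P(x|y) = 1/2 while P(¬y|¬x) = 1/10, so neither member of a pair can be read off from the other.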

Logical and probabilistic definitions of necessity and sufficiency often fall short when we consider causal explanations (Tian & Pearl, 2000; Pearl, 2009). It may make sense to say in logic that if x is a necessary condition for y, then y is a sufficient condition for x; it does not follow that if x is a necessary cause of y, then y is a sufficient cause of x. We may amend both concepts using counterfactual probabilities—e.g., the probability that Alice would still have a headache if she had not taken an aspirin, given that she does not have a headache and did take an aspirin. Let \(P(y_x|x^{\prime}, y^{\prime})\) denote such a quantity, to be read as “the probability that Y would equal y under an intervention that sets X to x, given that we observe \(X = x^{\prime}\) and \(Y = y^{\prime}.\)” Then, according to Pearl (2009, Chap. 9) the probability that x is a sufficient cause of y is given by \(\texttt {suf}(x, y) := P(y_x|x^{\prime}, y^{\prime}),\) and the probability that x is a necessary cause of y is given by \(\texttt {nec}(x, y) := P(y^{\prime}_{x^{\prime}}|x,y).\)
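
As an illustration of how such counterfactual quantities can be evaluated when a fully specified structural causal model is available, the sketch below enumerates the exogenous noise terms of a toy model and applies the usual abduction, action, prediction steps. The model, its noise probabilities, and all function names are assumptions made for this example only; they are not drawn from Pearl (2009) or from the method proposed here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical structural causal model (an assumption for illustration):
#   X := U_x,   Y := (X and U_1) or U_2,   with independent binary noise terms.
P_NOISE = {"u_x": Fraction(1, 2), "u_1": Fraction(3, 4), "u_2": Fraction(1, 4)}

def y_given(x, u_1, u_2):
    """Structural equation for Y."""
    return int((x and u_1) or u_2)

def weight(u):
    """Prior probability of a joint noise setting u = (u_x, u_1, u_2)."""
    w = Fraction(1)
    for value, p in zip(u, P_NOISE.values()):
        w *= p if value else 1 - p
    return w

def counterfactual(y_cf, x_do, x_obs, y_obs):
    """P(Y_{x_do} = y_cf | X = x_obs, Y = y_obs), by enumerating the noise."""
    num = den = Fraction(0)
    for u in product((0, 1), repeat=3):
        u_x, u_1, u_2 = u
        # Abduction: keep only noise settings consistent with the evidence.
        if u_x == x_obs and y_given(u_x, u_1, u_2) == y_obs:
            den += weight(u)
            # Action and prediction: set X to x_do and recompute Y.
            if y_given(x_do, u_1, u_2) == y_cf:
                num += weight(u)
    return num / den

suf = counterfactual(y_cf=1, x_do=1, x_obs=0, y_obs=0)   # P(y_x | x', y')
nec = counterfactual(y_cf=0, x_do=0, x_obs=1, y_obs=1)   # P(y'_{x'} | x, y)
print(suf, nec)
```

Under this toy model, suf(x, y) = 3/4 and nec(x, y) = 9/13. Enumeration is feasible only because the structural equations are assumed known; in practice they rarely are, which is why identification and bounding results matter.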

Analysis becomes more difficult in higher dimensions, where variables may interact to block or unblock causal pathways. This problem is the primary focus of the copious literature on “actual causality”, as famously laid out in a pair of influential articles by Halpern and Pearl (2005a, 2005b), and later given book-length treatment in a monograph by Halpern (2016). For a recent survey and refinement of the formal definitions, see Beckers (2021). The common thread in all these works, cashed out in various ways by philosophers including Mackie (1965) and Wright (2013), is that x causes y iff x is a necessary element of a sufficient set for y. These authors generally limit their analyses to Boolean systems with convenient structural properties. Operationalizing their theories in a practical method without such restrictions is one of our primary contributions.

Necessity and sufficiency have begun to receive explicit attention in the XAI literature. Ribeiro et al. (2018a) propose a bandit procedure for identifying a minimal set of Boolean conditions that entails a predictive outcome (more on this in Sect. 4). Dhurandhar et al. (2018) propose an autoencoder for learning pertinent negatives and positives, i.e. features whose presence or absence is decisive for a given label, while Zhang et al. (2018) develop a technique for generating symbolic corrections to alter model outputs. Both methods are optimized for neural networks, unlike the model-agnostic approach we pursue here.

Another strand of research in this area is rooted in logic programming. Several authors have sought to reframe XAI as either a SAT (Ignatiev et al., 2019; Narodytska et al., 2019) or a set cover problem (Grover et al., 2019; Lakkaraju et al., 2019). Others have combined classical work on prime implicants with recent advances in tractable Boolean circuits (Darwiche & Hirth, 2020). These methods typically derive approximate solutions on a prespecified subspace to ensure computability in polynomial time. We adopt a different strategy that prioritizes completeness over efficiency, an approach we show to be feasible in moderate dimensions and scalable under certain restrictions on admissible feature subsets (see Sect. 6 for a discussion).

Mothilal et al. (2021) build on Halpern (2016)’s definitions of necessity and sufficiency to critique popular XAI tools, proposing a new feature attribution measure with some purported advantages. Their method relies on the strong assumption that predictors are mutually independent. Galhotra et al. (2021) adapt Pearl (2009)’s probabilities of causation for XAI under a more inclusive range of data generating processes. They derive analytic bounds on multidimensional extensions of nec and suf, as well as an algorithm for point identification when graphical structure permits. Oddly, they claim that non-causal applications of necessity and sufficiency are somehow “incorrect and misleading” (p. 2), a normative judgment that is inconsistent with many common uses of these terms.

Rather than insisting on any particular interpretation of necessity and sufficiency, we propose a general framework that admits logical, probabilistic, and causal interpretations as special cases. Whereas previous works evaluate individual predictors, we focus on feature subsets, allowing us to detect and quantify interaction effects. Our formal results clarify the relationship between existing XAI methods and probabilities of causation, while our empirical results demonstrate their applicability to a wide array of tasks and datasets.

3 A Unifying Framework

We propose a unifying framework that highlights the role of necessity and sufficiency in XAI. Its constituent elements are described below. As a running example, we will consider the case of a hypothetical loan applicant named Anne.

3.1 The Basis Tuple

3.1.1 Target Function

Post-hoc explainability methods assume access to a target function \(f: \mathcal{X} \mapsto \mathcal{Y},\) i.e. the machine learning model whose prediction(s) we seek to explain. For simplicity, we restrict attention to the binary setting, with \(Y \in \{0, 1\}.\) Multi-class extensions are straightforward, while continuous outcomes may be accommodated via discretization. Though this inevitably involves some information loss, we follow authors in the contrastivist tradition in arguing that, even for continuous outcomes, explanations always involve a juxtaposition (perhaps implicit) of “fact and foil” (Lipton, 1990). For instance, Anne is probably less interested in knowing why her credit score is precisely y than she is in discovering why it is below some threshold (say, 700). Of course, binary outcomes can approximate continuous values with arbitrary precision over repeated trials. We generally regard f as deterministic, although stochastic variants can easily be accommodated.

3.1.2 Context

The context \(\mathcal{D}\) is a probability distribution over which we quantify sufficiency and necessity. Contexts may be constructed in various ways but always consist of at least some input (point or space) and reference (point or space). For example, say Anne’s loan application is denied. The specific values of all her recorded features constitute an input point. To figure out why she was unsuccessful, Anne may want to compare herself to some similar applicant who succeeded (i.e., a reference point), or perhaps the set of all successful applicants (i.e., a reference space). Alternatively, she may expand the input space to include all unsuccessful applicants of similar income and age range, and compare them to a reference class of successful applicants in this same income and age range. Optionally, Anne may make this comparison by exploring intermediate inputs that gradually make the input space more reference-like or vice versa. For instance, Anne may change the income of all applicants in the input space to some reference income. Contexts capture the range of all such intermediate inputs that Anne examines in comparing the input(s) and reference(s). This distribution provides a semantics for explanatory measures by bounding the scope of necessity and sufficiency claims.

Observe that the “locality” of Anne’s explanation is determined by the extent to which input and reference spaces are restricted. An explanation that distinguishes all successful applicants from all unsuccessful applicants is by definition global. One that merely specifies why Anne failed, whereas someone very much like her succeeded, is local—perhaps even maximally so, if Anne’s successful counterpart is as similar as possible to her without crossing the decision boundary. In between, we find a range of intermediate alternatives, characterized by spaces that overlap with Anne’s feature values to varying degrees. Thus we can relax the hard boundary between types and tokens, so pervasive in the philosophical literature on explanation (Hausman, 2005), and admit instead a spectrum of generality that may in some cases be precisely quantified (e.g., with respect to some distance metric over the feature space).

In addition to predictors and outcomes, the context can optionally include information exogenous to f. A set of auxiliary variables \(\varvec{W}\) may span sensitive attributes like gender and race that are not recorded in \(\varvec{X},\) which Anne could use to audit for bias on the part of her bank. Other potential auxiliaries include engineered features, such as those learned via neural embeddings, or metadata about the conditioning events that characterize a given distribution. Crucially, such conditioning events need not be just observational. If, for example, Anne wants to compare her application to a treatment group of customers randomly assigned some promotional offer \((W=1),\) then her reference class is sampled from \(P(\varvec{X}|do(W=1)).\) Alternatively, W may index different distributions, serving the same function as so-called “regime indicators” in Dawid (2002, 2021)’s decision-theoretic approach to statistical causality. This augmentation allows us to evaluate the necessity and sufficiency of factors beyond those observed in \(\varvec{X}.\) Going beyond observed data requires background assumptions (e.g., about structural dependencies) and/or statistical models (e.g., learned vector representations). Errors introduced by either may propagate to final explanations, so both should be handled with care. Contextual data take the form \(\varvec{Z} = (\varvec{X}, \varvec{W}) \sim \mathcal{D}.\) We extend the target function to augmented inputs by defining \(f(\varvec{z}) := f(\varvec{x}).\)

3.1.3 Factors

Factors pick out the properties whose necessity and sufficiency we wish to quantify. Formally, a factor \(c: \mathcal{Z} \mapsto \{0, 1\}\) indicates whether its argument satisfies some criteria with respect to predictors or auxiliaries. Say Anne wants to know how her odds of receiving the bank loan might change following an intervention that sets her income to at least $50,000. Then a relevant factor may be \(c(\varvec{z}) = \mathbbm {1}[\varvec{x}[\mathsf{gender} = \text{``female''}] \wedge \varvec{w}[do(\mathsf{income} \ge \$50\text{k})]],\) which checks whether the random sample \(\varvec{z}\) corresponds to a female drawn from the relevant interventional distribution. We use the term “factor” as opposed to “condition” or “cause” to suggest an inclusive set of criteria that may apply to predictors \(\varvec{x}\) and/or auxiliaries \(\varvec{w}.\) Such criteria are always observational w.r.t. \(\varvec{z}\) but may be interventional or counterfactual w.r.t. \(\varvec{x}.\) We assume a finite space of factors \(\mathcal{C}.\)

3.1.4 Partial Order

When multiple factors pass a given necessity or sufficiency threshold, users will tend to prefer some over others. Say Anne learns that either of two changes would be sufficient to secure her loan: increasing her savings or getting a college degree. She has just taken a new job and expects to save more each month as a result. At this rate, she could hit her savings target within a year. Quitting her job to go to college, by contrast, would be a major financial burden, one that would take years to pay off. Anne therefore judges that boosting her savings is preferable to getting a college degree—i.e., the former precedes the latter in her partial ordering of possible actions.

To the extent that XAI methods consider agentive preferences at all, they tend to focus on minimality. The idea is that, all else being equal, factors with fewer conditions and smaller changes are generally preferable to those with more conditions and greater changes. Rather than formalize this preference in terms of a distance metric, which unnecessarily constrains the solution space, we treat the partial ordering as primitive and require only that it be complete and transitive. This covers not just distance-based measures but also more idiosyncratic orderings that are unique to individual agents. Ordinal preferences may be represented by cardinal utility functions under reasonable assumptions (see, e.g., Jeffrey, 1965; Savage, 1954; von Neumann & Morgenstern, 1944), thereby linking our formalization with a rich tradition of decision theory and associated methods for expected utility maximization.

We are now ready to formally specify our framework.

Definition 1

(Basis) A basis for computing necessary and sufficient factors for model predictions is a tuple \(\mathcal{B} = \langle f, \mathcal{D}, \mathcal{C}, \preceq \rangle ,\) where f is a target function, \(\mathcal{D}\) is a context, \(\mathcal{C}\) is a set of possible factors, and \(\preceq\) is a partial ordering on \(\mathcal{C}.\)

3.2 Explanatory Measures

For some fixed basis \(\mathcal{B} = \langle f, \mathcal{D}, \mathcal{C}, \preceq \rangle ,\) we define the following measures of sufficiency and necessity, with probability taken over \(\mathcal{D}.\)

Definition 2

(Probability of sufficiency) The probability that c is a sufficient factor for outcome y is given by:

$$\begin{aligned} PS(c, y) := P(f(\varvec{z}) = y\,|\,c(\varvec{z}) = 1). \end{aligned}$$

The probability that factor set \(C= \{c_1, \ldots , c_k\}\) is sufficient for y is given by:

$$\begin{aligned} PS(C, y) := P(f(\varvec{z}) = y~|~\sum _{i=1}^k c_i(\varvec{z}) \ge 1). \end{aligned}$$

Definition 3

(Probability of necessity) The probability that c is a necessary factor for outcome y is given by:

$$\begin{aligned} PN(c, y) := P(c(\varvec{z}) = 1~|~f(\varvec{z}) = y). \end{aligned}$$

The probability that factor set \(C= \{c_1, \ldots , c_k\}\) is necessary for y is given by:

$$\begin{aligned} PN(C, y) := P(\sum _{i=1}^k c_i(\varvec{z}) \ge 1~|~f(\varvec{z}) = y). \end{aligned}$$
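
Definitions 2 and 3 translate directly into sample estimates. The sketch below is a minimal illustration, assuming that factors are represented as Boolean predicates over samples and that both conditioning sets are non-empty; the sample, the target function, and the factor names are hypothetical.

```python
def prob_suff(factors, f, y, samples):
    """Sample estimate of PS(C, y): share of samples with f(z) = y among
    those that satisfy at least one factor in C."""
    covered = [z for z in samples if any(c(z) for c in factors)]
    return sum(f(z) == y for z in covered) / len(covered)

def prob_nec(factors, f, y, samples):
    """Sample estimate of PN(C, y): share of samples satisfying at least one
    factor in C among those with f(z) = y."""
    with_y = [z for z in samples if f(z) == y]
    return sum(any(c(z) for c in factors) for z in with_y) / len(with_y)

# Hypothetical context sample and target function (1 = approved).
samples = [
    {"income": 60, "degree": 1}, {"income": 55, "degree": 0},
    {"income": 40, "degree": 1}, {"income": 30, "degree": 0},
    {"income": 70, "degree": 1}, {"income": 20, "degree": 0},
]
f = lambda z: int(z["income"] >= 50)

# Hypothetical factors.
has_degree = lambda z: z["degree"] == 1
high_income = lambda z: z["income"] >= 50

ps_degree = prob_suff([has_degree], f, 1, samples)
pn_degree = prob_nec([has_degree], f, 1, samples)
ps_either = prob_suff([has_degree, high_income], f, 1, samples)
pn_either = prob_nec([has_degree, high_income], f, 1, samples)
print(ps_degree, pn_degree, ps_either, pn_either)
```

A single factor is handled as a singleton set. Adding a second factor to the set can lower sufficiency while raising necessity, since the disjunction covers more samples overall.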

Our definitions cast sufficiency and necessity as converse probabilities. We argue that this has major advantages over the more familiar inverse formulation, which has been dominant since Tian and Pearl (2000)’s influential paper, further developed and popularized in several subsequent publications (Halpern, 2016; Halpern & Pearl, 2005b; Pearl, 2009). To see why, observe that our notions of sufficiency and necessity can be likened to the “precision” (positive predictive value) and “recall” (true positive rate) of a hypothetical classifier that predicts whether \(f(\varvec{z}) = y\) based on whether \(c(\varvec{z}) = 1.\) By examining the confusion matrix of this classifier, one can define other related quantities, such as the true negative rate \(P(c(\varvec{z}) = 0|\;f(\varvec{z}) \ne y)\) and the negative predictive value \(P(f(\varvec{z}) \ne y~|~c(\varvec{z}) = 0),\) which are contrapositive transformations of our proposed measures (see Table 1). We can recover these values exactly via \(PN(1 - c, 1 - y)\) and \(PS(1 - c, 1 - y),\) respectively. When necessity and sufficiency are defined as probabilistic inversions (rather than conversions), such transformations are impossible. This is a major shortcoming given the explanatory significance of all four quantities, which correspond to probabilistic variants of the classical logical formulae for necessity and sufficiency. Definitions that can describe only two are fundamentally impoverished, bound to miss half the picture.

Table 1 Confusion matrix of labels (rows) and factors (columns), with accompanying definitions of the four fundamental explanatory probabilities

Pearl (2009) motivates the inverse formulation by interpreting his probabilities of causation as the tendency for an effect to respond to its cause in both ways—turning off in its absence, and turning on in its presence. As we show in the next section, these are better understood as two different sorts of sufficiency, i.e. the sufficiency of x for y and the sufficiency of \(\lnot x\) for \(\lnot y\) (see Proposition 4 for an exact statement of the correspondence). Our definition of necessity starts from a different intuition. We regard an explanatory factor as necessary to the extent that it covers all possible pathways to a given outcome. This immediately suggests our converse formulation, where we condition on the prediction itself—the “effect” in a causal scenario—and observe how often the factor in question is satisfied. Large values of PN(c, y) suggest that there are few alternative routes to y except through c, which we argue is the essence of a necessary explanation.

In many cases, differences between inverse and converse notions of necessity will be negligible. Indeed, the two are strictly equivalent when classes are perfectly balanced (i.e., when \(P(c(\varvec{z}) = 1) = P(f(\varvec{z}) = y) = 0.5\)), or when the relationship between a factor and an outcome is deterministic (in which case we are back in the logical setting). More generally, the identity is obtained whenever \(q_{11} = q_{00},\) to use the labels from Table 1. However, the greater the difference between these values, the more these two ratios diverge. Consider Anne’s loan application. Say she wants to evaluate the necessity of college education for loan approval, so defines a factor that indicates whether applicants attained a bachelor’s degree (BA). She samples some 100 individuals, with data summarized in Table 2. Observing that successful applicants are twice as likely to have no BA as they are to have one, we judge college education to be largely unnecessary for loan approval. Specifically, we have that \(P(\text{``BA''}|\text{``Approved''}) = {1}/{3}.\) On an inverse notion of necessity, however, we get a very different result, with \(P(\text{``Denied''}|\text{``No BA''}) = {4}/{5}.\) This counterintuitive conclusion overestimates the necessity of education by a factor of 2.4. A more persuasive interpretation of this quantity is that lacking a BA is often sufficient for loan denial—an informative discovery, perhaps, but not an answer to the original question, which asked to what extent college education was necessary for loan approval.
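
The arithmetic in this example can be checked directly. The counts below are an assumption: they are one set of 100 applicants consistent with the ratios quoted in the text, and need not match the entries of Table 2.

```python
from fractions import Fraction

# Assumed counts (hypothetical), keyed by (education, decision).
counts = {
    ("BA", "Approved"): 5, ("BA", "Denied"): 45,
    ("No BA", "Approved"): 10, ("No BA", "Denied"): 40,
}

def total(education=None, decision=None):
    """Number of applicants matching the given education and/or decision."""
    return sum(n for (e, d), n in counts.items()
               if education in (None, e) and decision in (None, d))

# Converse necessity: P("BA" | "Approved").
converse = Fraction(total("BA", "Approved"), total(decision="Approved"))
# Inverse necessity: P("Denied" | "No BA").
inverse = Fraction(total("No BA", "Denied"), total(education="No BA"))
ratio = inverse / converse
print(converse, inverse, float(ratio))
```

With these counts the converse measure is 1/3, the inverse measure is 4/5, and their ratio is 12/5 = 2.4, matching the figures above.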

Table 2 Toy example of a contingency table for Anne’s loan application

Pearl may plausibly object that this example is limited to observational data, and therefore uninformative with respect to causal mechanisms of interest. In fact, our critique is far more general. For illustration, imagine that Table 2 represents the results of a randomized controlled trial (RCT) in which applicants are assigned uniformly at random to the “BA” and “No BA” groups. Though counterfactual probabilities remain unidentifiable even with access to experimental data, Tian and Pearl (2000) demonstrate how to bound their probabilities of causation with increasing tightness as we make stronger structural assumptions. However, we are unconvinced that counterfactuals are even required here—and not just because of lingering metaphysical worries about the meaning of unobservable quantities such as \(P(y_x, y_{x^{\prime}})\) (Dawid, 2000; Quine, 1960). Instead, we argue that the relevant probabilities for causal sufficiency and necessity are simpler. Using the notation of regime indicators (Correa & Bareinboim, 2020; Dawid, 2021), let \(P_{\sigma }\) denote the probability distribution resulting from the stochastic regime imposed by our RCT, i.e. a trial in which college education is randomly assigned to all applicants with probability \(1/2.\) Then our arguments from above go through just the same, with the context \(\mathcal{D}\) now given by \(P_{\sigma}.\) We emphasize once again that we are perfectly capable of recovering Pearl’s counterfactual definitions in our framework—see Proposition 4 below—but reiterate that probabilistic conversions are preferable to inversions even in causal contexts.

These toy examples illustrate a more general point. The converse formulation of necessity and sufficiency is not just more expressive than the inverse alternative, but also aligns more closely with our intuition when class imbalance pulls the two apart. In the following section, we present an optimal procedure for computing these quantities on real-world datasets, unifying a variety of XAI methods in the process.


3.3 Minimal Sufficient Factors

We introduce Local Explanations via Necessity and Sufficiency (LENS), a procedure for computing explanatory factors with respect to a given basis \(\mathcal{B}\) and threshold parameter τ (see Algorithm 1). First, we calculate a factor’s probability of sufficiency (see probSuff) by drawing n samples from \(\mathcal{D}\) and taking the maximum likelihood estimate \(\hat{PS}(c, y).\) Next, we sort the space of factors w.r.t. \(\preceq\) in search of those that are τ-minimal.

Definition 4

(τ-minimality) We say that c is τ-minimal iff (i) \(PS(c, y) \ge \tau\) and (ii) there exists no factor \(c^{\prime}\) such that \(PS(c^{\prime}, y) \ge \tau\) and \(c^{\prime} \prec c.\)

Our next step is to span the τ-minimal factors and compute their cumulative PN (see probNec). Since no strictly preferable factor can match the sufficiency of a τ-minimal c, in reporting probability of necessity we expand \(C\) to its upward closure.
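
The steps just described can be sketched in a few lines. This is an illustrative toy, not Algorithm 1 itself: it assumes that factors are feature subsets whose values are swapped from a reference applicant into the input, that the context enumerates every (subset, reference) pair with equal weight instead of sampling, and that \(\preceq\) is the subset relation. The model, the data, and the function names are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations

FEATURES = ("income", "savings", "degree")

def f(x):
    """Hypothetical target function: 1 = approved, 0 = denied."""
    return int(x["income"] >= 50 and (x["savings"] >= 10 or x["degree"] == 1))

INPUT = {"income": 30, "savings": 5, "degree": 0}   # denied applicant
REFERENCES = [                                       # approved applicants
    {"income": 60, "savings": 20, "degree": 0},
    {"income": 70, "savings": 5, "degree": 1},
    {"income": 55, "savings": 15, "degree": 1},
]

def context():
    """Enumerate the context: every (subset, reference) pair, equally weighted.
    Features in S take reference values; the rest keep the input values."""
    subsets = [frozenset(c) for k in range(len(FEATURES) + 1)
               for c in combinations(FEATURES, k)]
    for S in subsets:
        for ref in REFERENCES:
            yield S, {j: (ref[j] if j in S else INPUT[j]) for j in FEATURES}

def prob_suff(S, y, samples):
    """PS(c_S, y): share of samples built from subset S with f(x) = y."""
    hits = [f(x) == y for T, x in samples if T == S]
    return Fraction(sum(hits), len(hits))

def prob_nec(C, y, samples):
    """PN(C, y): share of samples with f(x) = y whose subset lies in C."""
    with_y = [T for T, x in samples if f(x) == y]
    return Fraction(sum(T in C for T in with_y), len(with_y))

def lens(y, tau):
    samples = list(context())
    subsets = {S for S, _ in samples}
    # Factors that clear the sufficiency threshold.
    passing = {S for S in subsets if prob_suff(S, y, samples) >= tau}
    # tau-minimal: no strictly preferred (proper-subset) factor also passes.
    minimal = {S for S in passing if not any(T < S for T in passing)}
    # Upward closure of the minimal factors, used for cumulative necessity.
    closure = {S for S in subsets if any(M <= S for M in minimal)}
    return minimal, prob_nec(closure, y, samples)

minimal, pn = lens(y=1, tau=Fraction(3, 5))
print(sorted(map(sorted, minimal)), pn)
```

At τ = 3/5 this toy returns the two pairs {income, savings} and {income, degree}, with cumulative necessity 1. Raising τ to 9/10 leaves only the full feature set as τ-minimal, and cumulative necessity drops to 3/7.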

Theorems 1 and 2 state that this procedure is optimal in a sense that depends on whether we assume access to oracle or sample estimates of PS (see Appendix 1 for all proofs).

Theorem 1

With oracle estimates PS(c, y) for all \(c \in \mathcal{C},\) Algorithm 1 is sound and complete. That is, for any \(C\) returned by Algorithm 1 and all \(c \in \mathcal{C},\) c is τ-minimal iff \(c \in C.\)

Population proportions may be obtained if the target function f is deterministic and data fully saturate the context \(\mathcal{D},\) a plausible prospect with categorical variables of low to moderate dimensionality. Otherwise, proportions will need to be estimated.

Theorem 2

With sample estimates \(\hat{PS}(c, y)\) for all \(c \in \mathcal{C},\) Algorithm 1 is uniformly most powerful. That is, Algorithm 1 identifies the most τ-minimal factors of any method with fixed type I error \(\alpha.\)

Multiple testing adjustments can easily be accommodated, in which case modified optimality criteria apply (Storey, 2007).

Figure 1 provides a visual example of LENS outputs for a hypothetical loan application. We compute the minimal subvectors most likely to preserve or alter a given prediction, as well as cumulative necessity scores for all subsets. We take it that the main quantity of interest in most applications is sufficiency, be it for the original or alternative outcome, and therefore define τ-minimality w.r.t. sufficient (rather than necessary) factors. However, necessity serves an important role in tuning τ, as there is an inherent trade-off between the sufficiency threshold and cumulative necessity. More factors are excluded at higher values of τ, thereby inducing lower cumulative PN; more factors are included at lower values of τ, thereby inducing higher cumulative PN. As noted above, the resulting trade-off is similar to that of a precision-recall curve quantifying and qualifying errors in classification tasks (see Fig. 2). Different degrees of necessity may be warranted for different tasks, depending on how important it is to (approximately) exhaust all paths towards an outcome. Users can therefore adjust τ to accommodate desired levels of PN over successive calls to LENS.

Fig. 1 A schematic overview of LENS outputs for an unsuccessful loan applicant. We describe minimal sufficient factors (here, sets of features) for a given input (top row), with the aim of preserving or flipping the original prediction. We report a sufficiency score for each set and a cumulative necessity score for all sets, indicating the proportion of paths towards the outcome that are covered by the explanation. Feature colors indicate source of feature values (input or reference)

Fig. 2 An example curve visualizing the relationship between sufficiency and necessity from the German credit dataset (see Sect. 5). Setting τ amounts to thresholding the x-axis at a fixed point, with PN given by the corresponding y-coordinate of this curve

4 Encoding Existing Measures

Explanatory measures can be shown to play a central role in many seemingly unrelated XAI tools, albeit under different assumptions about the basis tuple \(\mathcal{B}.\) In this section, we relate our framework to a number of existing methods.

4.1 Feature Attributions

Several popular feature attribution algorithms are based on Shapley values (Shapley, 1953), originally proposed as a solution to the attribution problem in cooperative game theory, which asks how best to distribute the surplus generated by a coalition of players. Substituting features for players and predictions for surplus, researchers have repurposed the method’s combinatoric strategy for XAI to quantify the contribution of each input variable toward a given output. The goal is to decompose the predictions of any target function as a sum of weights over d features:

$$\begin{aligned} f(\varvec{x}_i) = \sum _{j=0}^{d}\phi _j(i), \end{aligned}$$

where \(\phi _0(i)\) represents a baseline expectation and \(\phi _j(i)\) the weight assigned to \(X_j\) at point \(\varvec{x}_i.\) Let \(v: [n] \times 2^d \mapsto \mathbb{R}\) be a value function such that \(v(i, S)\) is the payoff associated with feature subset \(S \subseteq [d]\) for sample i and \(v(i, \emptyset ) = 0\) for all \(i \in [n].\) Define the complement \(R = [d] \backslash S\) such that we may rewrite any \(\varvec{x}_i\) as a pair of subvectors, \((\varvec{x}_i^S, \varvec{x}_i^R).\) Payoffs are given by:

$$\begin{aligned} v(i,S) = \mathop {\mathbb{E}}[f(\varvec{x}_i^S, \varvec{X}^R)], \end{aligned}$$

although this introduces some ambiguity regarding the reference distribution for \(\varvec{X}^R\) (more on this below). The Shapley value \(\phi _j(i)\) is then j’s average marginal contribution to all subsets that exclude it:

$$\begin{aligned} \phi _j(i) = \sum _{S \subseteq [d] \backslash \{j\}} \frac{|S|!\,(d - |S| - 1)!}{d!} \left[ v(i, S \cup \{j\}) - v(i, S)\right] . \end{aligned}$$

It can be shown that this is the unique solution to the attribution problem that satisfies certain desirable properties, including efficiency, linearity, sensitivity, and symmetry.
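As a minimal sketch of the definitions above, the following Python snippet computes exact Shapley values by enumerating all feature subsets. It assumes a marginal reference distribution for \(\varvec{X}^R\) (a short list of reference rows) and centres the value function so that the empty coalition has zero payoff. The function names, the toy model, and the data are illustrative assumptions, not part of any existing software.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, reference):
    """Exact Shapley values of f at x; absent features come from `reference` rows."""
    d = len(x)

    def mean_prediction(S):
        # E[f(x^S, X^R)]: features in S keep their input values,
        # the remaining features are taken from each reference row in turn.
        preds = [f([x[j] if j in S else r[j] for j in range(d)]) for r in reference]
        return sum(preds) / len(preds)

    baseline = mean_prediction(frozenset())  # phi_0, the baseline expectation

    def v(S):
        # Centred payoff, so that v(empty set) = 0.
        return mean_prediction(S) - baseline

    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (v(S | {j}) - v(S))
        phi.append(total)
    return baseline, phi


# Hypothetical toy model and data, for illustration only.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


x_input = [1, 1, 1]
reference_rows = [[0, 0, 0], [1, 0, 1], [0, 1, 0]]
phi_0, phi = shapley_values(toy_model, x_input, reference_rows)

# Efficiency: baseline plus attributions should recover the prediction.
efficiency_gap = phi_0 + sum(phi) - toy_model(x_input)
```

Exhaustive enumeration visits \(2^{d-1}\) subsets per feature, so this sketch is only practical for small d; the efficiency property can be checked by confirming that `efficiency_gap` is numerically zero.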

Reformulating this in our framework, we find that the value function v is a sufficiency measure. To see this, let each \(\varvec{z} \sim \mathcal{D}\) be a sample in which a random subset of variables S are held at their original values, while remaining features R are drawn from a fixed distribution \(\mathcal{D}(\cdot |S).\)Footnote 8

Proposition 1

Let \(c_S(\varvec{z}) = 1\) iff \(\varvec{x} \subseteq \varvec{z}\) was constructed by holding \(\varvec{x}^S_i\) fixed and sampling \(\varvec{X}^R\) according to \(\mathcal{D}(\cdot |S).\) Then \(v(i, S) = PS(c_S, y).\)

Thus, the Shapley value \(\phi _j(i)\) measures \(X_j\)’s average marginal increase to the sufficiency of a random feature subset. The advantage of our method is that, by focusing on particular subsets instead of weighting them all equally, we disregard irrelevant permutations and home in on just those that meet a τ-minimality criterion. Kumar et al. (2020) observe that, “since there is no standard procedure for converting Shapley values into a statement about a model’s behavior, developers rely on their own mental model of what the values represent” (p. 8). By contrast, necessary and sufficient factors are more transparent and informative, offering a direct path to what Shapley values indirectly summarize.

4.2 Rule Lists

Rule lists are sequences of if-then statements that describe hyperrectangles in feature space, creating partitions that can be visualized as decision or regression trees. Rule lists have long been popular in XAI. While early work in this area tended to focus on global methods (Friedman & Popescu, 2008; Letham et al., 2015), more recent efforts have prioritized local explanation tasks (Lakkaraju et al., 2019; Sokol & Flach, 2020).

We focus in particular on the Anchors algorithm (Ribeiro et al., 2018a), which learns a set of Boolean conditions A (the eponymous “anchors”) such that \(A(\varvec{x}_i) = 1\) and

$$\begin{aligned} P_{\mathcal{D}_{(\varvec{x}|A)}}(f(\varvec{x}_i) = f(\varvec{x})) \ge \tau . \end{aligned}$$

The left-hand side of Eq. 4 is termed the precision, prec(A), and the probability is taken over a synthetic distribution in which the conditions in A hold while other features are perturbed. Once τ is fixed, the goal is to maximize coverage, formally defined as \(\mathbb{E}[A(\varvec{x}) = 1],\) i.e. the proportion of datapoints to which the anchor applies.
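To make precision and coverage concrete, here is a small Python sketch that estimates both quantities for an anchor consisting of equality conditions on a set of feature indices. The perturbation scheme (copying non-anchored features from randomly chosen data rows) is a simplifying assumption and not the sampling procedure of the Anchors software; all names and the toy classifier are hypothetical.

```python
import random
from itertools import product


def precision(f, x, anchor, data, n_samples=1000, seed=0):
    """Estimate prec(A): share of perturbed samples satisfying A that keep f(x)."""
    rng = random.Random(seed)
    target = f(x)
    hits = 0
    for _ in range(n_samples):
        donor = rng.choice(data)
        # Anchored features stay at the input values; the rest are perturbed.
        z = [x[j] if j in anchor else donor[j] for j in range(len(x))]
        hits += int(f(z) == target)
    return hits / n_samples


def coverage(x, anchor, data):
    """Share of datapoints that agree with x on every anchored feature."""
    applies = [all(row[j] == x[j] for j in anchor) for row in data]
    return sum(applies) / len(data)


# Hypothetical classifier and data: the label is 1 iff the first two bits are set.
def toy_classifier(z):
    return int(z[0] == 1 and z[1] == 1)


rows = [list(bits) for bits in product([0, 1], repeat=3)]
x_input = [1, 1, 0]

prec_full = precision(toy_classifier, x_input, {0, 1}, rows)
cov_full = coverage(x_input, {0, 1}, rows)
cov_small = coverage(x_input, {0}, rows)
```

In this toy setting the anchor on features 0 and 1 fixes everything the classifier depends on, so its estimated precision is 1, at the price of lower coverage than the single-condition anchor.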

The formal similarities between Eq. 4 and Definition 2 are immediately apparent, and the authors themselves acknowledge that Anchors are intended to provide “sufficient conditions” for model predictions.

Proposition 2

Let \(c_A(\varvec{z}) = 1\) iff \(A(\varvec{x}) = 1.\) Then \(\text{prec}(A) = PS(c_A, y).\)

While Anchors output just a single explanation, our method generates a ranked list of candidates, thereby offering a more comprehensive view of model behavior. Moreover, our necessity measure adds a mode of explanatory information entirely lacking in Anchors. Finally, by exhaustively searching over a space of candidate factors rather than engineering auxiliary variables on the fly, our method is certifiably sound and complete, whereas Anchors are at best probably approximately correct (i.e., satisfy a PAC bound).

4.3 Counterfactuals

Counterfactual explanations are rooted in the seminal work of Lewis (1973a, 1973b), who famously argued that a causal account of an event x should appeal to the nearest possible world in which \(\lnot x.\) In XAI, this is accomplished by identifying one or several nearest neighbors with different outcomes, e.g. all datapoints \(\varvec{x}\) within an \(\epsilon\)-ball of \(\varvec{x}_i\) such that labels \(f(\varvec{x})\) and \(f(\varvec{x}_i)\) differ (for classification) or \(f(\varvec{x}) > f(\varvec{x}_i) + \delta\) (for regression).Footnote 9 The optimization problem is:

$$\begin{aligned} \varvec{x}^{\ast} = \mathop {{\mathrm{argmin}}}\limits _{\varvec{x}\in \text{CF}(\varvec{x}_i)} ~cost(\varvec{x}_i, \varvec{x}), \end{aligned}$$

where \(\text{CF}(\varvec{x}_i)\) denotes a counterfactual space such that \(f(\varvec{x}_i) \ne f(\varvec{x})\) and cost is a user-supplied cost function, typically equated with some distance measure. Wachter et al. (2018) recommend using generative adversarial networks to solve Eq. 5, while others have proposed alternatives designed to ensure that counterfactuals are coherent and actionable (Karimi et al., 2020a; Ustun et al., 2019; Wexler et al., 2020). As with Shapley values, the variation in these proposals is reducible to the choice of context \(\mathcal{D}.\)

For counterfactuals, we rewrite the objective as a search for minimal perturbations sufficient to flip an outcome. We interpret the cost function as encoding agentive preferences by representing the partial ordering on factors. This can be guaranteed under some constraints on \(\preceq\); see Steele and Stefánsson (2020) for an overview of representation theorems in decision theory.

Proposition 3

Let cost be a function representing \(\preceq ,\) and let c be some factor spanning reference values. Then the counterfactual recourse objective is:

$$\begin{aligned} c^{\ast} = \mathop {{\mathrm{argmin}}}\limits _{c \in \mathcal{C}} \;cost(c)\;\text{s.t.}\;PS(c, 1 - y) \ge \tau , \end{aligned}$$

where τ denotes a decision threshold. Counterfactual outputs will then be any \(\varvec{z} \sim \mathcal{D}\) such that \(c^{\ast}(\varvec{z}) = 1.\)
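The objective in Proposition 3 can be illustrated with a brute-force Python sketch. It assumes that factors are feature subsets whose values are swapped from the input to reference rows, that \(PS(c, 1 - y)\) is estimated as the empirical share of such samples with a flipped prediction, and that cost is a user-supplied function on subsets. The toy model, reference rows, and per-feature costs are hypothetical.

```python
from itertools import combinations


def ps_flip(f, x, references, S):
    """Share of samples, built by taking features in S from a reference row, that flip f(x)."""
    y = f(x)
    flipped = 0
    for r in references:
        z = [r[j] if j in S else x[j] for j in range(len(x))]
        flipped += int(f(z) != y)
    return flipped / len(references)


def cheapest_recourse(f, x, references, cost, tau):
    """Return the lowest-cost feature subset whose flip probability is at least tau."""
    d = len(x)
    feasible = []
    for size in range(1, d + 1):
        for S in combinations(range(d), size):
            if ps_flip(f, x, references, S) >= tau:
                feasible.append((cost(S), S))
    return min(feasible)[1] if feasible else None


# Hypothetical model: approve (1) iff the first two features sum to at least 3.
def toy_model(z):
    return int(z[0] + z[1] >= 3)


x_input = [1, 1, 0]                           # currently rejected
reference_rows = [[2, 2, 1], [3, 1, 0], [2, 3, 1]]  # all approved
feature_costs = [5, 1, 1]                     # assumed cost of changing each feature


def subset_cost(S):
    return sum(feature_costs[j] for j in S)


loose = cheapest_recourse(toy_model, x_input, reference_rows, subset_cost, tau=0.6)
strict = cheapest_recourse(toy_model, x_input, reference_rows, subset_cost, tau=0.9)
```

With the lower threshold the cheap but less reliable intervention on feature 1 is returned; raising τ forces the more expensive intervention on feature 0, which flips the prediction for every reference row.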

4.4 Probabilities of Causation

Our framework can describe Pearl (2009)’s aforementioned probabilities of causation; however, in this case \(\mathcal{D}\) must be constructed with care.

Proposition 4

Consider the bivariate Boolean setting, as in Sect. 2. We have two counterfactual distributions: an input space \(\mathcal{I},\) in which we observe \(X=1, Y=1\) but intervene to set \(X = 0\); and a reference space \(\mathcal{R},\) in which we observe \(X=0, Y=0\) but intervene to set \(X = 1.\) Let \(\mathcal{D}\) denote a uniform mixture over both spaces, and let auxiliary variable W tag each sample with a label indicating whether it comes from the input (\(W = 0\)) or reference (\(W = 1\)) distribution. Define \(c(\varvec{z}) = w.\) Then we have \(\texttt {suf}(x, y) = PS(c, y)\) and \(\texttt {nec}(x, y) = PS(1 - c, 1-y).\)

In other words, we regard Pearl’s notion of necessity as sufficiency of the negated factor for the alternative outcome. By contrast, Pearl (2009) has no analogue for our probability of necessity. This is true of any measure that defines necessity and sufficiency via inverse, rather than converse probabilities. While conditioning on the same variable(s) for both measures may have some intuitive appeal, especially in the causal setting, it comes at a substantial cost to expressive power. Whereas our framework can recover all four fundamental explanatory measures, corresponding to the classical definitions and their contrapositive forms, definitions that merely negate instead of transpose the antecedent and consequent are limited to just two.
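The construction in Proposition 4 can be checked by exact enumeration on a small example. The sketch below assumes that \(PS(c, y)\) is the conditional probability of outcome y given that factor c holds under \(\mathcal{D}\), and uses a hypothetical Boolean structural model with three independent, uniformly distributed exogenous bits; none of these specifics come from the original text.

```python
from fractions import Fraction
from itertools import product


def outcome(x, u_a, u_b):
    # Hypothetical structural equation: Y := (X and U_a) or U_b.
    return int((x and u_a) or u_b)


# Exogenous states (u_x, u_a, u_b), all equally likely; X := u_x.
units = list(product([0, 1], repeat=3))


def counterfactual_space(x_obs, y_obs, x_do, tag):
    """Units observed at (x_obs, y_obs), re-evaluated under do(X = x_do) and tagged with W."""
    kept = [(u_a, u_b) for u_x, u_a, u_b in units
            if u_x == x_obs and outcome(u_x, u_a, u_b) == y_obs]
    weight = Fraction(1, 2 * len(kept))  # each space gets half of the mixture mass
    return [{"w": tag, "y": outcome(x_do, u_a, u_b), "p": weight} for u_a, u_b in kept]


# Input space (W = 0): observe X=1, Y=1, set X=0.
# Reference space (W = 1): observe X=0, Y=0, set X=1.
context = counterfactual_space(1, 1, 0, tag=0) + counterfactual_space(0, 0, 1, tag=1)


def ps(factor, y, samples):
    """PS(c, y) taken as P(Y = y | c = 1) under the weighted samples."""
    mass = sum(z["p"] for z in samples if factor(z))
    joint = sum(z["p"] for z in samples if factor(z) and z["y"] == y)
    return joint / mass


suf = ps(lambda z: z["w"] == 1, 1, context)  # PS(c, y) with c(z) = w
nec = ps(lambda z: z["w"] == 0, 0, context)  # PS(1 - c, 1 - y)
```

For this toy model the two quantities coincide with Pearl's probabilities of sufficiency and necessity computed directly from the exogenous states, namely 1/2 and 1/3.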

Remark 1

We have assumed that factors and outcomes are Boolean throughout. Our results can be extended to continuous versions of either or both variables, so long as This conditional independence holds whenever which is true by construction since \(f(\varvec{z}) := f(\varvec{x}).\) However, we defend the Boolean assumption on the grounds that it is well motivated by contrastivist epistemologies (Blaauw, 2013; Kahneman & Miller, 1986; Lipton, 1990) and not especially restrictive, given that partitions of arbitrary complexity may be defined over \(\varvec{Z}\) and Y.

5 Experiments

In this section, we demonstrate the use of LENS on a variety of tasks and compare results with popular XAI tools, using the basis configurations detailed in Table 3. A comprehensive discussion of experimental design, including datasets and pre-processing pipelines, is left to Appendix 2. Code for reproducing all results is available at https://github.com/limorigu/LENS.

Table 3 Overview of experimental settings by basis configuration

5.1 Contexts

We consider a range of contexts \(\mathcal{D}\) in our experiments. For the input-to-reference (I2R) setting, we replace input values with reference values for feature subsets S; for the reference-to-input (R2I) setting, we replace reference values with input values. We use R2I for examining the sufficiency/necessity of the original model prediction, and I2R for examining the sufficiency/necessity of a contrastive model prediction. We sample from the empirical data in all experiments, except in Sect. 5.6.3, where we assume access to a structural causal model (SCM).
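The two settings differ only in which point serves as the starting template. A minimal Python sketch, with hypothetical function names and toy values, is:

```python
def swap(base, donor, S):
    """Copy `base`, taking the features indexed by S from `donor`."""
    return [donor[j] if j in S else base[j] for j in range(len(base))]


def i2r(x, references, S):
    # Input-to-reference: start from the input, replace S with reference values.
    return [swap(x, r, S) for r in references]


def r2i(x, references, S):
    # Reference-to-input: start from each reference, replace S with input values.
    return [swap(r, x, S) for r in references]


# Toy values, for illustration only.
x_input = [1, 2, 3]
reference_rows = [[7, 8, 9], [4, 5, 6]]

i2r_samples = i2r(x_input, reference_rows, {0})
r2i_samples = r2i(x_input, reference_rows, {0})
```

Each call yields one synthetic sample per reference row, so the empirical share of samples with a given prediction serves as an estimate of sufficiency in the corresponding setting.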

5.2 Partial Orderings

We consider two types of partial orderings in our experiments. The first, \(\preceq _{subset},\) evaluates subset relationships. For instance, if \(c(\varvec{z}) = \mathbbm {1}[\varvec{x}[\mathsf{gender} = \text{``female''}]]\) and \(c^{\prime}(\varvec{z}) = \mathbbm {1}[\varvec{x}[\mathsf{gender} = \text{``female''} \wedge \mathsf{age} \ge \mathsf{40}]],\) then we say that \(c \preceq _{subset} c^{\prime}.\) The second, \(c \preceq _{cost} c^{\prime} := c \preceq _{subset} c^{\prime} ~\wedge ~ cost(c) \le cost(c^{\prime}),\) adds the additional constraint that c has cost no greater than \(c^{\prime}.\) The cost function could be arbitrary. Here, we consider distance measures over either the entire state space or just the intervention targets corresponding to c.
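Both orderings are straightforward to express in code. The sketch below represents a factor as a set of atomic conditions (a conjunction), which is an assumption made for illustration; the cost values are hypothetical.

```python
def preceq_subset(c, c_prime):
    """c precedes c_prime if every condition of c also appears in c_prime."""
    return c <= c_prime


def preceq_cost(c, c_prime, cost):
    """Subset ordering with the extra requirement that c costs no more than c_prime."""
    return preceq_subset(c, c_prime) and cost(c) <= cost(c_prime)


# The example from the text, with conditions written as plain strings.
c = frozenset({"gender == female"})
c_prime = frozenset({"gender == female", "age >= 40"})

# Hypothetical additive cost: one unit per condition.
def unit_cost(factor):
    return len(factor)


# Hypothetical non-additive cost, e.g. a distance over the whole state space.
distance_table = {c: 3.0, c_prime: 2.0}

subset_holds = preceq_subset(c, c_prime)
cost_holds_additive = preceq_cost(c, c_prime, unit_cost)
cost_holds_distance = preceq_cost(c, c_prime, distance_table.get)
```

Under an additive cost the two orderings agree, since a subset can never cost more than its superset; they can diverge when cost is a distance over the entire state space, as in the last line.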

5.3 Feature Attributions

Feature attributions are often used to identify the top-k most important features for a given model outcome (Barocas et al., 2020). However, we argue that these feature sets may not be explanatory with respect to a given prediction. To show this, we compute R2I and I2R sufficiency (i.e., \(PS(c, y)\) and \(PS(1 - c, 1 - y),\) respectively) for the top-k most influential features (\(k \in [9]\)) as identified by SHAP (Lundberg & Lee, 2017) and LENS. Fig. 3 shows results from the R2I setting for German credit (Dua & Graff, 2017) and SpamAssassin datasets (SpamAssassin, 2006). Our method attains higher PS for all cardinalities, indicating that our ranking procedure delivers more informative explanations than SHAP at any fixed degree of sparsity. Results from the I2R setting can be found in Appendix 2.

Fig. 3

Comparison of top k features ranked by SHAP against the best performing LENS subset of size k in terms of \(PS(c, y).\) German results are over 50 inputs; SpamAssassins results are over 25 inputs. Shaded regions indicate 95% confidence intervals

5.4 Rule Lists

5.4.1 Sentiment Sensitivity Analysis

Next, we use LENS to study model weaknesses by considering minimal factors with high R2I and I2R sufficiency in text models. Our goal is to answer questions of the form, “What are words with/without which our model would output the original/opposite prediction for an input sentence?” For this experiment, we train an LSTM network on the IMDB dataset for sentiment analysis (Maas et al., 2011). If the model mislabels a sample, we investigate further; if it does not, we inspect the most explanatory factors to learn more about model behavior. For the purpose of this example, we only inspect sentences of length 10 or shorter. We provide two examples below and compare with Anchors (see Table 4).

Consider our first example: “read book forget movie” is a sentence we would expect to receive a negative prediction, but our model classifies it as positive. Since we are investigating a positive prediction, our reference space is conditioned on a negative label. For this model, the classic “unk” token receives a positive prediction. Thus we opt for an alternative, “plate”. Performing interventions on all possible combinations of words with our token, we find the conjunction of “read”, “forget”, and “movie” is a sufficient factor for a positive prediction (R2I). We also find that changing any of “read”, “forget”, or “movie” to “plate” would result in a negative prediction (I2R). Anchors, on the other hand, perturbs the data stochastically (see Appendix 2), suggesting the conjunction “read” AND “book”.

Next, we investigate the sentence: “you better choose paul verhoeven even watched”. Since the label here is negative, we use the “unk” token. We find that this prediction is brittle: a change of almost any word would be sufficient to flip the outcome. Anchors, on the other hand, reports a conjunction including most words in the sentence. Taking the R2I view, we still find a more concise explanation: “choose” or “even” would be enough to attain a negative prediction. These brief examples illustrate how LENS may be used to find brittle predictions across samples, search for similarities between errors, or test for model reliance on sensitive attributes (e.g., gender pronouns).

Table 4 Example prediction given by an LSTM model trained on the IMDB dataset

5.5 Anchors Comparison

Anchors also includes a tabular variant, against which we compare LENS’s performance in terms of R2I sufficiency. We present the results of this comparison in Fig. 4, and include additional comparisons in Appendix 2. We sample 100 inputs from the German dataset, and query both methods with \(\tau =0.9\) using the classifier from Sect. 5.3. Anchors satisfy a PAC bound controlled by parameter \(\delta .\) At the default value \(\delta =0.1,\) Anchors fail to meet the τ threshold on 14% of samples; LENS meets it on 100% of samples. This result accords with Theorem 1, and vividly demonstrates the benefits of our optimality guarantee. Note that we also go beyond Anchors in providing multiple explanations instead of just a single output, as well as a cumulative probability measure with no analogue in their algorithm.

Fig. 4

We compare \(PS(c, y)\) against precision scores attained by the output of LENS and Anchors for examples from German. We repeat the experiment for 100 inputs, and each time consider the single example generated by Anchors against the mean \(PS(c, y)\) among LENS’s candidates. Dotted line indicates \(\tau =0.9\)

5.6 Counterfactuals

5.6.1 Adversarial Examples: Spam Emails

R2I sufficiency answers questions of the form, “What would be sufficient for the model to predict y?” This is particularly valuable in cases with unfavorable outcomes \(y^{\prime}.\) Inspired by adversarial interpretability approaches (Lakkaraju & Bastani, 2020; Ribeiro et al., 2018b), we train an MLP classifier on the SpamAssassins dataset and search for minimal factors sufficient to relabel a sample of spam emails as non-spam. Our examples follow some patterns common to spam emails: they are received from unusual email addresses, include suspicious keywords such as enlargement or advertisement in the subject line, etc. We identify minimal changes that will flip labels to non-spam with high probability. Options include altering the incoming email address to more common domains, and changing the subject or first sentences (see Table 5). These results can improve understanding of both a model’s behavior and a dataset’s properties.

Table 5 (Top) A selection of emails from SpamAssassins, correctly identified as spam by an MLP. The goal is to find minimal perturbations that result in non-spam predictions. (Bottom) Minimal subsets of feature-value assignments that achieve non-spam predictions with respect to the emails above

5.6.2 Diverse Counterfactuals

Our explanatory measures can also be used to secure algorithmic recourse. For this experiment, we benchmark against DiCE (Mothilal et al., 2020), which aims to provide diverse recourse options for any underlying prediction model. We illustrate the differences between our respective approaches on the Adult dataset (Kohavi & Becker, 1996), using an MLP and following the procedure from the original DiCE paper.

According to DiCE, a diverse set of counterfactuals is one that differs in values assigned to features, and can thus produce a counterfactual set that includes different interventions on the same variables (e.g., CF1: \(\mathsf{age} = 91, \mathsf{occupation}=\text{``retired''}\); CF2: \(\mathsf{age} = 44, \mathsf{occupation}=\text{``teacher''}\)). Instead, we look at diversity of counterfactuals in terms of intervention targets, i.e. features changed (in this case, from input to reference values) and their effects. We present minimal cost interventions that would lead to recourse for each feature set but we summarize the set of paths to recourse via subsets of features changed. Thus, DiCE provides answers of the form “Because you are not 91 and retired” or “Because you are not 44 and a teacher”; we answer “Because of your age and occupation”, and present the lowest cost intervention on these features sufficient to flip the prediction.

With this intuition in mind, we compare outputs given by DiCE and LENS for various inputs. For simplicity, we let all features vary independently. We consider two metrics for comparison: (a) the mean cost of proposed factors, and (b) the number of minimally valid candidates proposed, where a factor c from a method M is minimally valid iff for all \(c^{\prime}\) proposed by \(M^{\prime},\) \(c \succeq _{cost} c^{\prime}\) (i.e., \(M^{\prime}\) does not report a factor preferable to c). We report results based on 50 randomly sampled inputs from the Adult dataset, where references are fixed by conditioning on the opposite prediction. The cost comparison results are shown in Fig. 5, where we find that LENS identifies lower cost factors for the vast majority of inputs. Furthermore, DiCE finds no minimally valid candidates that LENS did not already account for. Thus LENS emphasizes minimality and diversity of intervention targets, while still identifying low cost intervention values.

Fig. 5

A comparison of mean cost of outputs by LENS and DiCE for 50 inputs sampled from the Adult dataset

5.6.3 Causal vs. Non-causal Recourse

When a user relies on XAI methods to plan interventions on real-world systems, causal relationships between predictors cannot be ignored. In the following example, we consider the DAG in Fig. 6, intended to represent dependencies in the German credit dataset. For illustrative purposes, we assume access to the structural equations of this data generating process. [There are various ways to extend our approach using only partial causal knowledge as input (Heskes et al., 2020; Karimi et al., 2020b).] We construct \(\mathcal{D}\) by sampling from the SCM under a series of different possible interventions. Table 6 describes an example of how using our framework with augmented causal knowledge can lead to different recourse options. Computing explanations under the assumption of feature independence results in factors that span a large part of the DAG depicted in Fig. 6. However, encoding structural relationships in \(\mathcal{D},\) we find that LENS assigns high explanatory value to nodes that appear early in the topological ordering. This is because intervening on a single root factor may result in various downstream changes once effects are fully propagated.
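The contrast between causal and feature-independent contexts can be sketched on a toy example. The two-variable structural model below (root A, child B := A + U_B), the threshold classifier, and all function names are hypothetical; they are not the German credit DAG of Fig. 6, and a single unit is used in place of sampling.

```python
def propagate(a, u_b):
    """Toy structural equations: A is a root node and B := A + U_B."""
    return [a, a + u_b]


def toy_model(z):
    # Hypothetical classifier: favourable outcome iff A + B >= 4.
    return int(z[0] + z[1] >= 4)


def intervene_independent(z, j, value):
    # Feature-independent context: only the targeted feature changes.
    out = list(z)
    out[j] = value
    return out


def intervene_causal(z, j, value):
    # Causal context: recover the noise term, then propagate the intervention.
    a, b = z
    u_b = b - a
    if j == 0:
        return propagate(value, u_b)  # do(A) also shifts its child B
    return [a, value]                 # do(B) leaves the root A untouched


x_input = propagate(a=1, u_b=0)  # [1, 1], unfavourable prediction

independent_result = toy_model(intervene_independent(x_input, 0, 2))
causal_result = toy_model(intervene_causal(x_input, 0, 2))
```

Setting the root A to 2 is not enough to flip the prediction when features are perturbed independently, but it is enough once the downstream effect on B is propagated, which mirrors the preference for early nodes in the topological ordering described above.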

Fig. 6

Example DAG for German dataset

Table 6 Recourse example comparing causal and non-causal (i.e., feature independent) \(\mathcal{D}.\)

6 Discussion

Our results, both theoretical and empirical, rely on access to the true context \(\mathcal{D}\) and the complete enumeration of all relevant feature subsets. Neither may be feasible in practice. When elements of \(\varvec{Z}\) are based on assumptions about structural dependencies or estimated from data via some statistical model, errors could lead to suboptimal explanations. For high-dimensional settings such as image classification, LENS cannot be naïvely applied without substantial data pre-processing. The first issue is extremely general. No method is immune to model misspecification, and attempts to recreate a data generating process must always be handled with care. Empirical sampling, which we rely on above, is a reasonable choice when data are fairly abundant and representative. However, generative models may be necessary to correct for known biases or sample from low-density regions of the feature space. This comes with a host of challenges that no XAI algorithm alone can easily resolve.

The second issue, regarding the difficulty of the optimal subset selection procedure, is somewhat subtler. First, we observe that the problem is only NP-hard in the worst case. Partial orderings may vastly reduce the complexity of the task by, for instance, encoding a preference for greedy feature selection, or pruning the search space through branch and bound techniques, as our \(\preceq _{subset}\) ordering does above. Thus agents with appropriate utility functions can always ensure efficient computation. Second, we emphasize that complex explanations citing many contributing factors pose cognitive as well as computational challenges. In an influential review of XAI, Miller (2019) finds near unanimous consensus among philosophers and social scientists that, “all things being equal, simpler explanations—those that cite fewer causes... are better explanations” (p. 25). Even if we could efficiently compute all τ-minimal factors for some large value of d, it is not clear that such explanations would be helpful to humans, who famously struggle to hold more than seven objects in short-term memory at any given time (Miller, 1955). That is why many popular XAI tools include some sparsity constraint to encourage simpler outputs.
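The pruning effect of the subset ordering can be illustrated with a short Python sketch. It enumerates feature subsets by increasing cardinality and skips any candidate that already contains a sufficient subset; this is a simplified illustration under the assumption that factors are feature subsets and that a sufficiency oracle `ps` is available, not a reproduction of the LENS implementation.

```python
from itertools import combinations


def minimal_sufficient_subsets(ps, d, tau):
    """Return all subsets S with ps(S) >= tau that contain no smaller sufficient subset."""
    minimal = []
    evaluations = 0
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):
            S = frozenset(subset)
            if any(m <= S for m in minimal):
                continue  # pruned: a sufficient subset is already contained in S
            evaluations += 1
            if ps(S) >= tau:
                minimal.append(S)
    return minimal, evaluations


# Hypothetical sufficiency oracle: feature 2 alone, or features 0 and 1 together, suffice.
def toy_ps(S):
    return 1.0 if 2 in S or {0, 1} <= S else 0.2


found, n_evaluated = minimal_sufficient_subsets(toy_ps, d=4, tau=0.9)
```

Because candidates are visited in order of size, every superset of an accepted subset is skipped without querying the model, so the number of sufficiency evaluations can fall well below the \(2^d - 1\) non-empty subsets.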

Rather than throw out some or most of our low-level features, we prefer to consider a higher level of abstraction (Floridi, 2008), where explanations are more meaningful to end users. For instance, in our SpamAssassins experiments, we started with a pure text example, which can be represented via high-dimensional vectors (e.g., word embeddings). However, we represent the data with just a few intelligible components: From and To email addresses, Subject, etc. In other words, we create a more abstract object and consider each segment as a potential intervention target, i.e. a candidate factor. This effectively compresses a high-dimensional dataset into a 10-dimensional abstraction. Similar strategies could be used in many cases, either through domain knowledge (Hilgard et al., 2021; Kim et al., 2018; Koh et al., 2020) or data-driven clustering and dimensionality reduction techniques (Beckers et al., 2019; Chalupka et al., 2017; Kinney & Watson, 2020; Locatello et al., 2019). In general, if data cannot be represented by a reasonably low-dimensional, intelligible abstraction, then post-hoc XAI methods are unlikely to be of much help.
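As a rough illustration of this kind of abstraction, the following Python sketch maps a raw email to a handful of named segments using the standard library parser. The choice of fields and the sentence-splitting rule are assumptions made for the example and do not reproduce the pre-processing used in the experiments.

```python
from email import message_from_string


def abstract_email(raw):
    """Compress a raw (non-multipart) email into a few intelligible components."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a plain string for non-multipart messages
    first_sentence = body.strip().split(".")[0]
    return {
        "From": msg["From"],
        "To": msg["To"],
        "Subject": msg["Subject"],
        "First sentence": first_sentence,
    }


# Hypothetical message, for illustration only.
raw_email = (
    "From: sender@example.com\n"
    "To: recipient@example.org\n"
    "Subject: Hello\n"
    "\n"
    "First line. Second line.\n"
)

components = abstract_email(raw_email)
```

Each key of the resulting dictionary can then be treated as a candidate intervention target, in place of the underlying high-dimensional text representation.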

An anonymous reviewer raised concerns about the factor set \(\mathcal{C},\) which is generally unconstrained in our formulation, and therefore may lead to explanations that are “not sensible”. First, we note that unexplanatory factors should receive low probabilities of necessity and sufficiency, and therefore pose no serious problems in practice. Second, we observe that XAI practitioners generally query models with some hypotheses already in mind. For instance, Anne may want to know if her loan was denied due to her savings, her education, or her race. Perhaps none of these variables explains her unfavorable outcome, which would itself be informative. Her effort to understand the bank’s credit risk model may well be circuitous, iterative, and occasionally less than fully sensible. Yet we strongly object to the notion that we could somehow automate the procedure of selecting the “right” factors in a subject-neutral, agent-independent manner. We consider it a feature, not a bug, that LENS requires some user engagement to better understand model predictions. XAI is a tool, not a panacea.

7 Conclusion

We have presented a unified framework for XAI that foregrounds necessity and sufficiency, which we argue are the building blocks of all successful explanations. We defined simple measures of both, and showed how they undergird various XAI methods. Our formulation, which relies on converse rather than inverse probabilities, is uniquely expressive. It covers all four fundamental explanatory measures—i.e., the classical definitions and their contrapositive transformations—and unambiguously accommodates logical, probabilistic, and/or causal interpretations, depending on how one constructs the basis tuple \(\mathcal{B}.\) We argued that alternative formulations which rely on probabilistic inversion are better understood as alternative sufficiency measures. We illustrated illuminating connections between our framework and existing proposals in XAI, as well as Pearl (2009)’s probabilities of causation. We introduced a sound and complete algorithm for identifying minimally sufficient factors—LENS—and demonstrated its performance on a range of tasks and datasets. The approach is flexible and pragmatic, accommodating background knowledge and explanatory preferences as input. Though LENS prioritizes completeness over efficiency, the method may provide both for agents with certain utility functions. Future research will explore more scalable approximations and model-specific variants. User studies will guide the development of heuristic defaults and a graphical interface.