1 Introduction

Consider Fig. 1 which depicts how most machine learning systems are constructed today. We have a labeled dataset that is used to learn a classifier, which is commonly a neural network, a Bayesian network or a random forest. These classifiers are effectively functions that map instances to decisions. For example, an instance could be a loan application and the decision is whether to approve or decline the loan. There is now considerable interest in reasoning about the behavior of such systems. Explaining decisions is at the forefront of current interests: Why did you decline Maya’s application? Quantifying the robustness of these decisions is also attracting a lot of attention: Would reversing the decision on Maya require many changes to her application? In some domains, one expects the learned classifiers to satisfy certain properties, like monotonicity, and there is again an interest in proving such properties formally. For example, can we guarantee that a loan applicant will be approved when the only difference they have with another approved applicant is their higher income? These interests, however, are challenged by the numeric nature of machine learning classifiers and the fact that these systems are often model-free, e.g., neural networks, so they appear as black boxes that are hard to analyze.

Even though these machine learning classifiers are learned from data and are numeric in nature, they often implement discrete decision functions. One can therefore extract these functions and represent them symbolically. The outcome of this process is normally a logical formula or a Boolean circuit that precisely captures the input-output behavior of the learned classifier, which can then be used to reason about its behavior, symbolically. This includes explaining decisions, measuring robustness and formally proving properties. For a concrete example, consider Fig. 2 which depicts one of the simplest machine learning systems: a Naïve Bayes classifier. We have a class variable \(P\) and three features \(B\), \(U\) and \(S\). Given an instance (patient) and their test results \(b\), \(u\) and \(s\), this classifier renders a decision by computing the posterior probability \(\textit{Pr}(p\vert b,u,s)\) and then checking whether it passes a given threshold \(T\). If it does, we declare a positive decision; otherwise, a negative decision. While this classifier is numeric and its decisions are based on probabilistic reasoning, it does induce a discrete decision function. In fact, the function is Boolean in this case as it maps the Boolean variables \(B\), \(U\) and \(S\), which correspond to test results, into a binary decision (yes or no). This observation was originally made in Chan and Darwiche (2003), which proposed the compilation of Naïve Bayes classifiers into symbolic decision graphs as shown in Fig. 2. The compilation process guarantees that for every instance, the decision made by the (probabilistic) Naïve Bayes classifier is identical to the one made by the (symbolic) decision graph. This compilation algorithm was recently extended to Bayesian network classifiers with tree structures (Shih et al. 2018) and later to Bayesian network classifiers with arbitrary structures (Shih et al. 2019a).Footnote 1 Certain classes of neural networks can also be compiled into, or reasoned about using, decision diagrams as shown in Shih et al. (2019b), Shi et al. (2020). While Bayesian and neural networks are numeric in nature, random forests are not (at least the ones with majority voting). Hence, one can easily encode their input-output behavior using Boolean formulas. Since a random forest is an ensemble of decision trees, we first encode each decision tree into a Boolean formula. This is straightforward even in the presence of continuous variables as the learning algorithm discretizes variables by identifying a set of thresholds for each variable.Footnote 2 We then combine these formulas using a majority formula or circuit; see, e.g., Audemard et al. (2020), Choi et al. (2020).
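The observation that a numeric classifier induces a Boolean decision function can be illustrated with a minimal sketch. The prior, likelihoods and threshold below are hypothetical values chosen only for illustration (they are not taken from Fig. 2), and the exhaustive tabulation is not the compilation algorithm of Chan and Darwiche (2003); it merely exposes the induced function on a three-feature example.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration (not taken from Fig. 2).
PRIOR = 0.5                                   # Pr(P = yes)
LIKELIHOOD = {                                # (Pr(+ve | yes), Pr(+ve | no))
    "B": (0.9, 0.1),
    "U": (0.8, 0.2),
    "S": (0.7, 0.3),
}
THRESHOLD = 0.9                               # the threshold T

def posterior(instance):
    """Pr(P = yes | instance) under the Naive Bayes independence assumption."""
    yes, no = PRIOR, 1.0 - PRIOR
    for feature, positive in instance.items():
        p_yes, p_no = LIKELIHOOD[feature]
        yes *= p_yes if positive else 1.0 - p_yes
        no *= p_no if positive else 1.0 - p_no
    return yes / (yes + no)

def decide(instance):
    """Numeric classifier: positive iff the posterior passes the threshold."""
    return posterior(instance) >= THRESHOLD

# Tabulating the decisions on all instances exposes the induced Boolean function.
table = {}
for values in product((True, False), repeat=3):
    instance = dict(zip("BUS", values))
    table[values] = decide(instance)
    print(instance, "->", "yes" if table[values] else "no")
```

With these particular (assumed) numbers, the induced function is \(B \wedge U\): the decision is positive exactly when the first two tests are positive, regardless of \(S\).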

Fig. 1 Reasoning about machine learning classifiers

Fig. 2 Compiling a Naïve Bayes classifier into a symbolic decision graph. To classify an instance using the decision graph, we start at the root node and repeat the following. If the feature we are at is positive, we follow the left edge, otherwise we follow the right edge. We finally reach a leaf node, which determines the class of the given instance. The figure shows the path followed from the root to a leaf (no) for the instance \(B={-ve}\), \(U={+ve}\) and \(S={-ve}\)

This methodology for reasoning about the behavior of machine learning classifiers has three dimensions: (1) the kind of machine learning classifiers we are reasoning about, (2) the symbolic representation we use to encode their input-output behavior, and (3) the class of queries we are interested in and how to compute them efficiently. We will not concern ourselves with the first dimension in this paper as we will assume that the input-output behavior has already been encoded symbolically. Hence, our discussion will be orthogonal to where the symbolic representation came from. As to the second dimension, one can encode input-output behavior using standard logical formulas, which is the approach we shall pursue. While logical formulas are sufficient for our treatment as far as semantics is involved, we will use a particular class of logical representations for computational reasons: tractable Boolean circuits (Darwiche and Marquis 2002).Footnote 3 As for the third dimension, we will focus on developing a theory for reasoning about the decisions made by classifiers: What are the reasons behind them? How can they counterfactually change? And are they biased?

In the proposed theory, a classifier is a Boolean function. Its variables are called features, a particular input is called an instance, and the function output on some instance is called a decision. If the function outputs \(1\) on an instance, the instance and decision are said to be positive; otherwise, they are negative. Our main goal is to explain the decisions made by Boolean classifiers on specific instances by way of providing various insights into what caused these decisions. For some examples, consider Fig. 3 which depicts two classifiers (\({{\mathscr {C}}}_1\) and \({{\mathscr {C}}}_2\)) for college admission, represented as Ordered Binary Decision Diagrams (OBDDs) (Bryant 1986) (in which variables are binary and ordered similarly on any path from the root to a leaf).Footnote 4 Consider also Susan who passed the entrance exam, is a first-time applicant, has no work experience and a high GPA. Susan will be admitted by classifier \({{\mathscr {C}}}_1\). She also comes from a rich hometown and will be admitted by classifier \({{\mathscr {C}}}_2\). We can say that Susan was admitted by classifier \({{\mathscr {C}}}_1\) because she passed the entrance exam and has a high GPA. We can also say that one reason why classifier \({{\mathscr {C}}}_2\) admitted Susan is that she passed the entrance exam and has a high GPA (there are other reasons in this case). Moreover, we can say that classifier \({{\mathscr {C}}}_2\) would still admit Susan even if she did not have a high GPA because she passed the entrance exam and comes from a rich hometown. Finally, we can say that classifier \({{\mathscr {C}}}_2\) can make biased decisions: ones that are based on protected features. For example, it will make different decisions on two applicants who have the same characteristics except that one comes from a rich hometown and the other does not. We will also show that one can sometimes prove classifier bias by inspecting the reasons behind one of its unbiased decisions. 
We will give formal definitions and semantics for the statements exemplified above and show how to evaluate them algorithmically and efficiently. As far as semantics, the main tool we will employ is Boolean logic and particularly the classical notion of prime implicants (Crama and Hammer 2011; Quine 1952; McCluskey 1956; Quine 1959). On the computational side, we will exploit tractable Boolean circuits as mentioned earlier (Darwiche and Marquis 2002), while providing some new fundamental results that further extend the reach of these circuits to computing explanations. At the core of our theory is the notion of complete reason behind a decision, which can be viewed as a necessary and sufficient condition for why the decision was made. Most of what we shall discuss will be based on complete reasons, both semantically and computationally.

This paper is structured as follows. We start in Sect. 2 by reviewing some Boolean logic preliminaries including prime implicants. We then introduce the notion of complete reason and related notions such as sufficient reasons and necessary characteristics in Sects. 3–5. Counterfactual statements about decisions are discussed in Sect. 6, followed by a discussion of decision and classifier bias in Sect. 7. We dedicate Sect. 8 to algorithms that compute the introduced notions and then illustrate them in Sect. 9 using a case study. We finally close with some concluding remarks in Sect. 10.

Fig. 3 Two OBDD classifiers: \({{\mathscr {C}}}_1\) (left) and \({{\mathscr {C}}}_2\) (right). Solid edges represent true values of a variable. Dotted edges represent false values

2 Boolean Logic Preliminaries

A literal is a Boolean variable \(X\) (positive literal) or its negation \(\lnot X\) (negative literal). A term is a consistent conjunction of literals (e.g., \(A \wedge \lnot B \wedge C\)). A Disjunctive Normal Form (DNF) is a disjunction of terms (e.g., \((A \wedge \lnot B) \vee (B \wedge C) \vee (\lnot A \wedge C \wedge D)\)). An instance is a term that includes precisely one literal for each Boolean variable. Term \(\tau _i\) subsumes term \(\tau _j\) iff \(\tau _j \models \tau _i\), where \(\models \) denotes logical entailment. For example, term \(E \wedge \lnot F\) subsumes term \(E \wedge \lnot F \wedge G\). We treat a term as the set of its literals so we may write \(\tau _i \subseteq \tau _j\) to also mean that \(\tau _i\) subsumes \(\tau _j\). We will often refer to a literal as a characteristic and to a term \(\tau \) as a property (of an instance that contains the term). We use \({\overline{\tau }}\) to denote the property resulting from negating every characteristic in property \(\tau \). We sometimes use a comma (\(,\)) instead of a conjunction (\(\wedge \)) when describing properties and instances (e.g., \(E,\lnot F\) instead of \(E \wedge \lnot F\)).

We represent a classifier by a Boolean formula \(\varDelta \) whose models (i.e., satisfying assignments) correspond to positive instances. The negation of the formula characterizes negative instances. Classifiers \({{\mathscr {C}}}_1\) and \({{\mathscr {C}}}_2\) in Fig. 3 are represented by the following formulas:

$$\begin{aligned} \varDelta _1= & {} E \wedge (\lnot F \vee G \vee W) \\ \varDelta _2= & {} E \wedge (\lnot F \vee G \vee W \vee R) \end{aligned}$$

We use \(\varDelta (\alpha )\) to denote the decision (\(0\) or \(1\)) of classifier \(\varDelta \) on instance \(\alpha \). That is, \(\varDelta (\alpha ) =1\) iff \(\alpha \models \varDelta \) and \(\varDelta (\alpha ) =0\) iff \(\alpha \models \lnot \varDelta \). We also define \(\varDelta _\alpha = \varDelta \) if the decision is positive and \(\varDelta _\alpha = \lnot \varDelta \) if the decision is negative. This notation is critical and will be used frequently later. By definition, every instance \(\alpha \) satisfies \(\alpha \models \varDelta _\alpha \), and for any two instances \(\alpha \) and \(\beta \), we have \(\varDelta _\alpha = \varDelta _\beta \) iff \(\varDelta (\alpha ) = \varDelta (\beta )\). Again, we use these observations frequently later.

An implicant \(\tau \) of Boolean formula \(\varDelta \) is a term that satisfies \(\varDelta \), \(\tau \models \varDelta \). A prime implicant is an implicant that is not subsumed by any other implicant. For example, \(E \wedge \lnot F \wedge G\) is an implicant of \(\varDelta _1\) but is not prime since it is subsumed by another implicant \(E \wedge \lnot F\), which happens to be prime. Classifier \({{\mathscr {C}}}_1\) has the following prime implicants:

$$\begin{aligned} \varDelta _{1}&:&(E \wedge \lnot F)\,\, (E \wedge G)\,\, (E \wedge W) \\ \lnot \varDelta _{1}&:&(\lnot E) \,\, (F \wedge \lnot G \wedge \lnot W) \end{aligned}$$

Classifier \({{\mathscr {C}}}_2\) has the following prime implicants:

$$\begin{aligned} \varDelta _{2}&:&(E \wedge \lnot F)\,\, (E \wedge G)\,\, (E \wedge W)\,\, (E \wedge R) \\ \lnot \varDelta _{2}&:&(\lnot E) \,\, (F \wedge \lnot G \wedge \lnot W \wedge \lnot R) \end{aligned}$$
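The prime implicants listed above can be reproduced by exhaustive enumeration. The following is a minimal brute-force sketch, exponential in the number of features and intended only as an illustration on the four features of \(\varDelta _1\); the function names are our own and the representation of a term as a set of (feature, value) pairs is an assumption of the sketch.

```python
from itertools import product

def terms(features):
    """Yield every consistent term over `features` as a frozenset of (feature, value) literals."""
    for choice in product((None, True, False), repeat=len(features)):
        yield frozenset((x, v) for x, v in zip(features, choice) if v is not None)

def is_implicant(term, formula, features):
    """True iff every instance that contains `term` satisfies `formula`."""
    fixed = dict(term)
    free = [x for x in features if x not in fixed]
    return all(formula({**fixed, **dict(zip(free, values))})
               for values in product((True, False), repeat=len(free)))

def prime_implicants(formula, features):
    """Implicants that have no strict subset which is also an implicant."""
    implicants = [t for t in terms(features) if is_implicant(t, formula, features)]
    return {t for t in implicants if not any(s < t for s in implicants)}

def show(term):
    """Render a term as text, e.g. 'E and not F'."""
    return " and ".join(x if v else "not " + x for x, v in sorted(term)) or "true"

FEATURES = ["E", "F", "G", "W"]
delta1 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"])
not_delta1 = lambda a: not delta1(a)

print(sorted(show(t) for t in prime_implicants(delta1, FEATURES)))
print(sorted(show(t) for t in prime_implicants(not_delta1, FEATURES)))
```

The same functions apply to \(\varDelta _2\) by adding feature \(R\) to the feature list and to the formula.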

The set of prime implicants for a Boolean formula can be quite large, which motivated the notion of a prime implicant cover (Quine 1952; McCluskey 1956; Quine 1959). A set of terms \(\tau _1, \ldots , \tau _n\) is a prime implicant cover for Boolean formula \(\varDelta \) if each term \(\tau _i\) is a prime implicant of \(\varDelta \) and \(\tau _1 \vee \ldots \vee \tau _n\) is equivalent to \(\varDelta \). A cover may not include all prime implicants, with the missing ones called redundant. While covers can be useful computationally, they may not always be appropriate for explaining classifiers as they may lead to incomplete explanations (more on this later).

We will make use of the conditioning operation on Boolean formulas. To condition formula \(\varDelta \) on term \(\tau \), denoted \(\varDelta \vert \tau \), is to replace every literal \(l\) in \(\varDelta \) with constant \(1\) (true) if \(l \in \tau \), and to replace it with constant \(0\) (false) if \(\lnot l \in \tau \). For example, if \(\alpha = (A \vee \lnot B) \wedge (C \vee D)\) and \(\tau = B, \lnot C\), then \(\alpha \vert \tau = (A \vee \lnot 1) \wedge (0 \vee D) = A \wedge D\). We will also use the existential quantification operation: \(\exists X \varDelta = (\varDelta \vert X) \vee (\varDelta \vert \lnot X)\).
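Conditioning and existential quantification can be sketched on formulas represented as Python predicates over assignments. This is an assumption of the sketch: the definition above is syntactic (literals are replaced by constants), whereas the code fixes variable values semantically, which yields an equivalent formula. All names are illustrative.

```python
from itertools import product

def condition(formula, term):
    """Delta | term: fix the variables mentioned in `term` to the values it assigns."""
    return lambda assignment: formula({**assignment, **term})

def exists(formula, variable):
    """Existential quantification: (Delta | X) or (Delta | not X)."""
    pos = condition(formula, {variable: True})
    neg = condition(formula, {variable: False})
    return lambda assignment: pos(assignment) or neg(assignment)

def equivalent(f, g, variables):
    """Check that f and g agree on every assignment to `variables`."""
    return all(bool(f(a)) == bool(g(a))
               for values in product((True, False), repeat=len(variables))
               for a in [dict(zip(variables, values))])

# The example from the text: (A or not B) and (C or D), conditioned on B, not C.
alpha = lambda a: (a["A"] or not a["B"]) and (a["C"] or a["D"])
conditioned = condition(alpha, {"B": True, "C": False})
print(equivalent(conditioned, lambda a: a["A"] and a["D"], ["A", "B", "C", "D"]))

# Quantifying E out of Delta_1 leaves (not F or G or W).
delta1 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"])
quantified = exists(delta1, "E")
print(equivalent(quantified, lambda a: not a["F"] or a["G"] or a["W"], ["E", "F", "G", "W"]))
```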

In the next few sections, we introduce notions such as the sufficient and complete reasons behind a decision. We use these notions later to define decision and classifier bias in addition to giving semantics to counterfactual statements relating to decisions.

3 Sufficient Reasons

Prime implicants have been studied and utilized extensively in the AI and computer science literature.Footnote 5 However, their active utilization in explaining decisions is more recent, e.g., (Shih et al. 2018; Ignatiev et al. 2019a, b; Lindner and Möllney 2019). This recent utilization introduced a key connection to properties of instances that we highlight next and exploit computationally later.

Definition 1

(Sufficient Reason (Shih et al. 2018)) A sufficient reason for decision \(\varDelta (\alpha )\) is a property of instance \(\alpha \) that is also a prime implicant of \(\varDelta _\alpha \) (recall \(\varDelta _\alpha =\varDelta \) if the decision is positive and \(\varDelta _\alpha =\lnot \varDelta \) if the decision is negative).

A sufficient reason identifies characteristics of an instance that justify the decision: The decision will stick even if other characteristics of the instance were different. A sufficient reason is minimal: None of its strict subsets can justify the decision. A decision can have multiple sufficient reasons, sometimes a large number of them.Footnote 6 There is a key difference between prime implicants and sufficient reasons as the latter must be properties of the given instance. This has significant computational implications that we exploit in Sect. 8.

Sufficient reasons were introduced in Shih et al. (2018) under the name of PI-explanations. They were also referred to as abductive explanations in Ignatiev et al. (2019a).Footnote 7 The new name we adopt is motivated by further distinctions that we draw later and was also used in Lindner and Möllney (2019). We will sometimes say “a reason” to mean “a sufficient reason.”

Greg passed the entrance exam, is not a first time applicant, does not have a high GPA but has work experience (\(\alpha = E, \lnot F, \lnot G, W\)). Classifier \({{\mathscr {C}}}_1\) admits Greg, a decision that can be explained using either of the following sufficient reasons:

  • Passed the entrance exam and is not a first time applicant (\(E, \lnot F\)).

  • Passed the entrance exam and has work experience (\(E, W\)).

Since Greg passed the entrance exam and has applied before, he will be admitted even if his other characteristics were different. Similarly, since Greg passed the entrance exam and has work experience, he will be admitted even if his other characteristics were different.
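Definition 1 can be checked on Greg's instance with a small brute-force sketch. It enumerates the properties of the instance from smallest to largest and keeps the minimal ones that force the decision; this is only an illustration (exponential in the number of features), not the algorithms of Sect. 8, and the function names are our own.

```python
from itertools import combinations, product

def triggers(prop, formula, instance):
    """True iff every instance containing `prop` gets the same decision as `instance`,
    i.e., `prop` is an implicant of Delta_alpha."""
    target = bool(formula(instance))
    free = [x for x in instance if x not in prop]
    return all(bool(formula({**prop, **dict(zip(free, values))})) == target
               for values in product((True, False), repeat=len(free)))

def sufficient_reasons(formula, instance):
    """Minimal properties of `instance` that trigger its decision (Definition 1)."""
    reasons = []
    features = list(instance)
    for size in range(len(features) + 1):          # smallest properties first
        for subset in combinations(features, size):
            prop = {x: instance[x] for x in subset}
            if any(r.items() <= prop.items() for r in reasons):
                continue                           # contains a smaller reason: not minimal
            if triggers(prop, formula, instance):
                reasons.append(prop)
    return reasons

delta1 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"])
greg = {"E": True, "F": False, "G": False, "W": True}
for reason in sufficient_reasons(delta1, greg):
    print(reason)
```

Because only subsets of the instance are enumerated, the search space has \(2^n\) properties rather than the \(3^n\) terms needed to enumerate all prime implicants, which reflects the distinction between prime implicants and sufficient reasons drawn above.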

Proposition 1

Every decision has at least one sufficient reason.

Proof

Consider decision \(\varDelta (\alpha )\). We have \(\alpha \models \varDelta _\alpha \), which means \(\varDelta _\alpha \) is consistent and must have at least one prime implicant (the empty term if \(\varDelta _\alpha \) is valid). Moreover, at least one of these prime implicants must be a property of instance \(\alpha \) since \(\alpha \models \varDelta _\alpha \) and since \(\varDelta _\alpha \) is equivalent to the disjunction of its prime implicants. Hence, we have at least one sufficient reason for the decision. \(\square \)

A classifier may make the same decision on two instances but for different reasons (i.e., disjoint sufficient reasons). However, if two decisions on distinct instances share a reason, they must be equal.

Proposition 2

If decisions \(\varDelta (\alpha )\) and \(\varDelta (\beta )\) share a sufficient reason, the decisions must be equal \(\varDelta (\alpha ) = \varDelta (\beta )\).

Proof

Suppose the decisions share sufficient reason \(\tau \). Then \(\tau \) is a property of both \(\alpha \) and \(\beta \) and \(\tau \) is a prime implicant of both \(\varDelta _\alpha \) and \(\varDelta _\beta \). Hence, \(\varDelta _\alpha = \varDelta _\beta \) since \(\tau \) is consistent, and therefore \(\varDelta (\alpha ) = \varDelta (\beta )\). \(\square \)

We will see later that sufficient reasons can provide insights about a classifier that go well beyond explaining its decisions.

4 Complete Reasons

A sufficient reason identifies a minimal property of an instance that can trigger a decision. The complete reason behind a decision characterizes all properties of an instance that can trigger the decision.

Definition 2

(Complete Reason) The complete reason for a decision is the disjunction of all its sufficient reasons.

The complete reason for decision \(\varDelta (\alpha )\) captures every property of instance \(\alpha \), and only properties of instance \(\alpha \), that can trigger the decision. It precisely captures why this particular decision was made on instance \(\alpha \).

Theorem 1

Let \({{\mathscr {R}}}\) be the complete reason for decision \(\varDelta (\alpha )\). If instance \(\beta \) does not satisfy \({{\mathscr {R}}}\) and \(\varDelta (\beta ) = \varDelta (\alpha )\), then no sufficient reason for decision \(\varDelta (\beta )\) can be a property of instance \(\alpha \).

Proof

Suppose \(\beta \not \models {{\mathscr {R}}}\) and \(\varDelta (\beta ) = \varDelta (\alpha )\). Then \(\varDelta _\beta = \varDelta _\alpha \). Let \(\tau \) be a sufficient reason for decision \(\varDelta (\beta )\). Then \(\tau \) is a property of instance \(\beta \) and a prime implicant of both \(\varDelta _\beta \) and \(\varDelta _\alpha \). If \(\tau \) were a property of instance \(\alpha \), then \(\tau \) is a sufficient reason for decision \(\varDelta (\alpha )\), \(\tau \models {{\mathscr {R}}}\) and \(\beta \models \tau \models {{\mathscr {R}}}\), a contradiction. Hence, \(\tau \) cannot be a property of instance \(\alpha \). \(\square \)

We will sometimes say “the reason” to mean “the complete reason.” Recall that we also say “a reason” to mean “a sufficient reason.” According to Theorem 1, if the same decision is made on instances \(\alpha \) and \(\beta \), and if instance \(\beta \) does not satisfy the complete reason for decision \(\varDelta (\alpha )\), then these decisions were made for different reasons.

Classifier \({{\mathscr {C}}}_1\) admits Greg (\(\alpha = E, \lnot F, \lnot G, W\)) for the reason \({{\mathscr {R}}}= E \wedge (\lnot F \vee W)\). Greg was admitted because he passed the entrance exam and satisfied at least one of two additional requirements: having applied before or having work experience (Greg satisfies both). Classifier \({{\mathscr {C}}}_1\) also admits Susan (\(\beta = E, F, G, \lnot W\)). Susan does not satisfy the reason \({{\mathscr {R}}}\). There is one sufficient reason for admitting Susan: she passed the entrance exam and has a high GPA (\(E,G\)), which is not a property of Greg. Hence, classifier \({{\mathscr {C}}}_1\) admitted Greg and Susan for different reasons.
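The Greg and Susan example can be verified with a brute-force sketch that builds the complete reason as the disjunction of all sufficient reasons (Definition 2). The enumeration is an illustrative assumption, not the circuit-based computation of Sect. 8, and the helper names are our own.

```python
from itertools import combinations, product

def sufficient_reasons(formula, instance):
    """Brute force: minimal sub-properties of `instance` that force its decision."""
    target, features, reasons = bool(formula(instance)), list(instance), []
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            prop = {x: instance[x] for x in subset}
            free = [x for x in features if x not in prop]
            forced = all(bool(formula({**prop, **dict(zip(free, vals))})) == target
                         for vals in product((True, False), repeat=len(free)))
            if forced and not any(r.items() <= prop.items() for r in reasons):
                reasons.append(prop)
    return reasons

def satisfies(instance, prop):
    """True iff `prop` is a property of `instance`."""
    return prop.items() <= instance.items()

def complete_reason(formula, instance):
    """Definition 2: the disjunction of all sufficient reasons, as a predicate."""
    reasons = sufficient_reasons(formula, instance)
    return lambda other: any(satisfies(other, r) for r in reasons)

delta1 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"])
greg = {"E": True, "F": False, "G": False, "W": True}
susan = {"E": True, "F": True, "G": True, "W": False}

reason = complete_reason(delta1, greg)
print(reason(susan))                       # does Susan satisfy Greg's complete reason?
print(sufficient_reasons(delta1, susan))   # Susan's own sufficient reasons
```

On all sixteen instances, the predicate `reason` agrees with \(E \wedge (\lnot F \vee W)\), which is the complete reason stated for Greg.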

The complete reason behind a decision is unique up to logical equivalence and can be used to enumerate all of the decision’s sufficient reasons.

Theorem 2

Let \({{\mathscr {R}}}\) be the complete reason for decision \(\varDelta (\alpha )\). The prime implicants of \({{\mathscr {R}}}\) are the sufficient reasons for decision \(\varDelta (\alpha )\).

Proof

Let \(\tau _1, \ldots , \tau _n\) be the sufficient reasons for decision \(\varDelta (\alpha )\) and hence \({{\mathscr {R}}}= \tau _1 \vee \ldots \vee \tau _n\). The key observation is that each term \(\tau _i\) is a property of instance \(\alpha \). Hence, for every two terms \(\tau _i\) and \(\tau _j\), if term \(\tau _i\) contains some literal \(X\) then term \(\tau _j\) cannot contain literal \(\lnot X\). The DNF \(\tau _1 \vee \ldots \vee \tau _n\) is then closed under consensus.Footnote 8 Since no term \(\tau _i\) subsumes another term \(\tau _j\), the DNF \(\tau _1 \vee \ldots \vee \tau _n\) contains all prime implicants of \({{\mathscr {R}}}\). Hence, the prime implicants of complete reason \({{\mathscr {R}}}\) are precisely the sufficient reasons of decision \(\varDelta (\alpha )\). \(\square \)

We will later use Theorem 2 to provide a new approach for enumerating sufficient reasons, compared to earlier approaches such as those reported in Shih et al. (2018), Ignatiev et al. (2019a).

We will close this section by further highlighting how the complete reason for a decision can be viewed as a necessary and sufficient condition for explaining the decision. Consider the complete reason \({{\mathscr {R}}}\) for decision \(\varDelta (\alpha )\) and recall that it characterizes all properties of instance \(\alpha \) that can trigger the decision: \({{\mathscr {R}}}\equiv \bigvee _{\tau \models \varDelta _\alpha } \tau ,\) where \(\tau \) is a property of instance \(\alpha \). The reason \({{\mathscr {R}}}\) is then a logical condition that triggers the decision (\({{\mathscr {R}}}\models \varDelta _\alpha \)). If the complete reason is weakened into a condition \({{\mathscr {R}}}_w\) that continues to trigger the decision (\({{\mathscr {R}}}\models {{\mathscr {R}}}_w \models \varDelta _\alpha \)), then \({{\mathscr {R}}}_w\) will admit properties not satisfied by instance \(\alpha \). Moreover, if it is strengthened into a condition \({{\mathscr {R}}}_s\), then \({{\mathscr {R}}}_s\) is guaranteed to trigger the decision (\({{\mathscr {R}}}_s \models {{\mathscr {R}}}\models \varDelta _\alpha \)) but will not admit some properties of instance \(\alpha \) that can trigger the decision. Hence, the complete reason \({{\mathscr {R}}}\) is a necessary and sufficient condition for explaining the decision on instance \(\alpha \).

5 Necessary Characteristics and Properties

The necessary property of a decision is a maximal property of an instance that is essential for explaining the decision on that instance.

Definition 3

(Necessary Characteristics and Properties) A characteristic is necessary for a decision if it appears in every sufficient reason for the decision. The necessary property for a decision is the set of all its necessary characteristics.

The necessary property is unique but could be empty (when the decision has no necessary characteristics). If an instance ceases to satisfy one necessary characteristic, the corresponding decision is guaranteed to change.

Proposition 3

If instance \(\beta \) disagrees with instance \(\alpha \) on only one characteristic necessary for decision \(\varDelta (\alpha )\), then \(\varDelta (\alpha ) \ne \varDelta (\beta )\).

Proof

Suppose \(\alpha \) and \(\beta \) are as premised. If \(\varDelta (\alpha ) = \varDelta (\beta )\) then \(\varDelta _\alpha = \varDelta _\beta \) and \(\tau = \alpha \cap \beta \) is an implicant of \(\varDelta _\alpha \) by consensus on the flipped characteristic \(\rho \). Moreover, \(\tau \) does not contain characteristic \(\rho \), so some prime implicant of \(\varDelta _\alpha \) contained in \(\tau \) is a sufficient reason for decision \(\varDelta (\alpha )\) that does not contain \(\rho \). Hence, \(\rho \) cannot be necessary, a contradiction. \(\square \)

If an instance ceases to satisfy more than one necessary characteristic, the decision does not necessarily change. However, if the decision sticks then it would be for completely different reasons.

Theorem 3

Let \(\beta \) be an instance that disagrees with instance \(\alpha \) on at least one characteristic necessary for decision \(\varDelta (\alpha )\). Decisions \(\varDelta (\alpha )\) and \(\varDelta (\beta )\) must have disjoint sufficient reasons.

Proof

Let \({\sigma }\) be the necessary characteristics of decision \(\varDelta (\alpha )\) that instances \(\alpha \) and \(\beta \) disagree on. A sufficient reason \(\tau \) of \(\varDelta (\alpha )\) cannot be a property of instance \(\beta \) since \({\sigma }\subseteq \tau \) and \(\beta \) contains \({{\overline{\sigma }}}\). Hence, \(\tau \) cannot be a sufficient reason for decision \(\varDelta (\beta )\) and the two decisions must have disjoint sufficient reasons. \(\square \)

Consider a classifier \(\varDelta = (X\wedge Y \wedge Z) \vee (\lnot X \wedge \lnot Y \wedge Z)\) and instance \(\alpha = X,Y,Z\). The decision \(\varDelta (\alpha )\) is positive with \(X,Y,Z\) as the only sufficient reason. Hence, all three characteristics of \(\alpha \) are necessary: Flipping any single characteristic of instance \(\alpha \) will lead to a negative decision. However, flipping the two characteristics \(X\) and \(Y\) preserves the positive decision but leads to a new, single sufficient reason \(\lnot X, \lnot Y, Z\).
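The example above can be replayed with a short brute-force sketch that computes the necessary property as the characteristics shared by all sufficient reasons (Definition 3). The enumeration and the helper names are illustrative assumptions.

```python
from itertools import combinations, product

def sufficient_reasons(formula, instance):
    """Brute force: minimal sub-properties of `instance` that force its decision."""
    target, features, reasons = bool(formula(instance)), list(instance), []
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            prop = {x: instance[x] for x in subset}
            free = [x for x in features if x not in prop]
            forced = all(bool(formula({**prop, **dict(zip(free, vals))})) == target
                         for vals in product((True, False), repeat=len(free)))
            if forced and not any(r.items() <= prop.items() for r in reasons):
                reasons.append(prop)
    return reasons

def necessary_property(formula, instance):
    """Definition 3: characteristics that appear in every sufficient reason."""
    reasons = sufficient_reasons(formula, instance)
    return {x: v for x, v in instance.items() if all(x in r for r in reasons)}

def flip(instance, features):
    """Negate the characteristics of `instance` on the given features."""
    return {x: (not v if x in features else v) for x, v in instance.items()}

delta = lambda a: (a["X"] and a["Y"] and a["Z"]) or (not a["X"] and not a["Y"] and a["Z"])
alpha = {"X": True, "Y": True, "Z": True}

print(necessary_property(delta, alpha))
for x in alpha:                             # single flips (Proposition 3)
    print("flip", x, "->", bool(delta(flip(alpha, [x]))))
beta = flip(alpha, ["X", "Y"])              # flipping two necessary characteristics
print(bool(delta(beta)), sufficient_reasons(delta, beta))
```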

The complete reason for a decision has enough information to compute its necessary characteristics and property.

Proposition 4

A characteristic is necessary for a decision iff it is implied by the decision’s complete reason.

Proof

Follows from Definition 3 and Theorem 2. \(\square \)

6 Decision Counterfactuals

We mentioned Susan earlier who passed the entrance exam, is a first time applicant, has a high GPA but no work experience (\(\alpha = E, F, G, \lnot W\)). Classifier \({{\mathscr {C}}}_1\) admits Susan because she passed the entrance exam and has a high GPA as this is the only sufficient reason for the decision. Greg was also admitted by this classifier. His application is similar to Susan’s except that he applied before and has work experience (\(\beta = E, \lnot F, G, W\)). The decision on Greg has multiple sufficient reasons so we cannot issue a “because” statement when explaining this decision.

Definition 4

(Because) Consider decision \(\varDelta (\alpha )\) and property \(\tau \) of instance \(\alpha \). We say the decision is made because \(\tau \) if \(\tau \) is the only sufficient reason for the decision.

Proposition 5

Consider decision \(\varDelta (\alpha )\) and property \(\tau \) of instance \(\alpha \). The decision is made because \(\tau \) iff \(\tau \) is the decision’s complete reason.

Proof

Follows from Definitions 1 and 2. \(\square \)

One may be interested in statements that provide insights into a decision beyond the reasons behind it. For example, we may want to know how the classifier may have decided an instance if some of its characteristics were to be different. An example of this is the statement we mentioned in Sect. 1 with regard to classifier \({{\mathscr {C}}}_2\): Susan would still be admitted even if she did not have a high GPA because she comes from a rich hometown and passed the entrance exam. This statement exemplifies counterfactuals of the following form: The decision will stick even if \({\overline{\rho }}\) because \(\tau \), where \(\rho \) and \(\tau \) are properties of the given instance. Recall that \({\overline{\rho }}\) is the property which results from flipping every characteristic in property \(\rho \).

Definition 5

(Even-If-Because) Consider decision \(\varDelta (\alpha )\) and properties \(\rho \) and \(\tau \) of instance \(\alpha \). We say the decision sticks even if \({\overline{\rho }}\) because \(\tau \) if \(\tau \) is the complete reason for decision \(\varDelta (\beta )\), where instance \(\beta \) is the result of replacing property \(\rho \) in instance \(\alpha \) with property \({\overline{\rho }}\).

The following result justifies the above definition.

Theorem 4

Suppose decision \(\varDelta (\alpha )\) sticks even if \({\overline{\rho }}\) because \(\tau \), and let instance \(\beta \) be the result of replacing property \(\rho \) in instance \(\alpha \) with \({\overline{\rho }}\). Then \(\varDelta (\beta )=\varDelta (\alpha )\). Moreover, \(\tau \) is the only sufficient reason for decision \(\varDelta (\beta )\) and must be disjoint from \(\rho \).

Proof

Suppose decision \(\varDelta (\alpha )\) sticks even if \({\overline{\rho }}\) because \(\tau \), and let \(\beta \) be the described instance. By Definition 5, \(\tau \) is the complete reason for decision \(\varDelta (\beta )\). Since \(\tau \) is a property, it must be the only sufficient reason for decision \(\varDelta (\beta )\) by Theorem 2. Hence, \(\tau \) is a property of instance \(\beta \) and must therefore be disjoint from property \(\rho \) since flipping the characteristics of \(\rho \) in instance \(\alpha \) left property \(\tau \) intact. Since property \(\tau \) justifies decision \(\varDelta (\beta )\), \(\tau \models \varDelta _\beta \), and since \(\tau \) is also a property of instance \(\alpha \), \(\alpha \models \tau \), we now have \(\alpha \models \tau \models \varDelta _\beta \) and therefore \(\varDelta (\beta )=\varDelta (\alpha )\). \(\square \)

Applicant Susan who we discussed earlier (\(\alpha = E, F, G, \lnot W, R\)) is admitted by classifier \({{\mathscr {C}}}_2\). The decision will stick even if Susan had a low GPA (\(\lnot G\)) because she comes from a rich hometown and passed the entrance exam (\(E,R\)). This statement is justified since \(E,R\) is the complete reason for decision \(\varDelta (\beta )\). Here, \(\beta = E, F, \lnot G, \lnot W, R\) is the result of replacing characteristic \(G\) by \(\lnot G\) in instance \(\alpha \).

Jackie did not pass the entrance exam, is not a first time applicant, has a low GPA but has work experience (\(\alpha = \lnot E, \lnot F, \lnot G, W\)). Jackie is denied admission by classifier \({{\mathscr {C}}}_1\). The decision will stick even if Jackie had a high GPA (\(G\)) because she did not pass the entrance exam (\(\lnot E\)). This statement is justified since \(\lnot E\) is the complete reason for decision \(\varDelta (\beta )\), where \(\beta = \lnot E, \lnot F, G, W\) is the result of replacing characteristic \(\lnot G\) by \(G\) in instance \(\alpha \).
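The two even-if-because statements above can be checked with a brute-force sketch of Definition 5. It relies on Theorem 2: a property is the complete reason for a decision iff it is the decision's only sufficient reason. The function names and the encoding of \(\rho \) as a list of features to flip are assumptions of the sketch.

```python
from itertools import combinations, product

def sufficient_reasons(formula, instance):
    """Brute force: minimal sub-properties of `instance` that force its decision."""
    target, features, reasons = bool(formula(instance)), list(instance), []
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            prop = {x: instance[x] for x in subset}
            free = [x for x in features if x not in prop]
            forced = all(bool(formula({**prop, **dict(zip(free, vals))})) == target
                         for vals in product((True, False), repeat=len(free)))
            if forced and not any(r.items() <= prop.items() for r in reasons):
                reasons.append(prop)
    return reasons

def even_if_because(formula, instance, rho, tau):
    """Definition 5: `rho` lists the features of `instance` to flip, `tau` is the
    claimed complete reason for the decision on the flipped instance."""
    if not tau.items() <= instance.items():
        return False                      # tau must be a property of the instance
    beta = {x: (not v if x in rho else v) for x, v in instance.items()}
    # By Theorem 2, a property is the complete reason iff it is the only sufficient reason.
    return sufficient_reasons(formula, beta) == [tau]

delta1 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"])
delta2 = lambda a: a["E"] and (not a["F"] or a["G"] or a["W"] or a["R"])
susan = {"E": True, "F": True, "G": True, "W": False, "R": True}
jackie = {"E": False, "F": False, "G": False, "W": True}

print(even_if_because(delta2, susan, ["G"], {"E": True, "R": True}))
print(even_if_because(delta1, jackie, ["G"], {"E": False}))
```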

7 Decision Bias and Classifier Bias

We will now discuss the dependence of decisions on certain features, with a particular application to detecting decision and classifier bias.

Intuitively, a decision is biased if it depends on some protected features: ones that should not be used when making the decision (e.g., gender, zip code, or ethnicity).Footnote 9 We formalize bias next while making a distinction between classifier bias and decision bias. A classifier is biased if it makes some biased decisions, yet some of the other decisions it makes may still be unbiased. While classifier bias can always be detected by examining its decision function, we will show that it can sometimes be detected by examining the complete reason behind one of its unbiased decisions.

Definition 6

(Decision Bias) Decision \(\varDelta (\alpha )\) is biased if \(\varDelta (\alpha ) \ne \varDelta (\beta )\) for some \(\beta \) that disagrees with \(\alpha \) on only protected features.

Bias can be positive or negative. For example, an applicant may be admitted because they come from a rich hometown, or may be denied admission because they did not come from a rich hometown. The following result provides a necessary and sufficient condition for detecting decision bias.

Theorem 5

A decision is biased iff each of its sufficient reasons contains at least one protected feature.

Proof

We will show both directions of the theorem next.

Suppose decision \(\varDelta (\alpha )\) is biased yet has a sufficient reason \(\tau \) with no protected features. We will now show a contradiction. Since the decision is biased, there must exist an instance \(\beta \) that disagrees with instance \(\alpha \) on only protected features and \(\varDelta (\alpha ) \ne \varDelta (\beta )\). Since \(\tau \) is a property of \(\alpha \) and \(\beta \), we have \(\alpha \models \tau \models \varDelta _\alpha \) and \(\beta \models \tau \models \varDelta _\alpha \). Hence, \(\varDelta _\alpha = \varDelta _\beta \) and \(\varDelta (\alpha ) = \varDelta (\beta )\), which is a contradiction.

Suppose every sufficient reason for decision \(\varDelta (\alpha )\) contains at least one protected feature. Let \({\textbf {X}}\) be these protected features and let \(\tau \) be the characteristics of instance \(\alpha \) that do not involve features \({\textbf {X}}\). Assume \(\varDelta (\alpha ) = \varDelta (\beta )\) for every instance \(\beta \) that agrees with instance \(\alpha \) on characteristics \(\tau \) (that is, \(\beta \) disagrees with \(\alpha \) only on the protected features \({\textbf {X}}\)). Term \(\tau \) must then be an implicant of \(\varDelta _\alpha \) and a subset \({\sigma }\) of \(\tau \) must be a prime implicant of \(\varDelta _\alpha \) (could be \(\tau \) itself). Since \(\tau \) is a property of instance \(\alpha \), decision \(\varDelta (\alpha )\) has sufficient reason \({\sigma }\) that does not include a protected feature in \({\textbf {X}}\), which is a contradiction. Hence, \(\varDelta (\alpha ) \ne \varDelta (\beta )\) for some instance \(\beta \) that disagrees with instance \(\alpha \) on only protected features in \({\textbf {X}}\), and decision \(\varDelta (\alpha )\) is biased. \(\square \)

We emphasize that Theorem 5 does not require sufficient reasons to share protected features, only that each must contain at least one protected feature.

Consider classifier \({{\mathscr {C}}}_3\), which admits applicants who have a good GPA (\(G\)) as long as they pass the entrance exam (\(E\)), are male (\(M\)) or come from a rich hometown (\(R\)):

$$\begin{aligned} \varDelta _3 = (G \wedge E) \vee (G \wedge M) \vee (G \wedge R). \end{aligned}$$
(1)

Bob has a good GPA, did not pass the entrance exam, is male and comes from a rich hometown (\(\alpha = G, \lnot E, M, R\)). He is admitted with two sufficient reasons: \(G, M\) and \(G,R\). The decision is biased since each sufficient reason contains a protected feature (\(M\) and \(R\)). This classifier will not admit Nancy, who has similar characteristics but is not male and does not come from a rich hometown: \(\beta = G, \lnot E, \lnot M, \lnot R\). It will, however, admit Scott, who has the same characteristics as Nancy except that he is male: \(\gamma = G, \lnot E, M, \lnot R\).

Even though this classifier is biased, some of its decisions may be unbiased. If an applicant has a good GPA and passes the entrance exam (\(G, E\)), they will be admitted regardless of their protected characteristics. Moreover, if an applicant does not have a good GPA (\(\lnot G\)), they will be denied admission regardless of their other characteristics, including protected ones.
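The example above can be checked by brute force. The following is a minimal sketch, not part of the original development: the names `delta3`, `is_biased` and `sufficient_reasons` and the dictionary encoding of instances are assumptions made for illustration, and the enumeration is exponential in the number of features.

```python
from itertools import combinations, product

FEATURES = ["G", "E", "M", "R"]
PROTECTED = {"M", "R"}

def delta3(x):
    """Classifier C3 from Eq. (1); x maps feature names to booleans."""
    return x["G"] and (x["E"] or x["M"] or x["R"])

def is_biased(f, alpha, protected):
    """Definition 6: some instance that differs from alpha only on
    protected features receives a different decision."""
    names = sorted(protected)
    return any(f({**alpha, **dict(zip(names, vals))}) != f(alpha)
               for vals in product([False, True], repeat=len(names)))

def sufficient_reasons(f, features, alpha):
    """Minimal feature sets S such that fixing alpha on S forces f(alpha).
    Each returned set stands for the term alpha restricted to S."""
    reasons = []
    for k in range(len(features) + 1):
        for subset in combinations(features, k):
            if any(r <= set(subset) for r in reasons):
                continue  # contains a smaller reason, so it is not minimal
            free = [v for v in features if v not in subset]
            if all(f({**alpha, **dict(zip(free, vals))}) == f(alpha)
                   for vals in product([False, True], repeat=len(free))):
                reasons.append(frozenset(subset))
    return reasons

bob = {"G": True, "E": False, "M": True, "R": True}
bob_reasons = sufficient_reasons(delta3, FEATURES, bob)

# Theorem 5: biased iff every sufficient reason has a protected feature.
print(is_biased(delta3, bob, PROTECTED),
      all(r & PROTECTED for r in bob_reasons))
```

Both tests agree on Bob, and both report an unbiased decision for an applicant with \(G, E\) or with \(\lnot G\), matching the discussion above.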

Definition 7

(Classifier Bias) A classifier is biased if at least one of its decisions is biased.

We emphasize again that a biased classifier may still make some unbiased decisions. As we show next, one can sometimes infer classifier bias by inspecting the sufficient reasons behind one of its unbiased decisions.

Theorem 6

A classifier is biased iff one of its decisions has a sufficient reason that includes at least one protected feature.

Proof

We will next show both directions of the theorem.

Suppose classifier \(\varDelta \) is biased. By Definition 7, some decision \(\varDelta (\alpha )\) is biased. By Theorem 5, every sufficient reason of decision \(\varDelta (\alpha )\) must contain at least one protected feature.

Suppose decision \(\varDelta (\alpha )\) has a sufficient reason \(\tau \) that contains protected features \({\textbf {X}}\ne \emptyset \). For any instance \(\beta \) such that \(\beta \models \tau \), we must have \(\varDelta (\beta ) = \varDelta (\alpha )\). We now show that there is an instance \(\beta \models \tau \) and instance \(\gamma \) that disagrees with \(\beta \) on only features \({\textbf {X}}\) such that \(\varDelta (\beta ) \ne \varDelta (\gamma )\). Suppose the contrary: for all such \(\beta \) and \(\gamma \), we have \(\varDelta (\beta )=\varDelta (\gamma )=\varDelta (\alpha )\). Then \(\tau \setminus \rho \) is an implicant of \(\varDelta _\alpha \), where \(\rho \) are the protected characteristics in \(\tau \). This is impossible since \(\tau \) is a prime implicant of \(\varDelta _\alpha \). Hence, \(\varDelta (\beta ) \ne \varDelta (\gamma )\) for some \(\beta \) and \(\gamma \) with the stated properties so the classifier is biased. \(\square \)

If decision \(\varDelta (\alpha )\) has protected features in some but not all of its sufficient reasons, the decision is not biased according to Theorem 5. But classifier \(\varDelta \) is biased according to Theorem 6 as it will make a biased decision on some other instance \(\beta \ne \alpha \).

Consider classifier \({{\mathscr {C}}}_3\) in (1) and Lisa who has a good GPA, passed the entrance exam and comes from a rich hometown (\(G, E, \lnot M, R\)). The classifier will admit Lisa for two sufficient reasons: \(G,E\) and \(G,R\). The decision is unbiased: any applicant who has similar unprotected characteristics will be admitted. However, since one of the sufficient reasons contains a protected feature, the classifier is biased as it can make a biased decision on a different applicant. The proof of Theorem 6 suggests that the classifier will make different decisions on two applicants with a good GPA who disagree only on whether they come from a rich hometown. Nancy (\(G, \lnot E, \lnot M, \lnot R\)) and Heather (\(G, \lnot E, \lnot M, R\)) are such applicants.
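The Lisa example can be reproduced with a small brute-force sketch. The helper names and the encoding of instances below are illustrative assumptions; only the classifier \(\varDelta _3\) and the applicants come from the text.

```python
from itertools import combinations, product

FEATURES = ["G", "E", "M", "R"]
PROTECTED = {"M", "R"}

def delta3(x):
    """Classifier C3 from Eq. (1)."""
    return x["G"] and (x["E"] or x["M"] or x["R"])

def forces_decision(f, alpha, subset):
    """True iff every instance agreeing with alpha on `subset` gets f(alpha)."""
    free = [v for v in FEATURES if v not in subset]
    return all(f({**alpha, **dict(zip(free, vals))}) == f(alpha)
               for vals in product([False, True], repeat=len(free)))

def sufficient_reasons(f, alpha):
    """Minimal feature sets that force the decision (brute force)."""
    reasons = []
    for k in range(len(FEATURES) + 1):
        for subset in combinations(FEATURES, k):
            if (not any(r <= set(subset) for r in reasons)
                    and forces_decision(f, alpha, subset)):
                reasons.append(frozenset(subset))
    return reasons

lisa = dict(G=True, E=True, M=False, R=True)
lisa_reasons = sufficient_reasons(delta3, lisa)

# Theorem 5: unbiased decision, since some reason has no protected feature.
decision_unbiased = any(not (r & PROTECTED) for r in lisa_reasons)
# Theorem 6: biased classifier, since some reason has a protected feature.
classifier_biased = any(r & PROTECTED for r in lisa_reasons)

# Witnesses of classifier bias: they differ only on protected feature R.
nancy = dict(G=True, E=False, M=False, R=False)
heather = dict(G=True, E=False, M=False, R=True)
print(decision_unbiased, classifier_biased, delta3(nancy), delta3(heather))
```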

The following theorem shows how one can detect decision bias using the complete reason behind the decision. We will use this theorem (and Theorem 8) when discussing algorithms in Sect. 8.

Theorem 7

A decision is biased iff \(\exists (X_1, \ldots , X_n) {{\mathscr {R}}}\) is not valid where \(X_1, \ldots , X_n\) are all unprotected features and \({{\mathscr {R}}}\) is the complete reason behind the decision.

Proof

Let \(\tau _1, \ldots , \tau _k\) be the decision’s sufficient reasons and hence \({{\mathscr {R}}}= \tau _1 \vee \ldots \vee \tau _k\). Existentially quantifying variables \(X_1, \ldots , X_n\) from a DNF is done by replacing their literals with \(1\). The result is valid iff some term \(\tau _i\) contains only variables in \(X_1, \ldots , X_n\). Hence, \(\exists X_1, \ldots , X_n {{\mathscr {R}}}\) is not valid iff each term \(\tau _i\) contains variables beyond \(X_1, \ldots , X_n\) (i.e., each sufficient reason contains protected features). \(\square \)
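The argument in this proof can be sketched on a DNF given as a list of terms. The encoding (literals as strings such as `"G"` or `"~E"`) and the function names are assumptions of this example, and the complete reasons used are those of Bob and Lisa under classifier \({{\mathscr {C}}}_3\).

```python
def exists_dnf(terms, variables):
    """Existentially quantify `variables` from a DNF by replacing their
    literals with 1, i.e. dropping them from every term."""
    return [frozenset(l for l in t if l.lstrip("~") not in variables)
            for t in terms]

def valid_monotone_dnf(terms):
    """A DNF whose literals all come from a single instance is valid iff
    one of its terms is empty (an empty term is the constant 1)."""
    return any(len(t) == 0 for t in terms)

def decision_is_biased(complete_reason, unprotected):
    """Theorem 7: biased iff quantifying the unprotected features out of
    the complete reason does not yield a valid formula."""
    return not valid_monotone_dnf(exists_dnf(complete_reason, unprotected))

UNPROTECTED = {"G", "E"}
bob_reason = [frozenset({"G", "M"}), frozenset({"G", "R"})]
lisa_reason = [frozenset({"G", "E"}), frozenset({"G", "R"})]

print(decision_is_biased(bob_reason, UNPROTECTED))
print(decision_is_biased(lisa_reason, UNPROTECTED))
```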

The following result shows how classifier bias can sometimes be detected based on the complete reason behind an unbiased decision.

Theorem 8

A classifier is biased if \({{\mathscr {R}}}|X \not \equiv {{\mathscr {R}}}|\lnot X\) where \(X\) is a protected feature and \({{\mathscr {R}}}\) is the complete reason for some decision.

Proof

Given Theorems 2 and 6, it is sufficient to show that \({{\mathscr {R}}}|X \not \equiv {{\mathscr {R}}}|\lnot X\) iff feature \(X\) appears in some prime implicant of \({{\mathscr {R}}}\). Let \(\tau _1, \ldots , \tau _n\) be the prime implicants of \({{\mathscr {R}}}\). Feature \(X\) appears either positively or negatively in these prime implicants since terms \(\tau _i\) are all properties of the same instance. Suppose without loss of generality that feature \(X\) appears positively in terms \(\tau _i\) (if any). Then \({{\mathscr {R}}}|X \equiv \bigvee _{X \not \in \tau _i} \tau _i \vee \bigvee _{X \in \tau _i} \tau _i \setminus \{X\}\) and \({{\mathscr {R}}}|\lnot X \equiv \bigvee _{X \not \in \tau _i} \tau _i\). Hence \({{\mathscr {R}}}|X \not \equiv {{\mathscr {R}}}|\lnot X\) iff \(X \in \tau _i\) for some prime implicant \(\tau _i\). \(\square \)

Theorem 8 follows from Theorems 2 and 6 and a known result: A Boolean function depends on a variable \(X\) iff \(X\) appears in one of its prime implicants. We included the full proof for completeness.
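As a small illustration of Theorem 8, the sketch below compares \({{\mathscr {R}}}|X\) and \({{\mathscr {R}}}|\lnot X\) by truth table for the complete reason behind the decision on Lisa, \((G \wedge E) \vee (G \wedge R)\). The function names are hypothetical, and the truth-table comparison is exponential and meant only to make the condition concrete.

```python
from itertools import product

FEATURES = ["G", "E", "M", "R"]

def lisa_reason(v):
    """Complete reason for the decision on Lisa under classifier C3."""
    return (v["G"] and v["E"]) or (v["G"] and v["R"])

def differs_when_conditioned(r, features, x):
    """True iff R|x and R|not-x disagree on some assignment to the
    remaining features, that is, R|x is not equivalent to R|not-x."""
    others = [v for v in features if v != x]
    for vals in product([False, True], repeat=len(others)):
        inst = dict(zip(others, vals))
        if r({**inst, x: True}) != r({**inst, x: False}):
            return True
    return False

# Protected feature R witnesses classifier bias; feature M does not.
print(differs_when_conditioned(lisa_reason, FEATURES, "R"))
print(differs_when_conditioned(lisa_reason, FEATURES, "M"))
```

Since the theorem gives only a sufficient condition, the negative answer for \(M\) on this particular decision does not rule out bias with respect to \(M\).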

8 Computing Reasons and Related Queries

The enumeration of PI-explanations (sufficient reasons) was treated in Shih et al. (2018) by modifying the algorithm in Coudert and Madre (1993) for computing prime implicant covers; see also (Coudert et al. 1993; Minato 1993). The modified algorithm optimizes the original one by integrating the instance into the prime implicant enumeration process, but we are unaware of a complexity bound for the original algorithm or its modification. Moreover, since the algorithm is based on prime implicant covers, it is incomplete. Consider classifier \(\varDelta = (X \wedge Z) \vee (Y \wedge \lnot Z)\), which has three prime implicants: \((X \wedge Z)\), \((Y \wedge \lnot Z)\) and \((X \wedge Y)\). The last prime implicant is redundant and may not be generated when computing a cover. Instance \(\alpha = X, Y, Z\) leads to a positive decision and two sufficient reasons: \((X \wedge Z)\) and \((X \wedge Y)\). An algorithm based on covers may miss the sufficient reason \((X \wedge Y)\) and is therefore incomplete. This can be problematic for queries that rely on examining all sufficient reasons, such as decision and classifier bias (Definitions 6 and 7).
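The incompleteness example can be verified by exhaustive enumeration. The sketch below is illustrative only (the term encoding as partial assignments and the function names are assumptions); it lists all prime implicants of \(\varDelta = (X \wedge Z) \vee (Y \wedge \lnot Z)\) and keeps those satisfied by instance \(\alpha = X, Y, Z\).

```python
from itertools import product

VARS = ["X", "Y", "Z"]

def delta(v):
    return (v["X"] and v["Z"]) or (v["Y"] and not v["Z"])

def is_implicant(term):
    """term: dict of fixed variables; True iff every completion satisfies delta."""
    free = [v for v in VARS if v not in term]
    return all(delta({**term, **dict(zip(free, vals))})
               for vals in product([False, True], repeat=len(free)))

def prime_implicants():
    # Each variable is absent (None), positive (True) or negative (False).
    terms = [{v: s for v, s in zip(VARS, signs) if s is not None}
             for signs in product([None, True, False], repeat=len(VARS))]
    imps = [t for t in terms if is_implicant(t)]
    # Keep implicants that have no strictly smaller implicant inside them.
    return [t for t in imps
            if not any(s != t and s.items() <= t.items() for s in imps)]

alpha = {"X": True, "Y": True, "Z": True}
primes = prime_implicants()
sufficient = [p for p in primes if p.items() <= alpha.items()]
print(primes)
print(sufficient)
```

A cover such as \(\{X \wedge Z,\; Y \wedge \lnot Z\}\) omits \(X \wedge Y\), which is one of the two sufficient reasons found here.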

We next propose a new approach based on computing the complete reason \({{\mathscr {R}}}\) for a decision (Definition 2), which characterizes all sufficient reasons, and then use it to compute multiple queries. For example, we can enumerate all sufficient reasons using the reason \({{\mathscr {R}}}\) (Theorem 2). We can also use it to compute necessary characteristics (Proposition 4) and to detect decision bias (Theorem 7). Even classifier bias can sometimes be inferred directly using the reason \({{\mathscr {R}}}\) (Theorem 8) among other queries.

Assuming the classifier is represented using a suitable tractable Boolean circuit, our approach will compute the complete reason for a decision in linear time regardless of how many sufficient reasons it may have (could be exponential). Moreover, it will ensure that the computed complete reason is represented by a tractable circuit, allowing us to answer many queries in polytime.

8.1 Computing Complete Reasons

Our approach for computing complete reasons requires the classifier \(\varDelta \) and its negation \(\lnot \varDelta \) to be represented as Decision-DNNF circuits, which we define next.

Definition 8

(Decision-DNNF Circuit) An NNF circuit is a Boolean circuit that has literals or constants as inputs and two types of gates: and-gates and or-gates. A DNNF circuit is an NNF circuit in which the subcircuits feeding into each and-gate share no variables (see Footnote 10). A Decision-DNNF circuit is a DNNF circuit in which every or-gate has exactly two inputs of the form: \(X \wedge \mu \) and \(\lnot X \wedge \nu \), where \(X\) is a variable (see Footnote 11).

DNNF circuits were introduced in Darwiche (2001). Decision-DNNF circuits were identified in Huang and Darwiche (2005, 2007). OBDDs, which we discussed earlier, are a subset of Decision-DNNF circuits as one can convert an OBDD into a Decision-DNNF circuit in linear time. Figure 4 depicts an OBDD and its corresponding Decision-DNNF circuit. The circuit is obtained by mapping each OBDD node with variable \(X\), high child \(\mu \) and low child \(\nu \) into the circuit fragment \((X\wedge \mu ) \vee (\lnot X \wedge \nu )\) (two and-gates and one or-gate). For more on Decision-DNNF circuits and OBDDs, see (Bryant 1986; Darwiche and Marquis 2002; Huang and Darwiche 2007; Oztok and Darwiche 2014). One can obtain a Decision-DNNF circuit by compiling a Boolean formula in Conjunctive Normal Form (CNF) using systems such as c2d (Footnote 12; Darwiche 2004) and d4 (Footnote 13; Lagniez and Marquis 2017). One can also compile an OBDD from any Boolean formula using systems such as cudd (Footnote 14).

Fig. 4 From left to right: OBDD, Decision-DNNF circuit, consensus circuit, and the filtering of consensus circuit by instance \(\lnot A,B,C\)
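The node-by-node mapping from an OBDD to a Decision-DNNF circuit can be sketched as follows. The tuple encodings of OBDD nodes and circuit gates are assumptions made for this example, and the small OBDD used here is hypothetical (it is not the one in Fig. 4). Shared OBDD nodes are translated once through a cache, which is what keeps the conversion linear in the size of the OBDD.

```python
from itertools import product

def obdd_to_ddnnf(node, cache=None):
    """OBDD node: True, False, or a tuple (var, high, low).
    Returns (X and mu) or (not X and nu) for each decision node."""
    cache = {} if cache is None else cache
    if isinstance(node, bool):
        return node
    if id(node) not in cache:  # translate each shared node only once
        var, high, low = node
        mu = obdd_to_ddnnf(high, cache)
        nu = obdd_to_ddnnf(low, cache)
        cache[id(node)] = ("or",
                           ("and", ("lit", var, True), mu),
                           ("and", ("lit", var, False), nu))
    return cache[id(node)]

def eval_obdd(node, inst):
    while not isinstance(node, bool):
        var, high, low = node
        node = high if inst[var] else low
    return node

def eval_circuit(c, inst):
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return inst[c[1]] == c[2]
    values = [eval_circuit(k, inst) for k in c[1:]]
    return all(values) if c[0] == "and" else any(values)

# Hypothetical OBDD for (X and Z) or (Y and not Z), variable order Z, X, Y.
obdd = ("Z", ("X", True, False), ("Y", True, False))
circuit = obdd_to_ddnnf(obdd)

same = all(eval_obdd(obdd, dict(zip("XYZ", vals)))
           == eval_circuit(circuit, dict(zip("XYZ", vals)))
           for vals in product([False, True], repeat=3))
print(same)
```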

We compute the complete reason for a decision \(\varDelta (\alpha )\) by applying two operations to a Decision-DNNF circuit for \(\varDelta _\alpha \): consensus then filtering.

Definition 9

(Consensus Circuit) The consensus circuit of Decision-DNNF circuit \(\varGamma \) is denoted \(\textsf {consensus}(\varGamma )\) and obtained by adding input \(\mu \wedge \nu \) to every or-gate with inputs \(X \wedge \mu \) and \(\lnot X \wedge \nu \).

Figure 4 depicts a Decision-DNNF circuit and its consensus circuit (third from left). The consensus operation adds four and-gates denoted with double circles.

Proposition 6

A Decision-DNNF circuit \(\varGamma \) has the same satisfying assignments as its consensus circuit \(\textsf {consensus}(\varGamma )\).

Proof

\((X\wedge \mu )\vee (\lnot X\wedge \nu ) \equiv (X\wedge \mu )\vee (\lnot X\wedge \nu )\vee (\mu \wedge \nu )\). \(\square \)
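Definition 9 and Proposition 6 can be illustrated with a short sketch. The nested-tuple circuit encoding and the function names are assumptions of this example; the recursion below does not cache shared subcircuits, so it is linear only for tree-shaped circuits.

```python
from itertools import product

def evaluate(c, inst):
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return inst[c[1]] == c[2]
    values = [evaluate(k, inst) for k in c[1:]]
    return all(values) if c[0] == "and" else any(values)

def consensus(c):
    """Add the input (mu and nu) to every or-gate of the form
    (X and mu) or (not X and nu). Assumes a Decision-DNNF circuit."""
    if isinstance(c, bool) or c[0] == "lit":
        return c
    kids = [consensus(k) for k in c[1:]]
    if c[0] == "or":
        (_, _, mu), (_, _, nu) = kids  # each input is ("and", literal, sub)
        kids.append(("and", mu, nu))
    return (c[0], *kids)

# (Z and X) or (not Z and Y), written as a single decision gate on Z.
gamma = ("or",
         ("and", ("lit", "Z", True), ("lit", "X", True)),
         ("and", ("lit", "Z", False), ("lit", "Y", True)))
cons = consensus(gamma)

# Proposition 6: the two circuits have the same satisfying assignments.
same_models = all(evaluate(gamma, dict(zip("XYZ", vals)))
                  == evaluate(cons, dict(zip("XYZ", vals)))
                  for vals in product([False, True], repeat=3))
print(cons[3], same_models)
```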

A consensus circuit can be obtained from a Decision-DNNF circuit in linear time. We next discuss the filtering of a consensus circuit, which leads to a tractable circuit.

Definition 10

(Filtered Circuit) The filtering of consensus circuit \(\varGamma \) by instance \(\alpha \), where \(\varGamma (\alpha )=1\), is denoted \(\textsf {filter}(\varGamma ,\alpha )\) and obtained by replacing every literal \(l \not \in \alpha \) by constant \(0\).

Filtering is defined only on consensus circuits and requires an instance that satisfies the consensus circuit (we are only interested in such instances). Figure 4 depicts an example. The filtered circuit is on the far right of the figure, where grayed out nodes and edges can be dropped due to replacing literals by constant \(0\).

Filtering is also a linear time operation. Consensus preserves models (i.e., satisfying assignments of the circuit), but filtering drops some of them. We will characterize the models preserved by filtering after presenting two required results.
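Filtering can be sketched in the same style. The circuit encoding and names are again illustrative assumptions, and the consensus circuit below is written by hand for \((Z \wedge X) \vee (\lnot Z \wedge Y)\) rather than taken from Fig. 4.

```python
from itertools import product

def evaluate(c, inst):
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return inst[c[1]] == c[2]
    values = [evaluate(k, inst) for k in c[1:]]
    return all(values) if c[0] == "and" else any(values)

def filter_circuit(c, alpha):
    """Definition 10: replace every literal not in instance alpha by 0."""
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return c if alpha[c[1]] == c[2] else False
    return (c[0], *(filter_circuit(k, alpha) for k in c[1:]))

X, Y = ("lit", "X", True), ("lit", "Y", True)
Z, NOT_Z = ("lit", "Z", True), ("lit", "Z", False)

# Consensus circuit of the decision gate (Z and X) or (not Z and Y).
cons = ("or", ("and", Z, X), ("and", NOT_Z, Y), ("and", X, Y))
alpha = {"X": True, "Y": True, "Z": True}
filtered = filter_circuit(cons, alpha)

# A model of the consensus circuit that filtering drops.
dropped = {"X": False, "Y": True, "Z": False}
print(filtered)
print(evaluate(cons, dropped), evaluate(filtered, dropped))
```

In this example the filtered circuit is equivalent to \((X \wedge Z) \vee (X \wedge Y)\): it keeps instance \(\alpha \) as a model but loses the model \(\lnot X, Y, \lnot Z\).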

Let \(\varGamma \) be a circuit that results from filtering by instance \(\alpha \). The circuit is monotone in the following sense. If the common literals between instances \(\alpha \) and \(\beta \) are a subset of the common literals between instances \(\alpha \) and \(\gamma \), then \(\beta \models \varGamma \) only if \(\gamma \models \varGamma \). For example, if \(\alpha = X,Y,Z\), \(\beta = \lnot X, Y, \lnot Z\) and \(\gamma = \lnot X, Y, Z\), then \(\alpha \) and \(\beta \) agree on literals \(\{Y\}\) while \(\alpha \) and \(\gamma \) agree on literals \(\{Y,Z\}\) so the condition is met in this case.

Theorem 9

If circuit \(\varGamma \) results from filtering by instance \(\alpha \) then every literal \(l\) that appears in \(\varGamma \) also appears in \(\alpha \). Moreover, \(\varGamma (\gamma ) \ge \varGamma (\beta )\) if \(\gamma \cap \alpha \supseteq \beta \cap \alpha \).

Proof

Filtering removes every literal not in instance \(\alpha \). Hence, every literal in the filtered circuit \(\varGamma \) is in \(\alpha \), which implies the next result. Suppose that \(\gamma \cap \alpha \supseteq \beta \cap \alpha \) and \(\varGamma (\beta )=1\). When evaluating circuit \(\varGamma \) at \(\gamma \) compared to \(\beta \), the only literals that change values are \(l_1 \in \gamma \setminus \beta \) and \(l_2 \in \beta \setminus \gamma \). Literals \(l_1\) change values from \(0\) to \(1\) and literals \(l_2\) change values from \(1\) to \(0\). Changes to the values of \(l_1\) cannot decrease the output of circuit \(\varGamma \) since it is an NNF circuit. Literals \(l_2\) are not in \(\alpha \) since \(\gamma \cap \alpha \supseteq \beta \cap \alpha \) so do not appear in circuit \(\varGamma \) and changes to their values do not matter. Hence, \(\varGamma (\gamma ) = 1\). \(\square \)

We also need the following result which identifies circuit models that are preserved by the filtering of a consensus circuit.

Proposition 7

Consider a Decision-DNNF circuit \(\varDelta \) and instance \(\alpha \) such that \(\varDelta (\alpha )=1\). If \(\tau \) is an implicant of \(\varDelta \) and \(\alpha \models \tau \) then \(\tau \) is also an implicant of \(\textsf {filter}(\textsf {consensus}(\varDelta ),\alpha )\).

Proof

Let \(\varGamma = \textsf {filter}(\textsf {consensus}(\varDelta ),\alpha )\), \({{\mathscr {I}}}(\varDelta ) = \{\tau : \tau \models \varDelta \}\) and \({{\mathscr {I}}}(\varDelta ,\alpha ) = \{\tau : \tau \models \varDelta \text{ and } \alpha \models \tau \}\). We need to show that \({{\mathscr {I}}}(\varDelta ,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma )\). That is, \(\varGamma \) preserves the implicants \(\tau \) of \(\varDelta \) satisfied by \(\alpha \). The proof is by induction on the structure of \(\varDelta \).

(Base Case) If \(\varDelta \) is a literal \(l\) or a constant, then \(\varDelta = \varGamma \) since consensus is not applicable and filtering will not replace literal \(l\) by constant \(0\) (\(l \in \alpha \) since \(\varDelta (\alpha )=1\)). Hence, \({{\mathscr {I}}}(\varDelta ,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma )\).

(Inductive Step) If \(\varDelta = \varDelta _1 \wedge \varDelta _2\) then \(\varGamma = \varGamma _1\wedge \varGamma _2\) where \(\varGamma _1 = \textsf {filter}(\textsf {consensus}(\varDelta _1),\alpha )\) and \(\varGamma _2= \textsf {filter}(\textsf {consensus}(\varDelta _2),\alpha )\). Since \(\varDelta _1\) and \(\varDelta _2\) share no variables (decomposability), \({{\mathscr {I}}}(\varDelta ) = {{\mathscr {I}}}(\varDelta _1) \times {{\mathscr {I}}}(\varDelta _2)\) (Cartesian product). Similarly, \({{\mathscr {I}}}(\varGamma ) = {{\mathscr {I}}}(\varGamma _1) \times {{\mathscr {I}}}(\varGamma _2)\). By the induction hypothesis, \({{\mathscr {I}}}(\varDelta _1,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _1)\) and \({{\mathscr {I}}}(\varDelta _2,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _2)\). Hence,

$$\begin{aligned} {{\mathscr {I}}}(\varDelta ,\alpha ) = {{\mathscr {I}}}(\varDelta _1,\alpha ) \times {{\mathscr {I}}}(\varDelta _2,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _1) \times {{\mathscr {I}}}(\varGamma _2) = {{\mathscr {I}}}(\varGamma ). \end{aligned}$$

(Inductive Step) If \(\varDelta = (l \wedge \varDelta _1) \vee (\lnot l \wedge \varDelta _2)\) and literal \(l \in \alpha \) then \(\varGamma = (l\wedge \varGamma _1) \vee (\varGamma _1 \wedge \varGamma _2)\) where \(\varGamma _1 = \textsf {filter}(\textsf {consensus}(\varDelta _1),\alpha )\) and \(\varGamma _2 = \textsf {filter}(\textsf {consensus}(\varDelta _2),\alpha )\). Due to decomposability, \(l\) and \(\lnot l\) do not appear in \(\varDelta _1\) or \(\varDelta _2\). Hence, \({{\mathscr {I}}}(\varDelta ) = {{\mathscr {I}}}_1 \cup {{\mathscr {I}}}_2 \cup {{\mathscr {I}}}_c\) where

$$\begin{aligned} {{\mathscr {I}}}_1= & {} \{l,\tau : \tau \in {{\mathscr {I}}}(\varDelta _1)\} \\ {{\mathscr {I}}}_2= & {} \{\lnot l,\tau : \tau \in {{\mathscr {I}}}(\varDelta _2)\} \\ {{\mathscr {I}}}_c= & {} {{\mathscr {I}}}(\varDelta _1 \wedge \varDelta _2). \end{aligned}$$

Since \({{\mathscr {I}}}_2 \cap {{\mathscr {I}}}(\varDelta ,\alpha ) = \emptyset \) we have

$$\begin{aligned} {{\mathscr {I}}}(\varDelta ,\alpha ) = \{l,\tau : \tau \in {{\mathscr {I}}}(\varDelta _1,\alpha )\} \cup {{\mathscr {I}}}(\varDelta _1 \wedge \varDelta _2,\alpha ). \end{aligned}$$

Moreover, \( {{\mathscr {I}}}(\varGamma ) = \{l,\tau : \tau \in {{\mathscr {I}}}(\varGamma _1)\} \cup {{\mathscr {I}}}(\varGamma _1 \wedge \varGamma _2) \). By the induction hypothesis, \({{\mathscr {I}}}(\varDelta _1,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _1)\) and \({{\mathscr {I}}}(\varDelta _2,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _2)\), which gives \(\{l,\tau : \tau \in {{\mathscr {I}}}(\varDelta _1,\alpha )\} \subseteq \{l,\tau : \tau \in {{\mathscr {I}}}(\varGamma _1)\}\) and \({{\mathscr {I}}}(\varDelta _1 \wedge \varDelta _2,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma _1 \wedge \varGamma _2)\). Hence, \({{\mathscr {I}}}(\varDelta ,\alpha ) \subseteq {{\mathscr {I}}}(\varGamma )\). \(\square \)

The following fundamental result reveals the role of filtering a consensus circuit. It also reveals our linear-time procedure for computing the complete reason behind a decision as a (tractable) circuit that compactly characterizes all sufficient reasons.

Theorem 10

Consider a Decision-DNNF circuit \(\varDelta \) and instance \(\alpha \) such that \(\varDelta (\alpha )=1\). Term \(\tau \) is a prime implicant of \(\varDelta \) and \(\alpha \models \tau \) (that is, \(\tau \) is a sufficient reason for decision \(\varDelta (\alpha )\)) iff \(\tau \) is a prime implicant of \(\textsf {filter}(\textsf {consensus}(\varDelta ),\alpha )\).

Proof

Let \(\varGamma = \textsf {filter}(\textsf {consensus}(\varDelta ),\alpha )\). Observe that \(\varGamma \models \varDelta \) since \(\textsf {consensus}(\varDelta ) \equiv \varDelta \) and since \(\varGamma \) is the result of replacing some inputs of \(\textsf {consensus}(\varDelta )\) with constant \(0\).

Suppose \(\tau \) is a prime implicant of circuit \(\varDelta \) and \(\alpha \models \tau \). Then \(\tau \) is an implicant of circuit \(\varGamma \) by Proposition 7, \(\tau \models \varGamma \). If \(\tau \) is not a prime implicant of \(\varGamma \), we must have some term \(\rho \subset \tau \) such that \(\rho \models \varGamma \). Therefore \(\rho \models \varDelta \) since \(\varGamma \models \varDelta \), which means that \(\tau \) is not a prime implicant of \(\varDelta \), a contradiction. Hence, \(\tau \) is a prime implicant of \(\varGamma \).

Suppose \(\tau \) is a prime implicant of circuit \(\varGamma \). Then \(\tau \) is an implicant of \(\varDelta \) since \(\varGamma \models \varDelta \). We next show that \(\tau \) is a prime implicant of \(\varDelta \) and \(\alpha \models \tau \). Let \(\beta \) be an instance such that \(\beta \models \tau \) and \(\beta \) disagrees with \(\alpha \) on all variables outside \(\tau \). Then \(\varGamma (\beta )=1\) and \(\alpha \cap \beta \subseteq \tau \). Every instance \(\gamma \) such that \(\gamma \models \alpha \cap \beta \) must satisfy \(\varGamma (\gamma )=1\) since \(\alpha \cap \gamma \supseteq \alpha \cap \beta \), leading to \(\varGamma (\gamma ) \ge \varGamma (\beta )\) by Theorem 9. Hence, \(\alpha \cap \beta \) is an implicant of \(\varGamma \). Since \(\tau \) is a prime implicant of \(\varGamma \), we must have \(\alpha \cap \beta = \tau \) and hence \(\alpha \models \tau \). Suppose now \(\tau \) is not a prime implicant of \(\varDelta \). Some term \(\rho \subset \tau \) is then a prime implicant of \(\varDelta \) and \(\alpha \models \rho \). By the first part of this theorem, \(\rho \) is a prime implicant of \(\varGamma \), a contradiction. Therefore, \(\tau \) is a prime implicant of \(\varDelta \). \(\square \)

This is our final definition in this section, which captures the computation of complete reasons using circuits.

Definition 11

(Reason Circuit) For classifier \(\varDelta \), instance \(\alpha \) and a Decision-DNNF circuit \(\varGamma \) for \(\varDelta _\alpha \), the circuit \(\textsf {filter}(\textsf {consensus}(\varGamma ),\alpha )\) is called a reason circuit and is denoted by \(\textsf {reason}(\varDelta ,\alpha )\).

The circuit \(\textsf {reason}(\varDelta ,\alpha )\) depends on the specific Decision-DNNF circuit \(\varGamma \) used to represent \(\varDelta _\alpha \) but will always have the same models.

8.2 Tractability of Reason Circuits

We next show that reason circuits are tractable. Since we represent the complete reason for a decision as a reason circuit, many queries relating to the decision can then be answered efficiently.

Definition 12

(Monotone) An NNF circuit is monotone if every variable appears only positively or only negatively in the circuit.

Reason circuits are filtered circuits and hence monotone as shown by Theorem 9. The following theorem mirrors what is known about monotone Boolean formulas, but we include it for completeness.

Theorem 11

The satisfiability of a monotone NNF circuit can be decided in linear time. A monotone NNF circuit can be negated and also conditioned in linear time to yield a monotone NNF circuit.

Proof

The satisfiability of a monotone NNF circuit can be decided using the following procedure. Constant \(0\) is not satisfiable. Constant \(1\) and literals are satisfiable. An or-gate is satisfiable iff any of its inputs is satisfiable. An and-gate is satisfiable iff all its inputs are satisfiable. All previous statements are always correct except the last one which depends on monotonicity. Consider a conjunction \(\mu \wedge \nu \) and suppose every variable shared between the conjuncts appears either positively or negatively in both. Any model of \(\mu \) can be combined with any model of \(\nu \) to form a model for \(\mu \wedge \nu \). Hence, the conjunction is satisfiable iff each of the conjuncts is satisfiable. Conditioning replaces literals by constants so it preserves monotonicity. To negate a monotone circuit, replace and-gates by or-gates, or-gates by and-gates and literals by their negations. Monotonicity is preserved. \(\square \)

Given Theorem 11, the validity of a monotone NNF circuit can be decided in linear time (we check whether the negated circuit is unsatisfiable; see Footnote 15). We can also conjoin the circuit with a literal in linear time to yield a monotone circuit since \(\varDelta \wedge l = (\varDelta \vert l) \wedge l\).
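The procedures in Theorem 11 can be sketched directly. The tuple encoding and function names are assumptions of this example, and correctness of `is_sat` relies on the input being monotone in the sense of Definition 12. Without caching of shared gates, the recursion is linear only for tree-shaped circuits.

```python
def is_sat(c):
    """Satisfiability of a monotone NNF circuit."""
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return True
    results = [is_sat(k) for k in c[1:]]
    # The and-gate case is sound only because the circuit is monotone.
    return all(results) if c[0] == "and" else any(results)

def negate(c):
    """Swap and/or gates and negate literals and constants."""
    if isinstance(c, bool):
        return not c
    if c[0] == "lit":
        return ("lit", c[1], not c[2])
    gate = "or" if c[0] == "and" else "and"
    return (gate, *(negate(k) for k in c[1:]))

def condition(c, var, value):
    """Replace the literals of `var` by constants according to `value`."""
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return (c[2] == value) if c[1] == var else c
    return (c[0], *(condition(k, var, value) for k in c[1:]))

def is_valid(c):
    return not is_sat(negate(c))

X, Y, Z = ("lit", "X", True), ("lit", "Y", True), ("lit", "Z", True)
reason = ("or", ("and", X, Z), ("and", X, Y))  # a monotone circuit

print(is_sat(reason), is_valid(reason))
# A literal l is entailed by the circuit iff conditioning on not-l is unsatisfiable.
print(not is_sat(condition(reason, "X", False)),
      not is_sat(condition(reason, "Z", False)))
```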

Variables can be existentially quantified from a monotone circuit in linear time, with the resulting circuit remaining monotone. This is critical for efficiently detecting decision bias as shown by Theorem 7.

Theorem 12

Replacing every literal of variable \(X\) with constant \(1\) in a monotone NNF circuit \(\varGamma \) yields a monotone NNF circuit equivalent to \(\exists X \varGamma \).

Proof

If variable \(X\) appears only positively in circuit \(\varGamma \) then \(\varGamma \vert \lnot X \models \varGamma \vert X\) and \(\exists X \; \varGamma = (\varGamma \vert X) \vee (\varGamma \vert \lnot X) = \varGamma \vert X\). If variable \(X\) appears only negatively in \(\varGamma \) then \(\varGamma \vert X \models \varGamma \vert \lnot X\) and \(\exists X \; \varGamma = (\varGamma \vert X) \vee (\varGamma \vert \lnot X) = \varGamma \vert \lnot X\). Variable \(X\) can therefore be existentially quantified by replacing its literals with constant \(1\). \(\square \)
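Theorem 12 can be checked against the definition \(\exists X \; \varGamma = (\varGamma \vert X) \vee (\varGamma \vert \lnot X)\) on a small example. The encoding and names below are illustrative assumptions. The last lines show, on a hypothetical non-monotone circuit, why the shortcut needs monotonicity.

```python
from itertools import product

def evaluate(c, inst):
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return inst[c[1]] == c[2]
    values = [evaluate(k, inst) for k in c[1:]]
    return all(values) if c[0] == "and" else any(values)

def exists(c, var):
    """Replace every literal of `var` with the constant 1."""
    if isinstance(c, bool):
        return c
    if c[0] == "lit":
        return True if c[1] == var else c
    return (c[0], *(exists(k, var) for k in c[1:]))

def exists_by_definition(c, var, inst):
    """(c | var) or (c | not var), evaluated at instance inst."""
    return (evaluate(c, {**inst, var: True})
            or evaluate(c, {**inst, var: False}))

X, Y, Z = ("lit", "X", True), ("lit", "Y", True), ("lit", "Z", True)
gamma = ("or", ("and", X, Z), ("and", X, Y))  # monotone
quantified = exists(gamma, "X")

agrees = all(evaluate(quantified, dict(zip("XYZ", vals)))
             == exists_by_definition(gamma, "X", dict(zip("XYZ", vals)))
             for vals in product([False, True], repeat=3))
print(agrees)

# Not monotone: X appears with both signs, and the shortcut is unsound.
bad = ("and", X, ("lit", "X", False))
print(evaluate(exists(bad, "X"), {"X": True}),
      exists_by_definition(bad, "X", {"X": True}))
```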

8.3 Computing Queries

figure a

We can now discuss algorithms. To compute the sufficient reasons for a decision \(\varDelta (\alpha )\): get a Decision-DNNF circuit for \(\varDelta _\alpha \), transform it into a consensus circuit, filter it by instance \(\alpha \) and finally compute the prime implicants of the filtered circuit. Algorithm 1 does this in place, that is, without explicitly constructing the consensus or filtered circuits. It assumes a positive decision (otherwise we pass \(\lnot \varDelta \)).

Algorithm 1 uses subroutine \(\textsf {cartesian\_product}\) which conjoins two DNFs by computing the Cartesian product of their terms. It also uses \(\textsf {remove\_subsumed}\) to remove subsumed terms from a DNF.

Theorem 13

Consider a Decision-DNNF \(\varDelta \) and instance \(\alpha \). If \(\varDelta (\alpha )=1\) then a call \(\textsf {PI}(\varDelta ,\alpha )\) to Algorithm 1 returns the prime implicants of circuit \(\textsf {filter}(\textsf {consensus}(\varDelta ),\alpha )\).

Proof

Consensus and filtering are applied implicitly on Lines 10-11. Filtered circuits are monotone. We compute the prime implicants of a monotone circuit by converting it into DNF and removing subsumed terms (Crama and Hammer 2011, Chapter 3). This is precisely what Algorithm 1 does. \(\square \)
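The following sketch shows one way such an in-place computation could look. It is a reconstruction from the prose description only, not a transcription of Algorithm 1: the circuit encoding, the function name `pi`, and the exact case analysis are assumptions, while `cartesian_product` and `remove_subsumed` follow the roles described above. Terms are sets of `(variable, sign)` pairs.

```python
def cartesian_product(dnf1, dnf2):
    """Conjoin two DNFs by pairing up their terms."""
    return [t1 | t2 for t1 in dnf1 for t2 in dnf2]

def remove_subsumed(dnf):
    """Drop every term that strictly contains another term."""
    terms = set(dnf)
    return [t for t in terms if not any(s < t for s in terms)]

def pi(c, alpha):
    """Prime implicants of filter(consensus(c), alpha), computed without
    building the consensus or filtered circuits (sketch)."""
    if c is True:
        return [frozenset()]
    if c is False:
        return []
    if c[0] == "lit":  # filtering: literals outside alpha become 0
        return [frozenset([(c[1], c[2])])] if alpha[c[1]] == c[2] else []
    if c[0] == "and":  # inputs share no variables (decomposability)
        result = [frozenset()]
        for k in c[1:]:
            result = cartesian_product(result, pi(k, alpha))
        return result
    # Decision or-gate: (X and mu) or (not X and nu).
    (_, (_, x, _), mu), (_, _, nu) = c[1:]
    p_mu, p_nu = pi(mu, alpha), pi(nu, alpha)
    kept = p_mu if alpha[x] else p_nu          # branch whose literal is in alpha
    lit = frozenset([(x, alpha[x])])
    terms = cartesian_product([lit], kept)     # surviving decision branch
    terms += cartesian_product(p_mu, p_nu)     # implicit consensus input
    return remove_subsumed(terms)

gamma = ("or",
         ("and", ("lit", "Z", True), ("lit", "X", True)),
         ("and", ("lit", "Z", False), ("lit", "Y", True)))
alpha = {"X": True, "Y": True, "Z": True}
reasons = pi(gamma, alpha)
print(reasons)
```

On this small circuit for \((X \wedge Z) \vee (Y \wedge \lnot Z)\) and instance \(X, Y, Z\), the sketch returns the two terms \(X \wedge Z\) and \(X \wedge Y\).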

Consider now a decision \(\varDelta (\alpha )\) and its complete reason \({{\mathscr {R}}}= \textsf {reason}(\varDelta ,\alpha )\), which is a monotone NNF circuit. Let \(n\) be the size of circuit \({{\mathscr {R}}}\) and \(m\) be the number of features. We next show how to compute various queries using circuit \({{\mathscr {R}}}\).

Sufficient Reasons. By Theorems 2 and 13, the call \(\textsf {PI}(\varDelta _\alpha ,\alpha )\) to Algorithm 1 will return all sufficient reasons for decision \(\varDelta (\alpha )\), assuming \(\varDelta _\alpha \) is a Decision-DNNF circuit. The number of sufficient reasons can be exponential, but we can actually answer many questions about them without enumerating them directly as shown below.

Necessary Property. By Proposition 4, characteristic (literal) \(l\) is necessary for decision \(\varDelta (\alpha )\) iff \({{\mathscr {R}}}\models l\). This is equivalent to \({{\mathscr {R}}}\vert \lnot l\) being unsatisfiable, which can be decided in \(O(n)\) time given Theorem 11. The necessary property (all necessary characteristics) can then be computed in \(O(n \cdot m)\) time.

Because Statements. To decide whether decision \(\varDelta (\alpha )\) was made “because \(\tau \)” we check whether property \(\tau \) is the complete reason for the decision (Definition 4): \(\tau \models {{\mathscr {R}}}\) and \({{\mathscr {R}}}\models \tau \). We have \(\tau \models {{\mathscr {R}}}\) iff \(\lnot {{\mathscr {R}}}\vert \tau \) is unsatisfiable. Moreover, \({{\mathscr {R}}}\models \tau \) iff \({{\mathscr {R}}}\vert \lnot l\) is unsatisfiable for every literal \(l\) in \(\tau \). All of this can be done in \(O(n \cdot \vert \tau \vert )\) time.

Even if, Because Statements. To decide whether decision \(\varDelta (\alpha )\) would stick “even if \({\overline{\rho }}\) because \(\tau \)” we replace property \(\rho \) with \({\overline{\rho }}\) in instance \(\alpha \) to yield instance \(\beta \) (Definition 5). We then compute the complete reason for decision \(\varDelta (\beta )\) and check whether it is equivalent to \(\tau \). All of this can be done in \(O(n \cdot \vert \tau \vert )\) time.

Decision Bias. To decide whether decision \(\varDelta (\alpha )\) is biased we existentially quantify all unprotected features from circuit \({{\mathscr {R}}}\) and then check the validity of the result (Theorem 7). All of this can be done in \(O(n)\) time given Theorems 11 and 12.

Fig. 5 Admission classifier

Fig. 6 Applicants and characteristics

Fig. 7 From left to right: Reason circuit for the decision on applicants Scott, Robin and April (Fig. 6)

9 A Case Study

We now consider a more refined admission classifier to illustrate the notions and concepts we introduced more comprehensively.

This classifier highly values passing the entrance exam and being a first time applicant. However, it also gives significant leeway to students from a rich hometown. In fact, being from a rich hometown unlocks the only path to acceptance for those who failed the entrance exam. The classifier is depicted as an OBDD in Fig. 5. It corresponds to the following Boolean formula, which is not monotone (the previous classifiers we considered were all monotone):

$$\begin{aligned} \varDelta = [E \wedge [(F \wedge (G \vee W)) \vee (\lnot F \wedge R)]] \vee [G \wedge R \wedge W]. \end{aligned}$$

The classifier has the following prime implicants, some of which are not essential (all prime implicants of a monotone formula are essential):

$$\begin{aligned} (E, F, W)\, (E, F, G)\, (G, R, W)\, (E, \lnot F, R)\, (E, R, W)\, (E, G, R). \end{aligned}$$
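The list of prime implicants can be reproduced by exhaustive enumeration, which is feasible here because the classifier has only five features. The following is a minimal Python sketch under our own assumptions: the names `delta`, `is_implicant` and `prime_implicants` are hypothetical helpers introduced for illustration, and the brute-force search is not the method proposed in this paper (which avoids computing prime implicants altogether).

```python
from itertools import combinations, product

FEATURES = ["E", "F", "G", "R", "W"]

def delta(E, F, G, R, W):
    """The admission classifier of this section, written as a Boolean function."""
    return bool((E and ((F and (G or W)) or ((not F) and R))) or (G and R and W))

def is_implicant(term):
    """True iff every instance extending the partial assignment `term` is admitted."""
    free = [f for f in FEATURES if f not in term]
    for values in product([False, True], repeat=len(free)):
        instance = {**term, **dict(zip(free, values))}
        if not delta(**instance):
            return False
    return True

def prime_implicants():
    """Enumerate terms by increasing size.

    An implicant that does not contain an already-found (smaller) prime
    implicant has no proper sub-term that is an implicant, so it is prime.
    """
    primes = []
    for size in range(len(FEATURES) + 1):
        for feats in combinations(FEATURES, size):
            for signs in product([True, False], repeat=size):
                term = dict(zip(feats, signs))
                if any(p.items() <= term.items() for p in primes):
                    continue  # subsumed by a smaller prime implicant
                if is_implicant(term):
                    primes.append(term)
    return primes

def show(term):
    """Render a term using ~ for negation, e.g. (E, ~F, R)."""
    return "(" + ", ".join(f if v else "~" + f for f, v in term.items()) + ")"

PRIMES = prime_implicants()
print(" ".join(show(p) for p in PRIMES))
```

Running the sketch yields the same six terms as above, possibly in a different order.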

We will consider applicants Scott, Robin and April in Fig. 6, where feature \(R\) is protected (whether the applicant comes from a rich hometown). The complete reasons for the decisions on these applicants are shown in Fig. 7. These are reason circuits produced as suggested by Definition 11, except that we simplified the circuits by propagating and removing constant values (a reason circuit is satisfiable as it must be satisfied by the instance underlying the decision).

The decision on applicant Scott is biased. To check this, we can existentially quantify unprotected features \(E, F, G, W\) from the reason circuit in Fig. 7 and then check its validity (Theorem 7). Existential quantification is done by replacing the literals \(E, \lnot F, G, W\) in the circuit with constant \(1\). The resulting circuit is not valid. We can also confirm decision bias by considering the sufficient reasons for this decision, which all contain the protected feature \(R\) (Theorem 5):

$$\begin{aligned} (E, G, R) \, (E, R, W)\, (E, R, \lnot F)\, (G, R, W) \end{aligned}$$

If we flip the protected characteristic \(R\) to \(\lnot R\), the decision will flip with the complete reason being \(\lnot F, \lnot R\) so Scott would be denied admission because he is not a first time applicant and does not come from a rich hometown (Definition 4).
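The analysis of Scott's decision can be checked by brute force. The sketch below is an illustration under stated assumptions rather than the reason-circuit algorithm of this paper: Scott's characteristics \(E, \lnot F, G, R, W\) are read off the literals mentioned in the text (Fig. 6 is not reproduced here), sufficient reasons are computed as subset-minimal sets of characteristics that force the decision, and the helper names are hypothetical.

```python
from itertools import combinations, product

FEATURES = ["E", "F", "G", "R", "W"]
PROTECTED = {"R"}

def delta(E, F, G, R, W):
    """The admission classifier of this section, written as a Boolean function."""
    return bool((E and ((F and (G or W)) or ((not F) and R))) or (G and R and W))

def decision_is_fixed(instance, kept):
    """True iff the characteristics in `kept` alone already force the decision."""
    target = delta(**instance)
    free = [f for f in FEATURES if f not in kept]
    for values in product([False, True], repeat=len(free)):
        variant = {**instance, **dict(zip(free, values))}
        if delta(**variant) != target:
            return False
    return True

def sufficient_reasons(instance):
    """Subset-minimal sets of features whose values in `instance` force the decision."""
    reasons = []
    for size in range(len(FEATURES) + 1):
        for kept in combinations(FEATURES, size):
            if any(set(r) <= set(kept) for r in reasons):
                continue  # contains a smaller sufficient reason, hence not minimal
            if decision_is_fixed(instance, kept):
                reasons.append(kept)
    return reasons

def literals(instance, feats):
    """Render features as literals of the instance, using ~ for negation."""
    return tuple(f if instance[f] else "~" + f for f in feats)

# Assumed characteristics of Scott: E, not F, G, R, W.
scott = dict(E=True, F=False, G=True, R=True, W=True)
scott_reasons = sufficient_reasons(scott)
# Theorem 5: the decision is biased iff every sufficient reason has a protected feature.
scott_biased = all(PROTECTED & set(r) for r in scott_reasons)

# Flip the protected characteristic and recompute the reasons.
flipped = {**scott, "R": False}
flipped_reasons = sufficient_reasons(flipped)

print([literals(scott, r) for r in scott_reasons], scott_biased)
print(delta(**flipped), [literals(flipped, r) for r in flipped_reasons])
```

Under these assumptions the sketch finds four sufficient reasons for Scott, each containing \(R\), and a single sufficient reason \(\lnot F, \lnot R\) for the flipped (negative) decision, in agreement with the discussion above.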

The decision on Robin is not biased. If we existentially quantify unprotected features \(E, F, G, W\) from the reason circuit (by replacing their literals with constant \(1\)), the circuit becomes valid. We can confirm this using the decision’s sufficient reasons:

$$\begin{aligned} (E, F, G)\, (E, F, W)\, (E, G, R)\, (E, R, W)\, (G, R, W) \end{aligned}$$

Two of these sufficient reasons do not contain the protected feature so the decision cannot be biased (Theorem 5). The decision will be the same on any applicant with the same characteristics as Robin except for the protected feature \(R\). However, since some of the sufficient reasons contain a protected feature, the classifier must be biased (Theorem 6): It will make a biased decision on some other applicant. This illustrates how classifier bias can be inferred from the complete reason behind one of its unbiased decisions. This method is not complete though: the classifier may still be biased even if no protected feature appears in a sufficient reason for one of its decisions.
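The distinction between decision bias and classifier bias can also be illustrated directly, by flipping protected features and observing the decision. The sketch below assumes the reading of bias suggested by the text (a decision is biased if changing only protected features can change it, and a classifier is biased if at least one of its decisions is biased). Robin's characteristics \(E, F, G, R, W\) are inferred from the sufficient reasons listed above, and the helper names are hypothetical.

```python
from itertools import product

FEATURES = ["E", "F", "G", "R", "W"]
PROTECTED = ["R"]

def delta(E, F, G, R, W):
    """The admission classifier of this section, written as a Boolean function."""
    return bool((E and ((F and (G or W)) or ((not F) and R))) or (G and R and W))

def decision_is_biased(instance):
    """True iff changing only protected features can change the decision."""
    for values in product([False, True], repeat=len(PROTECTED)):
        variant = {**instance, **dict(zip(PROTECTED, values))}
        if delta(**variant) != delta(**instance):
            return True
    return False

# Assumed characteristics of Robin: E, F, G, R, W.
robin = dict(E=True, F=True, G=True, R=True, W=True)
robin_biased = decision_is_biased(robin)

# The classifier is biased iff it makes a biased decision on some instance.
all_instances = [dict(zip(FEATURES, v))
                 for v in product([False, True], repeat=len(FEATURES))]
biased_instances = [x for x in all_instances if decision_is_biased(x)]
classifier_is_biased = bool(biased_instances)

print(robin_biased, classifier_is_biased, len(biased_instances))
```

Under these assumptions the decision on Robin is unbiased, yet 12 of the 32 possible instances receive a biased decision, so the classifier itself is biased.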

The decision on April is not biased even though the protected feature \(R\) appears in the reason circuit (the circuit is valid if we existentially quantify all features but \(R\)). Moreover, \(E,F\) are all the necessary characteristics for this decision (i.e., the necessary property). Flipping either of these characteristics will flip the decision. Recall that violating the necessary property may either flip the decision or change the reason behind it (Theorem 3) but flipping only one necessary characteristic is guaranteed to flip the decision (Proposition 3).

The decision on April would stick even if she were not to have work experience (\(\lnot W\)) because she passed the entrance exam (\(E\)), has a good GPA (\(G\)) and is a first time applicant (\(F\)). April would be denied admission if she were to also violate one of these characteristics (Definition 5 and Proposition 3).
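The statements about April can be verified in the same brute-force style. The sketch below is an illustration under our own assumptions: April's characteristics \(E, F, G, \lnot R, W\) are inferred from the text (Fig. 6 is not reproduced here), necessary characteristics are taken to be those shared by all sufficient reasons, and the complete reason is treated as equivalent to a term exactly when that term is the only sufficient reason. The helper names are hypothetical.

```python
from itertools import combinations, product

FEATURES = ["E", "F", "G", "R", "W"]

def delta(E, F, G, R, W):
    """The admission classifier of this section, written as a Boolean function."""
    return bool((E and ((F and (G or W)) or ((not F) and R))) or (G and R and W))

def decision_is_fixed(instance, kept):
    """True iff the characteristics in `kept` alone already force the decision."""
    target = delta(**instance)
    free = [f for f in FEATURES if f not in kept]
    for values in product([False, True], repeat=len(free)):
        variant = {**instance, **dict(zip(free, values))}
        if delta(**variant) != target:
            return False
    return True

def sufficient_reasons(instance):
    """Subset-minimal sets of features whose values in `instance` force the decision."""
    reasons = []
    for size in range(len(FEATURES) + 1):
        for kept in combinations(FEATURES, size):
            if any(set(r) <= set(kept) for r in reasons):
                continue  # contains a smaller sufficient reason, hence not minimal
            if decision_is_fixed(instance, kept):
                reasons.append(kept)
    return reasons

def flips_decision(instance, feature):
    """True iff flipping the single characteristic `feature` changes the decision."""
    variant = {**instance, feature: not instance[feature]}
    return delta(**variant) != delta(**instance)

# Assumed characteristics of April: E, F, G, not R, W.
april = dict(E=True, F=True, G=True, R=False, W=True)
april_reasons = sufficient_reasons(april)
# Necessary characteristics: those appearing in every sufficient reason.
necessary = set(FEATURES).intersection(*map(set, april_reasons))

# "Even if not W, because E, F, G": flip W, then inspect the new complete reason.
beta = {**april, "W": False}
sticks = delta(**beta) == delta(**april)
beta_reasons = sufficient_reasons(beta)

print(april_reasons, sorted(necessary))
print(sticks, beta_reasons)
```

Under these assumptions the necessary characteristics are \(E\) and \(F\), flipping either one flips the decision, and after removing work experience the only remaining sufficient reason is \(E, F, G\).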

We close this section with an important remark. Even though most of the notions we defined are based on prime implicants, our proposed theory does not necessarily require the computation of prime implicants, which can be prohibitive. Reason circuits characterize all relevant prime implicants and can be obtained in linear time from Decision-DNNF circuits. Reason circuits are also monotone, allowing one to answer many queries about the embedded prime implicants in polytime. This is a major contribution of this work.

10 Concluding Remarks

We introduced a theory for reasoning about the decisions of Boolean classifiers, which is based on the fundamental notion of complete reasons. We presented applications of the theory to explaining decisions, evaluating counterfactual statements about decisions and identifying decision and classifier bias. We showed that if classifiers are represented by Decision-DNNFs, which are a superset of OBDDs, then the complete reason for a decision can be computed in linear time and in the form of a tractable Boolean circuit that we called a reason circuit. We then presented linear-time and polytime algorithms for computing most of the introduced notions based on reason circuits. More recently, the notion of a complete reason was formulated using quantified Boolean logic and shown to be also computable efficiently when classifiers are represented by CNFs or SDDs (Darwiche and Marquis 2021). An SDD is a decision diagram that branches on formulas (sentences) instead of variables (SDD stands for Sentential Decision Diagram) (Darwiche 2011). SDDs are also a superset of OBDDs but they are not comparable to Decision-DNNFs in terms of succinctness (Bollig and Buttkus 2019; Beame and Liew 2015; Beame et al. 2013).

There has been significant interest recently in the computation and complexity of explanation queries, particularly sufficient reasons. This included investigations into the computation of shortest sufficient reasons, which are length-minimal instead of subset-minimal. For Naïve Bayes (and linear) classifiers, it was shown that one sufficient reason can be generated in log-linear time, and all sufficient reasons can be generated with polynomial delay (Marques-Silva et al. 2020). For decision trees, the complexity of generating one sufficient reason was shown to be in polynomial time (Izza et al. 2020). Later works showed the same complexity for decision graphs (Huang et al. 2021b) and some classes of tractable circuits (Audemard et al. 2020; Huang et al. 2021a). The generation of sufficient reasons for decision trees was also studied in Audemard et al. (2021b), including the generation of shortest sufficient reasons, which was shown to be hard even for a single reason. The generation of shortest sufficient reasons was also studied in a broader context that includes decision graphs and SDDs (Darwiche and Ji 2022). More general studies of complexity were also conducted in Audemard et al. (2020) and Huang et al. (2021a), where classifiers were categorized based on the tractable circuits that represent them (Huang et al. 2021a) or the kinds of processing they permit in polynomial time (Audemard et al. 2020). The complexity of robustness queries and shortest sufficient reasons was studied in Barceló et al. (2020) for Boolean classifiers which correspond to decision graphs and neural networks with ReLU activation functions. A comprehensive study of complexity was presented recently in Audemard et al. (2021a) for a large set of explanation queries and classes of Boolean classifiers.