Causal Sufficiency and Actual Causation

Pearl opened the door to formally defining actual causation using causal models. His approach rests on two strategies: first, capturing the widespread intuition that X=x causes Y=y iff X=x is a Necessary Element of a Sufficient Set for Y=y, and second, showing that his definition gives intuitive answers on a wide set of problem cases. This inspired dozens of variations of his definition of actual causation, the most prominent of which are due to Halpern & Pearl. Yet all of them ignore Pearl's first strategy, and the second strategy taken by itself is unable to deliver a consensus. This paper offers a way out by going back to the first strategy: it offers six formal definitions of causal sufficiency and two interpretations of necessity. Combining the two gives twelve new definitions of actual causation. Several interesting results about these definitions and their relation to the various Halpern & Pearl definitions are presented. Afterwards the second strategy is evaluated as well. In order to maximize neutrality, the paper relies mostly on the examples and intuitions of Halpern & Pearl. One definition comes out as being superior to all others, and is therefore suggested as a new definition of actual causation.


Introduction
Two decades have passed since Judea Pearl's groundbreaking book on causality was published (Pearl, 2000). It offers a formal account of causal models that led causal modeling to become a central part of Artificial Intelligence. One of the book's most important applications for philosophy is its formal definition of actual causation, i.e., causation of particular events.
Pearl defends his account of actual causation using two strategies. The first strategy starts with the widely shared intuition that X = x causes Y = y iff X = x is a Necessary Element of a Sufficient Set for Y = y (the NESS intuition, from now on). Pearl claims that using causal models allows one to make this intuition formally precise, whereas existing logical notions of necessity and sufficiency lack the resources to do so. The second strategy is to demonstrate that his formal account offers intuitive verdicts for a number of problematic examples.
Ever since, Pearl's account has come under severe criticism. By now there are dozens of papers, both from philosophers and from researchers in AI, attempting to improve upon his account. Most prominently, Pearl himself has offered several revisions of his account in collaboration with Halpern, culminating in the most recent revision by Halpern individually (Pearl, 2009; Halpern and Pearl, 2001, 2005; Halpern, 2015, 2016). Together these accounts of causation are referred to as the Halpern & Pearl definitions, or HP definitions for short, and they are by far the most influential accounts of causation out there. The problem with all of these attempts at revising Pearl's initial account is that they completely ignore the first strategy and focus almost exclusively on the second strategy. Roughly put, the typical setup is to go over some examples for which existing definitions give counterintuitive answers, and then to construct a new definition that does not do so. It is unrealistic to expect that this second strategy in and of itself can deliver a satisfactory account of causation, because there are too many examples and even more intuitions (Glymour et al., 2010; Beckers and Vennekens, 2018).
To solve this problem, this paper starts out with an explicit focus on the first strategy. It is striking that immediately after discussing the NESS intuition, Pearl diverges into complicated technical notions like "sustenance" and "causal beams" and never looks back, be it in his book or in the subsequent work on the HP definitions. Instead I offer what is the most natural route down the first strategy, namely to look at formalizations of causal sufficiency (as opposed to logical sufficiency) and combine them with two interpretations of necessity. Taken together this results in twelve distinct formal definitions of actual causation.
These definitions are compared to each other and to the HP definitions, leading to several interesting results. For one, it turns out that one of these twelve definitions is equivalent to the most recent HP definition (Halpern, 2015, 2016). Therefore this paper is the first to show that one of the HP definitions succeeds in delivering Pearl's promise. At the same time, it also shows that the other HP definitions do not. Next we turn to the second strategy. Given the diversity of intuitions about the many examples presented in the literature, the best we can do is arrive at a comparative verdict: does one of the definitions here developed fare better than the HP definitions? In order to avoid relying on my own intuitions, I present two criteria by which we can answer this question. First, I make use of Halpern and Pearl's own examples and rely almost exclusively on their intuitions, which for the most part align with the consensus in the literature. (Example 7.7 forms a notable exception.) Here the answer is that one of the twelve definitions does better than the HP definitions. Second, I present six examples that are very similar to each other, and assess which definitions are able to handle them in a consistent (and preferably also intuitive) manner. Here the answer is that the previous definition again does better than the HP definitions.
Therefore I suggest adopting this definition of actual causation. Roughly, this definition states that X = x causes Y = y iff there is a set ⃗ W = ⃗ w so that (X = x, ⃗ W = ⃗ w) is sufficient for Y = y along a causal network ⃗ N and there exists some value x ′ so that (X = x ′ , ⃗ W = ⃗ w) is not sufficient for Y = y along any causal subnetwork of ⃗ N . This paper is laid out as follows. The next section introduces structural equations models, the formal causal models that are used to express all the definitions. Then I state the three most recent HP definitions in Section 3. Section 4 presents six notions of causal sufficiency and shows how they relate to each other. We then use these six notions to formalize actual causation along the NESS intuition in Section 5, and discuss several interesting results. After this theoretical groundwork, we start looking for the best definition. Two definitions are discarded by showing that they have certain unacceptable properties in Section 6. Finally, Section 7 compares the remaining definitions to each other and to the HP definitions by considering examples from Halpern & Pearl and a few additional ones.

Structural Equations Modeling
This section reviews the definition of causal models as they were introduced by Pearl (2000). Much of the discussion and notation is taken from Halpern (2016) with little change.

Definition 2.1:
A signature S is a tuple (U, V, R), where U is a set of exogenous variables, V is a set of endogenous variables, and R a function that associates with every variable Y ∈ U ∪ V a nonempty set R(Y ) of possible values for Y (i.e., the set of values over which Y ranges). If ⃗ X = (X 1 , . . . , X n ), R( ⃗ X) denotes the crossproduct R(X 1 ) × ⋯ × R(X n ).
Exogenous variables represent factors whose causal origins are outside the scope of the causal model, such as background conditions and noise. The values of the endogenous variables, on the other hand, are causally determined by other variables within the model (both endogenous and exogenous).
Definition 2.2: A causal model M is a pair (S, F ), where S is a signature and F defines a function that associates with each endogenous variable X a structural equation F X giving the value of X in terms of the values of other endogenous and exogenous variables. Formally, the equation F X maps R(U ∪ V − {X}) to R(X), so F X determines the value of X, given the values of all the other variables in U ∪ V.
Note that there are no functions associated with exogenous variables; their values are determined outside the model. We call a setting ⃗ u ∈ R(U) of values of exogenous variables a context.
The value of X may depend on the values of only a few other variables. X depends on Y in context ⃗ u if there is some setting of the endogenous variables other than X and Y such that if the exogenous variables have value ⃗ u, then varying the value of Y in that context results in a variation in the value of X; that is, there is a setting ⃗ z of the endogenous variables other than X and Y and values y and y ′ of Y such that F X (y, ⃗ z, ⃗ u) ≠ F X (y ′ , ⃗ z, ⃗ u). We then say that Y is a parent of X.
We extend this genealogical terminology in the usual manner, by taking the ancestor relation to be the transitive closure of the parent relation (i.e., Y is an ancestor of X iff Y is a parent of X or there exist variables V 1 , . . . , V n so that Y is a parent of V 1 , V 1 is a parent of V 2 , ..., and V n is a parent of X). The descendant relation is simply the reversal of the ancestor relation (i.e., X is a descendant of Y iff Y is an ancestor of X). A path is a sequence of variables in which each element is a child of the previous element.
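As a rough illustration of the parent and ancestor relations, the sketch below brute-forces the dependence test over finite ranges. The function names (`depends_on`, `parents`, `ancestors`), the encoding of equations as Python functions over a dictionary of values, and the three-variable chain model are all assumptions made for this example; they are not part of the formal framework.

```python
from itertools import product

def depends_on(f_x, x, y, ranges, context):
    """True iff, in `context`, varying y changes f_x for some setting
    of the endogenous variables other than x and y."""
    others = [v for v in ranges if v not in (x, y)]
    for combo in product(*(ranges[v] for v in others)):
        base = {**context, **dict(zip(others, combo))}
        outputs = {f_x({**base, y: value}) for value in ranges[y]}
        if len(outputs) > 1:
            return True
    return False

def parents(equations, ranges, context):
    """Map every endogenous variable to the set of its endogenous parents."""
    return {
        x: {y for y in ranges if y != x and depends_on(f, x, y, ranges, context)}
        for x, f in equations.items()
    }

def ancestors(parent_map, x):
    """Transitive closure of the parent relation, starting from x."""
    seen, stack = set(), list(parent_map[x])
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(parent_map[y])
    return seen

# Hypothetical chain model: X = U, A = X, Y = A (all variables binary).
equations = {
    "X": lambda v: v["U"],
    "A": lambda v: v["X"],
    "Y": lambda v: v["A"],
}
ranges = {"X": [0, 1], "A": [0, 1], "Y": [0, 1]}
parent_map = parents(equations, ranges, {"U": 1})
print(parent_map)
print(ancestors(parent_map, "Y"))
```

In this chain, A is the only parent of Y, while both A and X are ancestors of Y.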
In this paper we restrict attention to strongly recursive (or strongly acyclic) models, that is, models where there is a partial order ⪯ on variables such that if Y depends on X, then X ≺ Y . In a strongly recursive model, given a context ⃗ u, the values of all the remaining variables are determined (we can just solve for the value of the variables in the order given by ⪯). We often write the equation for an endogenous variable as X = f ( ⃗ Y ); this denotes that the value of X depends only on the values of the variables in ⃗ Y , and the connection is given by the function f . For example, we might have X = Y + 5.
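The remark that a context determines all remaining values can be made concrete with a small sketch. It assumes that the equations are supplied in an order compatible with ⪯ (parents before children); the `solve` helper and the two-variable model built around the example X = Y + 5 are hypothetical.

```python
from typing import Callable, Dict

Values = Dict[str, int]
Equation = Callable[[Values], int]

def solve(equations: Dict[str, Equation], context: Values) -> Values:
    """Solve a strongly recursive model by evaluating equations in causal order.

    Assumption: `equations` lists the endogenous variables so that every
    variable appears after all of its parents.
    """
    values = dict(context)          # start from the exogenous setting
    for var, f in equations.items():
        values[var] = f(values)     # every parent already has a value
    return values

# Hypothetical model: Y = U and X = Y + 5, with exogenous variable U.
equations = {
    "Y": lambda v: v["U"],
    "X": lambda v: v["Y"] + 5,
}
solution = solve(equations, {"U": 2})
print(solution)  # {'U': 2, 'Y': 2, 'X': 7}
```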
An intervention has the form ⃗ X ← ⃗ x, where ⃗ X is a set of endogenous variables. Intuitively, this means that the values of the variables in ⃗ X are set to the values ⃗ x. The structural equations define what happens in the presence of interventions. Setting the value of some variables ⃗ X to ⃗ x in a causal model M results in a new causal model, denoted M_{⃗ X ← ⃗ x}, which is identical to M except that the equations for the variables in ⃗ X are replaced by ⃗ X = ⃗ x.
Given a signature S = (U, V, R), an atomic formula is a formula of the form X = x, for X ∈ V and x ∈ R(X). A causal formula (over S) is one of the form [Y 1 ← y 1 , . . . , Y k ← y k ]ϕ, abbreviated as [ ⃗ Y ← ⃗ y]ϕ, where
• ϕ is a Boolean combination of atomic formulas,
• Y 1 , . . . , Y k are distinct variables in V, and
• y i ∈ R(Y i ) for each i.
A causal formula ψ is true or false in a causal setting, which is a causal model given a context. As usual, we write (M, ⃗ u) ⊧ ψ if the causal formula ψ is true in the causal setting (M, ⃗ u). The ⊧ relation is defined inductively. (M, ⃗ u) ⊧ X = x if the variable X has value x in the unique (since we are dealing with recursive models) solution to the equations in M in context ⃗ u (i.e., the unique vector of values that simultaneously satisfies all equations in M with the variables in U set to ⃗ u). The truth of conjunctions and negations is defined in the standard way. Finally, (M, ⃗ u) ⊧ [ ⃗ Y ← ⃗ y]ϕ iff (M_{⃗ Y ← ⃗ y}, ⃗ u) ⊧ ϕ (i.e., the intervention ⃗ Y ← ⃗ y results in a new causal model, denoted M_{⃗ Y ← ⃗ y}, in which we assess the truth of ϕ).
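A minimal sketch of how interventions and causal formulas can be evaluated, assuming again that equations are listed in causal order. The helpers `solve` and `holds`, the representation of ϕ as a Python predicate, and the chain model are illustrative assumptions.

```python
def solve(equations, context, intervention=None):
    """Solve the model in which intervened variables are fixed to the given values."""
    intervention = intervention or {}
    values = dict(context)
    for var, f in equations.items():
        # An intervened variable ignores its equation and keeps the set value.
        values[var] = intervention[var] if var in intervention else f(values)
    return values

def holds(equations, context, intervention, phi):
    """Check whether phi is true in context u after the intervention,
    with phi given as a predicate over the resulting solution."""
    return phi(solve(equations, context, intervention))

# Hypothetical chain model: X = U, A = X, Y = A.
equations = {
    "X": lambda v: v["U"],
    "A": lambda v: v["X"],
    "Y": lambda v: v["A"],
}
context = {"U": 1}

print(holds(equations, context, {}, lambda v: v["Y"] == 1))                  # True
print(holds(equations, context, {"X": 0}, lambda v: v["Y"] == 0))            # True
print(holds(equations, context, {"X": 0, "A": 1}, lambda v: v["Y"] == 1))    # True
```

The last call shows that an intervention on A screens off the intervention on X.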

HP Definitions
Now on to the HP definitions. As Pearl (2000)'s initial definition is a precursor to the HP definitions that gives less intuitive results and is far more complicated, I do not discuss it. (It is safe to say that by now it has been unanimously rejected.) Two of the HP definitions are developed by both Halpern and Pearl, whereas the third one is solely due to Halpern. The relations between them are extensively discussed by Halpern (2016).
The general form of all three definitions is as follows (where ϕ is a Boolean combination of atomic formulas):
Definition 3.1: ⃗ X = ⃗ x is an actual cause of ϕ in (M, ⃗ u) if the following three conditions hold:
AC1. (M, ⃗ u) ⊧ ( ⃗ X = ⃗ x) ∧ ϕ.
AC2. See below.
AC3. ⃗ X is minimal: no strict subset of ⃗ X, with ⃗ x restricted accordingly, satisfies AC1 and AC2.
Questions of actual causation are posed relative to an actual context ⃗ u, because as we know from the previous section a context completely determines which events actually took place. So AC1 represents the trivial requirement that the candidate cause and effect are among the events which took place. AC3 is also fairly straightforward: we should not consider redundant elements to be parts of causes. The real content of the definition lies with AC2.
Throughout the rest of the paper, settings of variables ⃗ V with superscript * (as in ⃗ v * ) refer to the actual values, i.e., (M, ⃗ u) ⊧ ⃗ V = ⃗ v * . Settings of variables without any superscript can refer to any setting.
In line with the NESS intuition, we should expect AC2 to consist of formal variants of two conditions: one expressing necessity and one expressing sufficiency. At first glance, the first two HP definitions seem to meet this expectation: they consist of conditions AC2(a) and AC2(b), and Halpern refers to these as a "necessity condition" and a "sufficiency condition" (2015, p. 3). Upon closer examination, however, it is hard to see how either version of AC2(b) can sensibly be interpreted as capturing causal sufficiency.
We start with Original HP (Halpern and Pearl, 2001):
AC2(a). There is a partition of V into two sets ⃗ Z and ⃗ W with ⃗ X ⊆ ⃗ Z and a setting ⃗ x ′ and ⃗ w of the variables in ⃗ X and ⃗ W , respectively, such that (M, ⃗ u) ⊧ [ ⃗ X ← ⃗ x ′ , ⃗ W ← ⃗ w]¬ϕ.
AC2(b). For all subsets ⃗ Y of ⃗ Z − ⃗ X we have (M, ⃗ u) ⊧ [ ⃗ X ← ⃗ x, ⃗ W ← ⃗ w, ⃗ Y ← ⃗ y * ]ϕ, where ⃗ y * are the actual values of ⃗ Y .
Taking ⃗ Y to be empty in AC2(b), AC2 states that the effect counterfactually depends on the cause when holding fixed the witness ⃗ W = ⃗ w. Therefore AC2(a) can easily be interpreted as expressing a (contrastive) necessity condition: there exist contrast values ⃗ x ′ such that if those values were to obtain, then AC2(b) no longer holds.
The problem lies with interpreting AC2(b) as expressing causal sufficiency. The main obstacle lies in the absence of the requirement that ⃗ w = ⃗ w * , i.e., it is not required that the supposedly sufficient set of events ( ⃗ X = ⃗ x, ⃗ W = ⃗ w) actually took place. Therefore we cannot simply view ( ⃗ X = ⃗ x, ⃗ W = ⃗ w) itself as the causally sufficient set we are looking for. Although it cannot be excluded that the conditions imposed by invoking ⃗ Z (and ⃗ Y ) somehow ensure the existence of some other set that can be interpreted as a causally sufficient set, it is far from obvious that this is the case. This is confirmed by the fact that Halpern & Pearl do not even offer an attempt at giving an interpretation of AC2(b) as expressing causal sufficiency.
Matters get worse when we turn our attention to Updated HP (Halpern and Pearl, 2005):

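AC2(b). For all subsets ⃗ Y of ⃗ Z − ⃗ X and all subsets ⃗ T of ⃗ W we have (M, ⃗ u) ⊧ [ ⃗ X ← ⃗ x, ⃗ T ← ⃗ w, ⃗ Y ← ⃗ y * ]ϕ, where ⃗ y * are the actual values of ⃗ Y and the variables in ⃗ T take the values they have in ⃗ w.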
We see that AC2(b) has become even more complicated, and yet no argument is given as to how this condition formalizes causal sufficiency, despite Halpern explicitly claiming that this is what it aims to do. 5 Instead, the updated version is justified on the basis of examples for which the previous version gave counterintuitive answers.
As a sidenote, Halpern and Pearl (2005) also define strong causation by demanding that a further condition, AC2(c), holds in addition to the other two. This definition has received almost no attention in the literature, because according to Halpern & Pearl it is too strong. As we shall see, this is unfortunate, because AC2(c) does adequately capture a variant of causal sufficiency.
Finally we have Modified HP, which is far simpler than the previous two (Halpern, 2015).

Definition 3.4: [Modified HP]
AC2. There is a set ⃗ W of variables in V − ⃗ X, and a setting ⃗ x ′ of the variables in ⃗ X, such that (M, ⃗ u) ⊧ [ ⃗ X ← ⃗ x ′ , ⃗ W ← ⃗ w * ]¬ϕ.
The crucial difference here is that Modified HP does require the witness to consist solely of events which actually took place, i.e., ⃗ w = ⃗ w * . It is straightforward to show that simply adding this requirement ensures that both versions of AC2(b) are satisfied automatically, and therefore an explicit sufficiency condition is not required. Halpern considers this definition to be an improvement over the other two, and I agree with him. However, Halpern arrives at this conclusion based on the many examples in which it better agrees with intuition. As will become clear, another (and arguably more compelling) justification is to be found in the fact that it is the only definition of the three which has a natural interpretation as formalizing the NESS intuition with which we started. To get there, we need to step away from the HP definitions and start afresh.
5 Concretely, when discussing sufficient causality we find the following (Halpern, 2016, p. 53): The key intuition behind the definition of sufficient causality is that not only does ⃗ X = ⃗ x suffice to bring about ϕ in the actual context (which is the intuition that AC2 (
1: Halpern (2016) suggests treating "part of a cause" (i.e., any X = x that appears in ⃗ X = ⃗ x) as synonymous with "cause" when talking about Modified HP. I will follow this suggestion throughout whenever discussing the judgment of Modified HP in particular examples, unless stated otherwise. In stating theorems, however, the two are kept apart.
2: The HP definitions allow the effect to be any propositional formula ϕ, whereas the other definitions of causation will require effects to be of the form Y = y. A thorough discussion of complex effects is beyond the scope of this paper. I here limit myself to two observations.
• Although the definitions of causation here developed can be generalized to allow for conjunctive effects (i.e., effects of the form ⃗ Y = ⃗ y), it is not at all clear that we should want to do so. The reason is that we can easily include variables into the effect that have nothing whatsoever to do with the causes. Say we have a variable Y with equation Y = U , where U is an exogenous variable, and we are considering a context where U = 1. Then for any cause-effect pair ⃗ X = ⃗ x and ϕ, we automatically get that ⃗ X = ⃗ x also causes ϕ ∧ Y = 1, which is not a sensible result. Therefore we choose to simply exclude conjunctive effects.
• In the few examples in the literature where the HP definitions actually consider an effect ϕ that is not of the form Y = y, ϕ takes on the form Y = y 1 ∨ Y = y 2 ∨ . . . ∨ Y = y n for some n. The definitions here developed can easily be generalized to also allow for such effects. For reasons of simplicity I choose not to do so in general and limit the discussion of this generalization to one example for which it is required.
3: The definitions of sufficiency below (and the definitions of actual causation that follow in their wake) could be extended to also allow for exogenous variables as members of a sufficient set, so that exogenous and endogenous variables are treated alike. Since our goal is to make comparisons with the HP definitions, those would also have to be extended. Concretely, the HP definitions restrict causes to being endogenous variables, and they do not allow exogenous variables to be parts of a "witness" (the set ⃗ W above). For example, if we have Y = X ∨ U where U ∈ U and we consider a context where U = 1 and X = 1, the HP definitions are unable to identify X = 1 as a cause because they disallow considering what happens when U = 0. The simplest way to sidestep this issue is to restrict ourselves to models where exogenous variables only appear in equations of the form V = U . In that manner, all influence of the exogenous variables can be overridden by interventions, reducing their role to simply providing us with the actual values of all variables. For any model which does not conform to this restriction, we can easily construct a very similar model that does: simply replace any exogenous variable U which appears in some equation that is not of this form with a new endogenous variable V U , and add the equation V U = U . For the previous example this results in the model with equations Y = X ∨ V U , V U = U . (Note that now the HP definitions do consider X = 1 to be a cause of Y = 1.)
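The effect of the transformation in the third remark can be checked by brute force. The sketch below compares the model Y = X ∨ U with the transformed model Y = X ∨ V U , V U = U, and then tests AC2 of Modified HP on the latter. It assumes that AC2 asks for a contrast setting of the candidate cause and a witness frozen at its actual values under which the effect fails; the helper names, the extra equation X = U_X, and the encoding of ∨ as `max` are assumptions made for this example.

```python
from itertools import chain, combinations, product

def solve(equations, context, intervention=None):
    """Evaluate equations in causal order; intervened variables keep their set value."""
    intervention = intervention or {}
    values = dict(context)
    for var, f in equations.items():
        values[var] = intervention[var] if var in intervention else f(values)
    return values

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def ac2_modified(equations, ranges, context, cause_vars, effect):
    """Is there a contrast setting of cause_vars and a witness, frozen at its
    actual values, under which the effect fails?"""
    actual = solve(equations, context)
    rest = [v for v in equations if v not in cause_vars]
    for witness in subsets(rest):
        frozen = {w: actual[w] for w in witness}
        for contrast in product(*(ranges[x] for x in cause_vars)):
            setting = {**dict(zip(cause_vars, contrast)), **frozen}
            if not effect(solve(equations, context, setting)):
                return True
    return False

def effect(v):
    return v["Y"] == 1

ranges = {"X": [0, 1], "V_U": [0, 1], "Y": [0, 1]}
context = {"U_X": 1, "U": 1}

# Original model: U occurs directly in the equation for Y.
direct = {"X": lambda v: v["U_X"], "Y": lambda v: max(v["X"], v["U"])}
# Transformed model: U only enters through the new endogenous variable V_U.
transformed = {
    "X": lambda v: v["U_X"],
    "V_U": lambda v: v["U"],
    "Y": lambda v: max(v["X"], v["V_U"]),
}

stuck = solve(direct, context, {"X": 0})["Y"]                 # U = 1 cannot be overridden
freed = solve(transformed, context, {"X": 0, "V_U": 0})["Y"]  # V_U = 1 can
alone = ac2_modified(transformed, ranges, context, ["X"], effect)
joint = ac2_modified(transformed, ranges, context, ["X", "V_U"], effect)
print(stuck, freed, alone, joint)  # 1 0 False True
```

Under these assumptions X = 1 does not satisfy AC2 on its own, but it is part of the conjunction (X = 1, V U = 1) that does, which is where the convention from the first remark about parts of causes becomes relevant.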

Six Variants of Sufficiency
Throughout the rest of the paper, we take ⃗ X and ⃗ Y to be non-identical subsets of the endogenous variables V that appear in a causal model M . 7 Informally, to say that some setting ⃗ X = ⃗ x is sufficient for another setting ⃗ Y = ⃗ y, is to say that the latter follows from the former. 8 To formalize this requires making explicit what it means for one setting to "follow" from another. In the context of causal sufficiency, an obvious minimal demand is that this meaning captures the causal directionality. In the framework of causal models this comes down to treating ⃗ X = ⃗ x as an intervention and ⃗ Y = ⃗ y as a consequence of that intervention: if we set ⃗ X to the values ⃗ x, then ⃗ Y takes on the values ⃗ y. At least this much is clear.
Yet by saying this, we have said nothing at all about the other endogenous variables and their values, nor about the contexts in which we are evaluating the intervention. The difficulty lies in deciding what conditions we choose to impose on the other variables, both endogenous and exogenous. I consider six possible ways in which this decision can be made that are fairly natural, but this is by no means an exhaustive list.
We start with the strongest conditions possible: in all contexts, if we set ⃗ X to the values ⃗ x, then ⃗ Y takes on the values ⃗ y, independent of the values of all other variables. 9
Definition 4.1: We say that ⃗ X = ⃗ x is directly sufficient for ⃗ Y = ⃗ y if for all contexts ⃗ u ∈ R(U) and all settings ⃗ c of the variables ⃗ C = V − ( ⃗ X ∪ ⃗ Y ) we have that (M, ⃗ u) ⊧ [ ⃗ X ← ⃗ x, ⃗ C ← ⃗ c] ⃗ Y = ⃗ y.
The strength of this definition is also its weakness: by putting such strong demands on the sufficient set, many interesting sets are excluded. This restrictiveness becomes apparent later on when we add a necessity condition (Proposition 6.1): only parents can ever be part of a minimal directly sufficient set. A trivial example illustrates this point. Say the equation for Y is Y = A, the equation for A is A = X, and we are looking at a context in which X = 1. Then X = 1 is not directly sufficient for Y = 1, because intervening on A overrides any influence of X on Y . Still, there is clearly a sense in which X = 1 is causally sufficient for Y = 1. In particular, X = 1 is directly sufficient for (A = 1, Y = 1).
7 We take them to be non-identical to exclude calling a setting ⃗ X = ⃗ x causally sufficient for itself, and a fortiori to exclude calling it a cause of itself.
8 Note that in this paper we are interested in the causal sufficiency of settings of variables for other settings of variables. This is quite distinct from how the term "causal sufficiency" is sometimes used in the causal modelling literature, namely as a property of a set of variables in a causal graph.
9 Weslake (2015)
Generalizing this intuition provides us with the second form of sufficiency: there is some setting ⃗ N = ⃗ n that includes ⃗ Y = ⃗ y, so that in all contexts, if we set ⃗ X to the values ⃗ x, then ⃗ N takes on the values ⃗ n, independent of the values of all other variables. This can be formulated more succinctly as: ⃗ X = ⃗ x is strongly sufficient for ⃗ Y = ⃗ y if ⃗ X = ⃗ x is directly sufficient for some setting ⃗ N = ⃗ n that includes ⃗ Y = ⃗ y.
Observe that another intuitive way of viewing X = 1 as being causally sufficient for Y = 1 in the simple example we just discussed, is to note that X = 1 is directly sufficient for A = 1 and A = 1 is directly sufficient for Y = 1. This intuition can also be generalized to define a form of sufficiency. Concretely, we can define strong sufficiency along a network as the transitive closure of direct sufficiency.
The following result shows that both forms of strong sufficiency are merely different ways of expressing the same notion of sufficiency (and hence the term is appropriately chosen). Bearing in mind the earlier observation (to appear later as Proposition 6.1) that direct sufficiency combined with necessity is a relation between parents and children, we can safely think of a network as consisting of variables that lie on some path between ⃗ X and ⃗ Y . Doing so will make it easier to apply the definitions of causation to examples.
(Proofs of all Theorems are to be found in the Appendix.)
Another obvious way to weaken the conditions on the values of the endogenous variables compared to direct sufficiency is to only consider the setting in which we leave the other variables alone, giving weak sufficiency: in all contexts, if we set ⃗ X to the values ⃗ x and do not intervene on any other variable, then ⃗ Y takes on the values ⃗ y.
The following straightforward result shows the relative strengths of the above three notions of sufficiency.
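The three notions can be compared by brute force on the chain example Y = A, A = X. The sketch below follows the informal descriptions given above: direct sufficiency quantifies over all contexts and all settings of the remaining endogenous variables, strong sufficiency asks for direct sufficiency for some larger setting that includes the target, and weak sufficiency leaves the other variables alone. All function names and the added equation X = U are assumptions made for illustration.

```python
from itertools import combinations, product

def solve(equations, context, intervention=None):
    """Evaluate equations in causal order; intervened variables keep their set value."""
    intervention = intervention or {}
    values = dict(context)
    for var, f in equations.items():
        values[var] = intervention[var] if var in intervention else f(values)
    return values

def settings(variables, ranges):
    """Yield every assignment of values to `variables`."""
    variables = list(variables)
    for combo in product(*(ranges[v] for v in variables)):
        yield dict(zip(variables, combo))

def directly_sufficient(eqs, ranges, exo_ranges, x, y):
    """x forces y in all contexts, whatever the remaining endogenous variables are set to."""
    others = [v for v in eqs if v not in x and v not in y]
    return all(
        solve(eqs, u, {**x, **c})[var] == val
        for u in settings(exo_ranges, exo_ranges)
        for c in settings(others, ranges)
        for var, val in y.items()
    )

def strongly_sufficient(eqs, ranges, exo_ranges, x, y):
    """x is directly sufficient for some setting N = n that includes y."""
    extra = [v for v in eqs if v not in x and v not in y]
    return any(
        directly_sufficient(eqs, ranges, exo_ranges, x, {**y, **n})
        for r in range(len(extra) + 1)
        for subset in combinations(extra, r)
        for n in settings(subset, ranges)
    )

def weakly_sufficient(eqs, exo_ranges, x, y):
    """x forces y in all contexts when no other variable is intervened on."""
    return all(
        solve(eqs, u, x)[var] == val
        for u in settings(exo_ranges, exo_ranges)
        for var, val in y.items()
    )

# Chain example from the text, with the assumed extra equation X = U.
eqs = {"X": lambda v: v["U"], "A": lambda v: v["X"], "Y": lambda v: v["A"]}
ranges = {"X": [0, 1], "A": [0, 1], "Y": [0, 1]}
exo_ranges = {"U": [0, 1]}

x, y = {"X": 1}, {"Y": 1}
print(directly_sufficient(eqs, ranges, exo_ranges, x, y))   # False
print(strongly_sufficient(eqs, ranges, exo_ranges, x, y))   # True
print(weakly_sufficient(eqs, exo_ranges, x, y))             # True
```

The results match the discussion of the chain example: X = 1 fails direct sufficiency for Y = 1 because A can be set to 0, but it is directly sufficient for (A = 1, Y = 1) and hence strongly sufficient for Y = 1.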
So far we have considered three definitions that differ only with regard to the conditions they impose on the values of the endogenous variables: they all agreed on requiring their respective conditions to hold in all contexts. Yet questions of actual causation are posed relative to an actual context ⃗ u, and thus it is only natural that we should consider doing the same for questions of causal sufficiency. This adds three more definitions of sufficiency, which are simply the result of replacing the universal quantifier over contexts with a particular context that is assumed to be given.
Definition 4.9: We say that ⃗
Obviously, the counterpart of Proposition 4.6 holds as well for these notions of actual sufficiency.

General Form of Causal Sufficiency
We can formalize and generalize the intuitions behind the definitions in the preceding section by showing that all six definitions of sufficiency can be interpreted as simply putting different constraints on the parameters that occur in the following general definition of sufficiency. (We only explicitly discuss the three definitions of "non-actual" sufficiency, but the same analysis trivially applies to the three definitions of actual sufficiency.) This definition is more complicated than Definitions 4.1, 4.2, and 4.5. Its use lies in the fact that it allows us to see exactly how the three definitions relate to each other, and how one can construct other definitions of sufficiency, by invoking the following trivial result.
Proposition 4.11: Definitions 4.1, 4.2, and 4.5 are equivalent to Definition 4.10 when making respectively the following choices for ⃗ N and ⃗ C:
Proposition 4.11 could inspire even more variants of sufficiency. In fact, we have already come across the most obvious one: AC2(c). It is easy to see that it consists of choosing ⃗ N to be minimal given ⃗ C, i.e., ⃗ N = ⃗ Y , meaning it sits in between Weak and Strong Sufficiency. The condition also appears as a sufficiency condition in Pearl's notion of sustenance, which is the first step he takes towards formalizing the NESS intuition (2009, p. 317). Unfortunately it is also the last step, because the subsequent notions he introduces are far more complicated and bear no resemblance to NESS. The added complexity is introduced precisely because taken by itself sustenance fails to provide a sensible definition of causation, which is why I leave the exploration of this and other possible variants of sufficiency for another occasion.

Defining Causation using Sufficiency
We are finally ready to take up the main challenge: defining actual causation as the formal expression of the NESS intuition. In order to do so, several questions need to be answered:
• Should we use actual sufficiency or not?
• Which of the three definitions of (actual) causal sufficiency should we use?
• Does necessity mean that there exist contrast values of ⃗ X so that the set would not be sufficient if those values obtained, or does it mean that the set is no longer sufficient when we remove the subset ⃗ X?
I have introduced six definitions of causal sufficiency in the previous section. For each definition, we can define causation using either of the two interpretations of necessity, giving twelve definitions of actual causation altogether. However, I will show that several of these are equivalent to each other, and one will be impossible to satisfy, leaving us with six definitions in the end. One of those will be Modified HP.

A Family of Definitions
As with the HP definitions, Definition 3.1 gives the general form of all definitions, except that ϕ is restricted to Y = y. (This restriction is assumed whenever comparisons are made with the HP definitions.) As before, the only difference lies with the content of AC2. Using the first interpretation of necessity, which we shall call contrastive necessity, the general form of AC2 is as follows:
By replacing sufficiency in the General Definition of Causation with any of the six definitions of sufficiency from Section 4, we obtain six specific definitions of actual causation. 13 AC2(b) simply expresses causal sufficiency, whatever form it may take. AC2(a c ) offers a somewhat nuanced expression of necessity because it also focusses on subsets of ⃗ N . (Note that this nuance matters only for Strong Sufficiency, since for Weak and Direct Sufficiency ⃗ N = {Y } anyway.) The reason is that our interest lies with the sufficiency for Y = y, and the network ⃗ N is merely a means to that end. If ⃗ X = ⃗ x ′ accomplishes the same end using fewer means, then ⃗ X = ⃗ x was not necessary for achieving it.
Under the second interpretation of necessity, which we shall call minimal necessity, AC2(a c ) is replaced with:
Both interpretations of necessity are prima facie plausible. The contrastive interpretation is explicitly counterfactual in nature, whereas the minimal interpretation is more neutral. Our analysis will settle which one of them is to be preferred.
Filling in each of the six definitions of causal sufficiency into both versions of the General Definition of Causation gives twelve specific definitions of actual causation. I refer to each of these as Def x for x ∈ {1, . . . , 12} along the following convention:
• Def 1 Contrastive actual weak sufficiency
• Def 2 Contrastive actual strong sufficiency
• Def 3 Contrastive actual direct sufficiency
• Def 4 Contrastive weak sufficiency
• Def 5 Contrastive strong sufficiency
• Def 6 Contrastive direct sufficiency
• Def 7 Minimal actual weak sufficiency
• Def 8 Minimal actual strong sufficiency
• Def 9 Minimal actual direct sufficiency
• Def 10 Minimal weak sufficiency
• Def 11 Minimal strong sufficiency
• Def 12 Minimal direct sufficiency
Each Def x is obtained by taking the corresponding definition of sufficiency (Definition 4.1, 4.2, 4.5, 4.7, 4.8, or 4.9), filling that into the General Definition of Causation where AC2(a) takes on AC2(a c ) or AC2(a m ) depending on whether x < 7 or not, and finally, filling those conditions AC2 into Definition 3.1. I illustrate the result of this construction for Def 2.
13 Definition 5.1 can be made even more general by also incorporating ⃗ C from Definition 4.10. Since we are only considering notions of sufficiency for which ⃗ C is determined entirely by the other sets, there is no need to do so for our purposes. But it is important to keep this additional generality in mind if one wants to use alternative definitions of sufficiency.
Admittedly, Def 2 looks even more complicated than Updated HP. Further on I provide some results that allow us in many cases to use simpler definitions as stand-ins for Def 2. More importantly, although the notation of Definition 5.2 is complicated, its meaning can be spelled out intuitively by stating that ⃗ X = ⃗ x is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for Y = y (or MCNS 4 ).

Analysis
Let us now turn to investigating the relations between these definitions. (Knowing these relations before getting into the discussion of examples makes life a lot easier.) A first remark is that Def 7 is impossible to satisfy. A second remark is that Def 3 is equivalent to a condition that appears in Pearl's first definition of actual causation (1998). Ignoring Def 7, we are still left with eleven candidate definitions of actual causation (fourteen candidates if we count the three HP definitions), whereas we would like to settle on just one. The rest of the paper is concerned with selecting the best definition out of the lot. As a first step, we can reduce the number of definitions by six.
Theorem 5.3: The following are all equivalences among the twelve definitions and the three HP definitions:
• Def 3 iff Def 6 iff Def 9 iff Def 12
Theorem 5.3 offers our first interesting result: it shows that Modified HP succeeds in formalizing the NESS intuition, whereas the other two HP definitions do not. From now on I will ignore the definitions appearing on the right-hand side in Theorem 5.3. The following is a helpful result for applying some of the definitions going forward.
If ⃗ X = ⃗ x causes Y = y in (M, ⃗ u) according to a definition that uses minimal necessity, then ⃗ X is a singleton. (As is well known, the same result holds for Original HP (Halpern, 2016).)
The following result offers important insights into the relations between the remaining definitions.

Excluding Def 3 and Def 10
Two definitions can be excluded quickly. The following result shows why Def 3 is not a sensible candidate as a general definition of causation, since causation is obviously not restricted to parent-children pairs.
Proposition 6.1: If ⃗ X = ⃗ x causes Y = y in (M, ⃗ u) according to Def 3, then ⃗ X is a singleton, and X is a parent of Y .
Although we can dismiss Def 3 as a general definition of causation, it is still a useful stand-in for the arguably more complicated Def 2 and Def 8 in case X is a parent of Y and X is not an ancestor of Y along any path that is longer than a single edge (which in fact covers a surprisingly large number of cases discussed in the literature). In such cases we say that X is only a parent of Y .
Proposition 6.2: If X is only a parent of Y , then Def 2, Def 3, and Def 8 are all equivalent for causes X = x.
A cornerstone of the counterfactual approach to causation is that counterfactual dependence is sufficient for causation. More formally, there is widespread consensus that causation should satisfy the following principle: 17 Accepting this principle means that Def 10 is excluded as well.
Proposition 6.3: Out of all definitions we have considered, Def 10 and Def 3 are the only ones which do not satisfy Dependence.
That leaves us with Def 2, Def 4, and Def 8 as possible alternatives to the HP definitions.

Def 2, Def 4, and Def 8, vs the HP definitions
We have shown that all twelve definitions we developed (including Modified HP) are instantiations of the General Definition of Causation (Def. 5.1), and thereby they improve upon Original HP and Updated HP as far as the first strategy goes. We now show that Def 2 also improves upon all three HP definitions as far as the second strategy goes, whereas Def 4 and Def 8 do not. In order to remain as neutral as possible, we go over Halpern & Pearl's own examples, compare the verdicts of our definitions to theirs, and stick as close as possible to their intuitions.

Comparison to Updated HP
The Updated HP definition is by far the most well-known. It was developed as an improvement of Original HP, which sometimes gives unreasonable answers. Halpern and Pearl (2005) offer many examples to illustrate how it works and how it successfully deals with paradigm cases of causation.
Their first example is one of those few cases (recall the beginning of Section 4) in which the effect is of the form Y = y_1 ∨ Y = y_2, and therefore allows us to illustrate how we can generalize the General Definition of Causation to such effects. It is also an example for which Def 8 gives the wrong answer, but the subsequent example is far simpler and more convincing in this respect.
Example 7.1: "Suppose that there was a heavy rain in April and electrical storms in the following two months; and in June the lightning took hold. If it hadn't been for the heavy rain in April, the forest would have caught fire in May." (Halpern and Pearl, 2005, p. 15) I agree with Halpern and Pearl's judgment that it would be very counterintuitive to say that the April rain caused the forest fire, since all it did was delay the fire. As they indicate, it is nevertheless perfectly sensible to say that the April rain caused the forest fire in June, as opposed to May. In order to capture this distinction, we need to invoke a disjunctive effect.
Let F represent there being a fire or not, with three possible values: 0 (no fire), 1 (fire in May), or 2 (fire in June). ES is a four-valued variable that captures whether there are electric storms: (0, 0) (no electric storms in either May or June), (1, 0) (electric storms in May but not in June), (0, 1) (storms in June but not May), and (1, 1) (storms in both May and June). Lastly, AS is a binary variable expressing whether or not there was April rain.
The equation for F is then given by: 0)), and F = 0 otherwise. Given that F = 2 counterfactually depends on AS = 1, all definitions we are considering agree that AS = 1 causes F = 2. The question is whether AS = 1 also caused there to be a fire, i.e., whether it caused F = 1 ∨ F = 2.
We can easily generalize sufficiency to such disjunctions: x is sufficient for Y = y ′ . When integrated into our General Definition of Causation, this results in splitting up AC2(a) so that there is one instance for each disjunct. AC2(b) need not be split up, since it can only ever be satisfied for the actual value of Y . 18 Let us apply this idea to our example. To satisfy AC2(b), we have to add ES to the witness: (AS = 1, ES = (1, 1)) is directly sufficient for F = 2 and AS = 1 is not. (We can focus on direct sufficiency because AS is only a parent of F . We cannot invoke Proposition 6.2 though, since that requires an effect Y = y.) We then see that one of the two conditions that now make up AC2(a) is not satisfied for Def 2 and Def 4, because (AS = 0, ES = (1, 1)) is directly sufficient for F = 1. Therefore Def 2 and Def 4 agree with the HP definitions that the April rain did not cause the forest fire. But Def 8 does not reach this verdict, because ES = (1, 1) is directly sufficient neither for F = 1 nor for F = 2. This means AC2(a) is fulfilled for Def 8, which leads to a mistaken conclusion.
Although one counterexample need not disqualify a definition, the following example is indicative of a deeper problem with Def 8: whenever X = x strongly suffices for Y = y, it is automatically a cause according to Def 8, since ∅ is never strongly sufficient for Y = y. The following example is but one of many paradigm cases in the literature for which this property leads to a counterintuitive verdict. 19 Therefore Def 8 is also excluded as a definition of causation.
Example 7.2: "The engineer is standing by a switch in the railroad tracks. A train approaches in the distance. She flips the switch, so that the train travels down the right-hand track, instead of the left. Since the tracks reconverge up ahead, the train arrives at its destination all the same...
Again, our causal model gets this right. Suppose we have three random variables:
• F for "flip", with values 0 (the engineer doesn't flip the switch) and 1 (she does);
• T for "track", with values 0 (the train goes on the left-hand track) and 1 (it goes on the right-hand track); and
• A for "arrival", with values 0 (the train does not arrive at the point of reconvergence) and 1 (it does)." (Halpern and Pearl, 2005, p. 26)
18 Note that this means generalizing to disjunctions across different variables (i.e., something like Y = y ∨ Z = z) is more complicated.
19 McDermott (1995) offers an almost identical example involving a dog biting a terrorist. Another famous case is that involving a boulder rolling towards a hiker (Hitchcock, 2001). All of these examples are counterexamples to the transitivity of causation. The failure of transitivity has become broadly accepted by now (Beckers and Vennekens, 2017). Despite what Def 8's behavior in these examples might suggest, it is also not transitive. A simple counterexample consists of equations Z = Y ∨ W , and Y = X ∧ W . If X = W = 1, Def 8 considers X = 1 a cause of Y = 1, Y = 1 a cause of Z = 1, yet it does not consider X = 1 a cause of Z = 1.
First observe that as described, this causal model makes little sense: the equation for A is given by A = T ∨ ¬T , which can be rewritten as A = 1. This can be fixed by extending the range of T with a value 2, representing the train not going down any track (because it breaks down, for example). Then the equations become A = (T ≠ 2) and T = F . The context is such that F = 1. F = 1 is both weakly sufficient for A = 1 and strongly sufficient for A = 1 along {T }, but so is F = 0. Therefore Def 2 and Def 4 agree with Updated HP (and with intuition) that flipping the switch is not a cause of the train's arrival. Def 8 fails to reach this verdict, because ∅ is not strongly sufficient for A = 1.
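The repaired switch model is small enough to check by direct evaluation. The Python sketch below assumes that F = f is weakly sufficient for A = 1 when A = 1 holds under the intervention [F ← f] in every context; the function name `arrival` is illustrative and not part of the paper.

```python
def arrival(flip):
    """Repaired switch model: T = F and A = (T != 2)."""
    track = flip              # T = F, so an intervention on F never yields T = 2
    return int(track != 2)    # A = 1 unless the train takes no track at all

# An intervention on F overrides the context, so one evaluation per value
# of F covers every context.
f1_sufficient = arrival(1) == 1
f0_sufficient = arrival(0) == 1
depends_on_flip = arrival(1) != arrival(0)
```

Both values of F guarantee arrival and the arrival does not depend on the flip, which is the symmetry that Def 2 and Def 4 exploit to reject F = 1 as a cause.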
Def 4 suffers from an even bigger defect than Def 8: it fails to distinguish preempted causes from preempting causes. Since preemption cases are the bread and butter of the literature on actual causation, this means that Def 4 is immediately disqualified. The following is a famous example of late preemption discussed by Halpern and Pearl (2005) (and originally by Hall (2004)).
Example 7.3: Suzy and Billy both throw a rock at a bottle. Suzy's rock gets there first, shattering the bottle. However, Billy's throw was also accurate, and would have shattered the bottle had it not been preempted by Suzy's throw.
Halpern and Pearl (2005) use the following variables for this example, which capture the fact that Billy's throw was preempted by Suzy's rock hitting the bottle: BS for the bottle shattering, BH, SH for Billy's (resp. Suzy's) rock hitting the bottle, and two more variables (BT , ST ) for either of them throwing their rock. The equations are then as follows: BS = BH ∨ SH, SH = ST , BH = BT ∧ ¬SH. None of the definitions has any problem arriving at the obvious result that Suzy's throw (ST = 1) causes the bottle to shatter (BS = 1). However, Def 4 is the only definition under consideration that mistakenly also judges Billy's throw to be a cause of the bottle's shattering: in all contexts BT = 1 is weakly sufficient for BS = 1, whereas BT = 0 is not weakly sufficient for BS = 1 in the context where ST = 0.
This leaves us with Def 2 as the last potential alternative to the HP definitions. Going through the many remaining examples, there is only one in which Def 2 disagrees with Updated HP. I leave it to the reader to verify this claim, and restrict the discussion to that single example.
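Before turning to that example, the weak-sufficiency claims about Example 7.3 can be checked by enumeration. The Python sketch below encodes the three equations given above and assumes that a context simply fixes whether Suzy and Billy throw; the helper names `bottle` and `weakly_sufficient` are hypothetical, and weak sufficiency is read as "the effect holds under the intervention in every context".

```python
from itertools import product

def bottle(st, bt, do=None):
    """Evaluate SH = ST, BH = BT and not SH, BS = BH or SH, with optional interventions."""
    do = do or {}
    v = {"ST": do.get("ST", st), "BT": do.get("BT", bt)}
    v["SH"] = do.get("SH", v["ST"])
    v["BH"] = do.get("BH", int(v["BT"] and not v["SH"]))
    v["BS"] = do.get("BS", int(v["BH"] or v["SH"]))
    return v

def weakly_sufficient(do):
    """True if BS = 1 holds under the intervention `do` in every context (st, bt)."""
    return all(bottle(st, bt, do)["BS"] == 1 for st, bt in product([0, 1], repeat=2))

billy_throw_sufficient = weakly_sufficient({"BT": 1})
billy_no_throw_sufficient = weakly_sufficient({"BT": 0})

# Actual context: both throw. Removing Suzy's throw alone does not save the
# bottle, but it does once Billy's (actual) miss BH = 0 is held fixed.
bs_without_suzy = bottle(1, 1, {"ST": 0})["BS"]
bs_without_suzy_bh_fixed = bottle(1, 1, {"ST": 0, "BH": 0})["BS"]
```

The asymmetry between BT = 1 and BT = 0 in the first two checks is exactly what leads a definition based on weak sufficiency alone to count Billy's throw as a cause.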
Example 7.4: A major (M) and a sergeant (S) stand before a corporal, and both shout 'Charge!' (M = 1, S = 1). The corporal charges (C = 1). Orders from higher-ranking soldiers trump those of lower rank, so if the major had shouted 'Halt' (M = 0) the corporal would not have charged. If the major remains quiet (M = −1), the corporal listens to the sergeant. 20 The equation for C is thus: C = M if M ≠ −1 and C = S otherwise. The majority intuition is that the sergeant did not cause the corporal to charge, because his order was trumped by that of the major. 21 Def 2 agrees, as it does not consider S = 1 a cause of C = 1. The reason is that M = 1 is directly sufficient by itself, and yet S = 1 needs M = 1 as a witness to form a sufficient set. S = 1 is a cause of C = 1 according to both Original HP and Updated HP. Halpern & Pearl do not consider this to be problematic, but they do go through the trouble of showing how Original HP and Updated HP change their verdict if one adds extra variables to the model. Moreover, Modified HP also agrees with Def 2 here. Given Halpern's later preference for Modified HP, it is fair to say that Def 2 does at least as well as Updated HP on this example.
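The sufficiency claims in Example 7.4 can be illustrated with a few lines of Python. This is a sketch under two assumptions that are not stated in the example itself: the sergeant's range is taken to be {0, 1}, and direct sufficiency is read as "C = 1 holds for every value of the other parent of C". The names `charge`, `M_RANGE` and `S_RANGE` are illustrative.

```python
M_RANGE = (-1, 0, 1)   # major: silent, 'Halt', 'Charge!'
S_RANGE = (0, 1)       # assumed range for the sergeant: 'Halt', 'Charge!'

def charge(m, s):
    """C = M if M != -1, and C = S otherwise."""
    return m if m != -1 else s

# M = 1 guarantees the charge whatever the sergeant shouts.
major_directly_sufficient = all(charge(1, s) == 1 for s in S_RANGE)

# S = 1 does not: the major can still overrule it with 'Halt'.
sergeant_directly_sufficient = all(charge(m, 1) == 1 for m in M_RANGE)

# In the actual context (M = 1) the charge does not depend on the sergeant,
# although it would under the non-actual contingency M = -1.
depends_on_sergeant_actual = charge(1, 1) != charge(1, 0)
depends_on_sergeant_if_major_silent = charge(-1, 1) != charge(-1, 0)
```

The last line shows the kind of non-actual contingency under which the sergeant's order would matter, whereas the first two lines reflect why the sergeant's order adds nothing to a set that already contains M = 1.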

Comparison to Modified HP
Dissatisfied with Updated HP due to the many counterexamples that were presented in the literature, Halpern (2015) developed Modified HP. There is only one example in which Def 2 disagrees with Modified HP. 23 Crucially, it is an example for which Halpern agrees that Modified HP reaches the wrong verdict.
Example 7.6: A ranch has five individuals: a_1, . . . , a_5. They have to vote on two possible outcomes: staying at the campfire (O = 0) or going on a round-up (O = 1). Let A_i be the random variable denoting a_i's vote, so A_i = j if a_i votes for outcome j. There is a complicated rule for deciding on the outcome. If a_1 and a_2 agree (i.e., if A_1 = A_2), then that is the outcome. If a_2, . . . , a_5 agree, and a_1 votes differently, then the outcome is given by a_1's vote (i.e., O = A_1). Otherwise, majority rules. In the actual situation, A_1 = A_2 = 1 and A_3 = A_4 = A_5 = 0, so by the first mechanism, O = 1. 24
Halpern states, and I agree, that intuitively one should expect only A_1 = 1 and A_2 = 1 to be causes of O = 1. After all, a_3, . . . , a_5 voted against O = 1. Def 2 gives that result, whereas Modified HP considers every vote to be a cause. Halpern argues for adding more variables to the model in order to get the right outcome, but it speaks in favor of Def 2 that it is able to give the right answer with just these variables.
21 See Weslake (2015) for a discussion.
22 When discussing Example 3.8 again in (Halpern, 2016), he mistakenly claims that Modified HP agrees with Updated HP when treating parts of causes as causes. In response, Halpern has suggested a small variation on the example in which Modified HP indeed does agree with Updated HP (personal communication). For that variation, Def 2 also agrees with the HP definitions.
23 Halpern (2016) discusses far more cases, but none of them reveal any further disagreements between these definitions.
We conclude that, judged by the second strategy and Halpern & Pearl's own examples, Def 2 does better than Updated HP and at least as well as Modified HP. Lastly we consider a very simple example that was offered as a counterexample to Modified HP by Rosenberg and Glymour (2018).
Example 7.7: We have equations Y = X ∨ D and X = D, and we consider a context such that D = 1. This looks very much like a standard case of overdetermination in which X = 1 and D = 1 are both overdetermining causes. That is also the verdict of all of the definitions considered in this paper, except for Modified HP: it does not consider X = 1 a cause of Y = 1. The reason for this is that Y = 1 depends counterfactually on D = 1 by itself, whereas it does not depend on X = 1 by itself, nor does it when we take D = 1 as a witness. Rosenberg and Glymour (2018) state that Halpern endorses this conclusion, but offer the following story to motivate why they consider that an untenable position.
"An obedient gang is ordered by its leader to join him in murdering someone, and does so, all of them shooting the victim at the same time, or all of them together pushing the plunger connected to a bomb. The action of any one of the gang would suffice for the victim's death. If responsibility implies causality, whom among them is responsible? Were you among the jury, whom would you convict? What ought the Hague Court to do in cases of subordinates sure to obey orders? Halpern's theory says the gang leader and only the gang leader is a cause of the victim's death. This is a morally intolerable result; absent a plausible general principle severing responsibility from causation, any theory that yields such a result should be rejected." Even if one disagrees with this judgment, the next section offers further motivation for preferring Def 2 over Modified HP.

Def 2 vs the Others
Finally I will argue that Def 2 does better than all of the other definitions on a few more examples according to two metrics: it offers verdicts that are both intuitively plausible and consistent across minor changes of the examples. Before doing so, I present an example that illustrates a special property of Def 2.
Recall from Section 3 that it is a necessary condition for all three HP definitions that there exists some [ ⃗ W ← ⃗ w] such that Y = y counterfactually depends on ⃗ X = ⃗ x under that intervention. The same is true for the most well-known definitions out there that have been inspired by the HP definitions (see Weslake (2015) for an overview), as well as for Def 3, Def 4, and Def 10. Let us call definitions with this property strongly counterfactual. Although Def 2 clearly also relies on counterfactuals, and thus falls within the counterfactual approach to causation, it is not strongly counterfactual, as the following example shows. 25
Example 7.8: The equation for a binary variable Y is such that Y = 1 iff N ≠ 0, and the range for N is {0, 1, 2, 3}. The equation for N is as follows: . In a context where A = W = X = 1, we get that X = 1 causes Y = 1 according to Def 2. Yet there is no intervention such that Y = 1 depends on X = 1 under that intervention (and thus none of the other definitions would consider X = 1 a cause of Y = 1). In this case, both answers seem plausible. Def 2 reaches its verdict because of the asymmetry between (A = 1, X = 1) and (A = 1, X = 0): only the former is by itself causally sufficient for a network that results in Y = 1, whereas the latter also needs the assistance of W = 1 or W = 0.
Intuitively, I would find it unacceptable to consider X = 1 a cause whenever D = 0, regardless of the relation between A and D. The disjunct in which X appears is false, and therefore it played no positive part whatsoever in causing Y = 1. Perhaps others are more tolerant. But even if that is the case, one should expect one's verdicts to exhibit some consistency. As we will see, Def 2 and Original HP are the only definitions which can meet this demand.
The situation is simplest for Original HP: it considers X = 1 a cause of Y = 1 no matter what. To see why, take as a witness (D = 1, A = 0). Holding fixed that witness, Y = 1 counterfactually depends on X = 1. Since ⃗ Z = {X}, the former is equivalent to AC2 for Original HP. So we gain consistency, but at the price of extreme tolerance. In fact, Halpern and Pearl use precisely this example to argue against Original HP and in favor of Updated HP (2005, p. 35):
Example 7.9: "Suppose that a prisoner dies either if X loads D's gun and D shoots, or if A loads and shoots his gun. Taking Y to represent the prisoner's death and making the obvious assumptions about the meaning of the variables, ... [we can use the equation described above]. Suppose that X loads D's gun (X = 1), D does not shoot (D = 0), but A does load and shoot his gun (A = 1), so that the prisoner dies. Clearly A = 1 is a cause of Y = 1. We would not want to say that X = 1 is a cause of Y = 1, given that D did not shoot (i.e., given that D = 0)." [emphasis added]
If we agree with Halpern and Pearl here (which I do), then Original HP can be discarded on the basis of this example (and on the basis of the many others we discussed previously, of course). I leave it to the reader to verify that none of the other definitions consider X = 1 to be a cause here.
However, the only definition that applies the intuition underlying this example to all cases in which D = 0 is Def 2. Moreover, it is the only remaining definition that offers a simple consistent answer in all cases: X = 1 is a cause of Y = 1 iff D = 1. To see why this is the case, we go over the possible directly sufficient sets. (Since X is only a parent of Y , we can invoke Proposition 6.2 and use Def 3 instead of Def 2.) Clearly X = 1 is not directly sufficient for Y = 1 by itself. It is also clear that we cannot add A = 1 to the witness, because A = 1 is directly sufficient for Y = 1 all by itself. Therefore we are forced to choose D as our witness. If D = 0, this gives (X = 1, D = 0), which is not directly sufficient for Y = 1 and thus X = 1 is not a cause. If D = 1, we get (X = 1, D = 1), which is directly sufficient for Y = 1. Since the same does not hold for (X = 0, D = 1), X = 1 is a cause of Y = 1.
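The walk through the candidate sets above can be reproduced by brute force. The Python sketch below assumes that the equation under discussion is Y = (X ∧ D) ∨ A (as in the quoted story and in Example A.8) and that a set is directly sufficient for Y = 1 when Y = 1 holds for every assignment to the remaining parents of Y; the names `y_equation` and `directly_sufficient` are hypothetical.

```python
from itertools import product

def y_equation(x, d, a):
    """Assumed equation: Y = (X and D) or A."""
    return int((x and d) or a)

def directly_sufficient(fixed):
    """True if Y = 1 for every assignment to the parents of Y not fixed in `fixed`."""
    free = [name for name in ("x", "d", "a") if name not in fixed]
    for values in product([0, 1], repeat=len(free)):
        setting = dict(fixed, **dict(zip(free, values)))
        if y_equation(**setting) != 1:
            return False
    return True

checks = {
    "X=1": directly_sufficient({"x": 1}),
    "A=1": directly_sufficient({"a": 1}),
    "X=1,D=0": directly_sufficient({"x": 1, "d": 0}),
    "X=1,D=1": directly_sufficient({"x": 1, "d": 1}),
    "X=0,D=1": directly_sufficient({"x": 0, "d": 1}),
}
```

Only A = 1 and (X = 1, D = 1) come out as directly sufficient, and the contrast (X = 0, D = 1) does not, which matches the case analysis in the text for D = 1 and D = 0.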
The following examples show that Updated HP and Modified HP flip-flop between calling X = 1 a cause or not, even when holding fixed the value of D. Of course I cannot exclude the possibility that some consistent argumentation can be offered to explain the results of one of these definitions, but in its absence all of this speaks in favor of Def 2. We start with the three possible ways in which it can arise that D = 1.
Example 7.10: First consider the case where D is determined by the context, and we have a context such that D = 1. Here all four definitions agree that X = 1 is a cause of Y = 1.
Example 7.11: Second consider the case where the equation for D is given by D = A and thus again D = 1 in the context under consideration. Here Updated HP and Modified HP flip their verdict, as they no longer consider X = 1 a cause of Y = 1.
Example 7.12: Third, we simply flip the relation between A and D so that A = D, and again D = 1 in the context under consideration. Now Updated HP and Modified HP go back to considering X = 1 a cause of Y = 1.
Next we consider the two remaining possible cases where D = 0 (Example 7.9 was the first such case).
Example 7.13: Consider the case where the equation for D is D = ¬A. As with Example 7.9, we have that D = 0, and yet Updated HP changes its verdict, calling X = 1 a cause of Y = 1.
Example 7.14: 26 Lastly, consider the case where the equation for A is A = ¬D, and thus we again have that D = 0. Now both Modified HP and Updated HP flip their verdicts as compared to Example 7.9. To see why, it suffices to consider Modified HP. The result for Updated HP then follows from Theorem 5.5. D = 0 by itself is not a cause of Y = 1 because there is no choice of witness that makes Y = 1 counterfactually depend on D = 0. Since Y = 1 does counterfactually depend on (X = 1, D = 0), X = 1 is part of a cause of Y = 1.
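The Modified HP verdicts across Examples 7.9 to 7.14 can be explored with a brute-force checker. The Python sketch below is an illustration under several assumptions: the equation for Y is taken to be Y = (X ∧ D) ∨ A, all variables are binary, X = 1 and (where it is a root) A = 1 hold in every context, and Modified HP is implemented as "some contrast setting of the candidate variables changes the effect while some set of other variables is frozen at its actual values, and no strict subset already does so". All function names are hypothetical.

```python
from itertools import chain, combinations, product

def solve(equations, do=None):
    """Evaluate an acyclic model whose equations are listed in causal order."""
    do = do or {}
    values = {}
    for var, equation in equations.items():
        values[var] = do[var] if var in do else equation(values)
    return values

def subsets(items):
    """All subsets of `items`, smallest first."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def flips_effect(equations, cause_vars, effect):
    """AC2: a contrast setting, plus a witness frozen at actual values, changes the effect."""
    actual = solve(equations)
    others = [v for v in equations if v not in cause_vars and v != effect]
    for contrast in product([0, 1], repeat=len(cause_vars)):
        for witness in subsets(others):
            do = dict(zip(cause_vars, contrast))
            do.update({w: actual[w] for w in witness})
            if solve(equations, do)[effect] != actual[effect]:
                return True
    return False

def modified_hp_causes(equations, effect):
    """Minimal variable sets whose actual values satisfy AC2 for the actual effect."""
    causes = []
    for cause_vars in subsets(v for v in equations if v != effect):
        if not cause_vars or any(set(c) < set(cause_vars) for c in causes):
            continue  # empty set, or a strict subset already qualifies (minimality fails)
        if flips_effect(equations, cause_vars, effect):
            causes.append(cause_vars)
    return causes

def x_is_part_of_a_cause(d_and_a):
    """`d_and_a` holds the equations for D and A in causal order; X = 1 comes from the context."""
    equations = {"X": lambda v: 1, **d_and_a,
                 "Y": lambda v: int((v["X"] and v["D"]) or v["A"])}
    return any("X" in cause for cause in modified_hp_causes(equations, "Y"))

examples = {
    "7.9":  {"D": lambda v: 0, "A": lambda v: 1},
    "7.10": {"D": lambda v: 1, "A": lambda v: 1},
    "7.11": {"A": lambda v: 1, "D": lambda v: v["A"]},
    "7.12": {"D": lambda v: 1, "A": lambda v: v["D"]},
    "7.13": {"A": lambda v: 1, "D": lambda v: 1 - v["A"]},
    "7.14": {"D": lambda v: 0, "A": lambda v: 1 - v["D"]},
}
verdicts = {name: x_is_part_of_a_cause(eqs) for name, eqs in examples.items()}
```

Under these assumptions the sketch reports X = 1 as part of a Modified HP cause in Examples 7.10, 7.12, and 7.14 but not in Examples 7.9, 7.11, and 7.13, so the verdict changes within both the D = 1 group and the D = 0 group.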

Conclusion
I have developed twelve definitions of actual causation that formalize the NESS intuition with which Pearl started, and have shown that the most recent of the HP definitions is among them. Although these definitions vary widely in terms of the verdicts they reach, they all resemble each other as being instantiations of the same general definition. Each definition is made up of two elements: a definition of causal sufficiency, and a definition of necessity. Other definitions can easily be developed by playing around with these elements.
After studying various properties of these definitions and the relations between them, I moved on to the process of selecting the definition that does best in practice. In the majority of the many examples that we have considered, Def 2 agrees with Modified HP. However, in Section 7.2 we came across two examples for which Def 2 disagreed with Modified HP and where Modified HP gave the wrong verdict. Moreover, contrary to Modified HP, Def 2 manages to give consistent (and intuitive) answers to the group of cases considered in the previous section. Therefore I conclude by suggesting that we should adopt Def 2 as a definition of actual causation. This definition is made up of strong sufficiency and contrastive necessity. It states that ⃗ X = ⃗ x causes Y = y iff ⃗ X = ⃗ x is a Minimal Contrastively Necessary Subset of a Strongly Sufficient Set for Y = y, or MCNS^4.

A Appendix
Causal Sufficiency Proof: First assume ⃗ X = ⃗ x is strongly sufficient for ⃗ Y = ⃗ y in M and ⃗ N can be used to show this. Then the result follows immediately from the observation In particular, we have that for all ⃗ a ∈ R( ⃗ A) and all ⃗ u ∈ R(U), Combined with the conclusion from the previous paragraph, it follows that for all ⃗ a ∈ R( ⃗ A) and all ⃗ u ∈ R(U), we can generalize this reasoning for all consecutive i ∈ {3, . . . , k + 1} to get the desired outcome.

Defining Causation using Sufficiency
Theorem 5.3: The following are all equivalences among the twelve definitions and the three HP definitions:
• Modified HP iff Def 1
• Def 2 iff Def 5
• Def 8 iff Def 11
• Def 3 iff Def 6 iff Def 9 iff Def 12
Proof: First we consider the equivalences that do hold.
We start with the first equivalence: Modified HP iff Def 1. This is simply a matter of explicitly writing out the definitions, starting with actual weak sufficiency: x]Y = y. Next we note that the following condition is trivially satisfied for any ⃗ W ⊆ V: Combining both claims, we can rewrite Modified HP as follows, which gives the desired result: Next we consider all of the following equivalences: Def 2 iff Def 5, Def 8 iff Def 11, Def 3 iff Def 6, Def 9 iff Def 12. The reason we can group these together, is because we can prove all of them by invoking the following observation and two subsequent lemmas.
Observation 1: Recall our restriction on causal models that exogenous variables only appear in equations of the form V = U . Say ⃗ R ⊆ V are all variables which have such an equation, and call these the root variables. It is clear that if we intervene on all of the root variables, they take over the role of the exogenous variables. Concretely, given strong recursivity, for any setting ⃗ r ∈ R( ⃗ R) there exists a unique setting ⃗ v ∈ R(V) so that for all contexts ⃗ u ∈ R(U) we have that
Lemma A.1: Given a setting ⃗ X = ⃗ x, a setting ⃗ N = ⃗ n that includes Y = y and such that ⃗ N ∩ ⃗ R = ∅, a context ⃗ u, the following holds: 27
Proof: Filling in the definitions of direct and actually direct sufficiency, the first equivalence reduces to the following: Because of Observation 1, we have that for any setting ⃗ v ∈ V and any setting Combining this with the fact that ⃗ R ⊆ ( ⃗ C ∪ ⃗ X) gives the desired result.
The second equivalence can be reformulated as follows: , and therefore we can apply the same reasoning as before.
Lemma A.2: For all twelve instances of the General Definition of Causation we can restrict ourselves to sets For all definitions using either variants of direct or weak sufficiency the result follows immediately from the fact that First consider the case where we use non-actual strong sufficiency (Def 5 or Def 11). In that case, AC2(b) can never be satisfied unless ⃗ A = ∅. To see why, note that in all contexts ⃗ u ′′ ∈ R(U), it has to hold that (M, ⃗ and the equation for each element A i ∈ ⃗ A is of the form A i = U for some exogenous variable U , this is impossible. (Strictly speaking it is possible, namely if the range of U consists only of the single value a * i . Although I did not make this explicit in Section 2, it is standard to assume that all variables have a range that contains at least two elements.)
Second consider the case where we use actual strong sufficiency and contrastive necessity (Def 2). (The case of Def 8 is entirely analogous.) Say we are considering a candidate cause ⃗ X = ⃗ x, a candidate witness ⃗ W = ⃗ w * , contrast values ⃗ x ′ , and a setting ⃗ N = ⃗ n that includes Y = y. Given AC1, we can safely assume that ⃗ n = ⃗ n * . I claim that the following holds, from which the result follows:
27 ⃗ R is defined in Observation 1.

Using these observations and the fact that ⃗ A ⊆ ⃗ N , we get that the following two conditions are equivalent, for which the result follows as far as AC2(b) is concerned: Now we focus on AC2(a c ). Let us first assume AC2(a c ) holds for ⃗ . Therefore we can choose ⃗ t = ( ⃗ a 1 , ⃗ t 1 ). Next we consider the other direction: assume AC2(a c ) holds for ⃗ X = ⃗ x, contrast values ⃗ x ′ , witness ⃗ W = ⃗ w * , and network ⃗ N . We need to show that it holds for ⃗
Because of the above lemmas, all that remains is to show that the above equivalences hold also when Y ∈ ⃗ R. This is accomplished by showing that settings of such variables do not have any cause, regardless of the definition one uses.
AC2(a) requires us to look at all subsets of ⃗ N = ⃗ n that include Y = y, and verify that the candidate cause and witness ( ⃗ is not sufficient for that subset. One such subset is the one containing just Y = y. By AC1, we have that there is no intervention on the other endogenous variables so that Y ≠ y under that intervention in ⃗ u. Therefore any definition of causation using a version of actual sufficiency (i.e., Def 2, Def 3, Def 8, and Def 9) considers all sets that do not include Y to be sufficient for Y = y in (M, ⃗ u). In particular, they consider ( ⃗ ⃗ u), and thus fail to meet condition AC2(a). For the definitions using non-actual variants of sufficiency (Def 5, Def 6, Def 11, and Def 12), it is condition AC2(b) that can never be satisfied. Analogous to what we saw in the proof of Lemma A.2, this follows from the fact that whatever version of sufficiency we use, Y = y has to hold in all contexts, which is impossible given that Y ∈ ( ⃗ X ∪ ⃗ W ). From this the result follows.
Now we prove the only remaining equivalence: Def 6 iff Def 12. (Given the previous equivalences, other choices are possible too.) We need to show that the following two statements are equivalent: Filling in Definition 4.1, the result follows immediately:
Second, we go over some examples to show that none of the other equivalences hold. (Obviously, from now on we may ignore Def 1, Def 5, Def 6, Def 7, Def 9, Def 11, and Def 12.)
Example A.3: Equations: Y = (X ∧ A) ∨ D, D = A. Context: A = 1. Then X = 1 is a cause of Y = 1 according to:
• Modified HP: We can always consider choosing ⃗ W = ∅, in which case we simply get counterfactual dependence: Doing so in this example, we see that Y = 1 counterfactually depends on (X = 1, D = 1). There is clearly also no witness ⃗ W = ⃗ w * to show that X = 1 or D = 1 are causes by themselves, so X = 1 is part of a cause.
• Updated HP and Original HP: taking (A = 1, D = 0) as a witness meets the conditions.
• Def 2: follows from the previous item and Theorem 5.5.
• Def 8: follows from the previous item and Theorem 5.5.
X = 1 is not a cause of Y = 1 according to:
• Def 10: X = 1 by itself does not weakly suffice for Y = 1 (just look at a context in which A = 0), so we need to add A or D to the witness. But both A = 1 and D = 1 each weakly suffice for Y = 1.
So we know that Def 4 and Def 10 are not equivalent to any of the other definitions. We give an example to show that Def 4 and Def 10 are not equivalent to each other either.
Example A.4: Equations: Y = X ∧ A, X = A. Context: A = 1. Since X = 1 is not weakly sufficient for Y = 1, we need to include A = 1 in the witness. Indeed, (X = 1, A = 1) is weakly sufficient for Y = 1. However, so is A = 1, and therefore X = 1 does not cause Y = 1 according to Def 10. Yet (X = 0, A = 1) is not weakly sufficient for Y = 1, and therefore X = 1 causes Y = 1 according to Def 4. This leaves us with the HP definitions, Def 2, Def 3, and Def 8. The next example shows that the former are not equivalent to the latter.
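Before turning to that example, the weak-sufficiency claims of Example A.4 can be verified by enumerating the two contexts. The Python sketch below assumes that the only exogenous influence is on A, and that a setting is weakly sufficient for Y = 1 when Y = 1 holds under the corresponding intervention in every context; `model` and `weakly_sufficient` are illustrative names.

```python
def model(a, do=None):
    """Evaluate X = A and Y = X and A in context A = a, with optional interventions."""
    do = do or {}
    v = {"A": do.get("A", a)}
    v["X"] = do.get("X", v["A"])
    v["Y"] = int(v["X"] and v["A"])
    return v

def weakly_sufficient(do):
    """True if Y = 1 holds under the intervention `do` in both contexts."""
    return all(model(a, do)["Y"] == 1 for a in (0, 1))

checks = {
    "X=1": weakly_sufficient({"X": 1}),
    "X=1,A=1": weakly_sufficient({"X": 1, "A": 1}),
    "A=1": weakly_sufficient({"A": 1}),
    "X=0,A=1": weakly_sufficient({"X": 0, "A": 1}),
}
```

The witness A = 1 is already weakly sufficient on its own, while the contrast (X = 0, A = 1) is not, which is the combination that separates Def 10 from Def 4 in this example.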
Example A.5: Equations: Y = (X ∧ ¬A) ∨ D, D = A. Context: A = 1. Then X = 1 is a cause of Y = 1 according to: • Modified HP: Y = 1 counterfactually depends on (X = 1, A = 1), and not on either X = 1 or A = 1. So X = 1 is part of a cause.
• Updated HP and Original: take A = 0 as a witness. X = 1 is not a cause of Y = 1 according to: • Def 3: X = 1 by itself does not directly suffice for Y = 1 (just look at [A ← 1, D ← 0]), so we need to add A or D to the witness. Since the actual value of A is 1, it is of no use, which leaves us with D. But D = 1 directly suffices for Y = 1 by itself, and thus so does (X = 0, D = 1).
• Def 2: follows from the previous item and Proposition 6.2.
• Def 8: follows from the previous item and Proposition 6.2.
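The two kinds of check used in Example A.5 can be sketched in Python. The code assumes the context A = 1 together with X = 1, binary variables, and the readings of counterfactual dependence (some contrast setting changes Y) and direct sufficiency (Y = 1 under every setting of the remaining parents of Y) used throughout; the helper names are hypothetical.

```python
from itertools import product

def model(x, a, do=None):
    """Evaluate D = A and Y = (X and not A) or D, with optional interventions."""
    do = do or {}
    v = {"X": do.get("X", x), "A": do.get("A", a)}
    v["D"] = do.get("D", v["A"])
    v["Y"] = int((v["X"] and not v["A"]) or v["D"])
    return v

ACTUAL = model(1, 1)

def depends_on(variables):
    """True if some setting of `variables` changes Y in the actual context."""
    return any(model(1, 1, dict(zip(variables, vals)))["Y"] != ACTUAL["Y"]
               for vals in product([0, 1], repeat=len(variables)))

def directly_sufficient(fixed):
    """True if Y = 1 for every setting of the parents of Y not fixed in `fixed`."""
    free = [n for n in ("X", "A", "D") if n not in fixed]
    return all(model(1, 1, {**fixed, **dict(zip(free, vals))})["Y"] == 1
               for vals in product([0, 1], repeat=len(free)))

dependence = {"X": depends_on(["X"]), "A": depends_on(["A"]), "X,A": depends_on(["X", "A"])}
sufficiency = {"X=1": directly_sufficient({"X": 1}),
               "D=1": directly_sufficient({"D": 1}),
               "X=0,D=1": directly_sufficient({"X": 0, "D": 1})}
```

Y = 1 depends only on the pair (X = 1, A = 1), which is why X = 1 is part of a Modified HP cause, whereas D = 1 is directly sufficient regardless of X, which is why Def 3 rejects X = 1.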
That none of the HP definitions are equivalent is of course a well-established fact, and also follows from the examples we consider in Section 7. Therefore we are left with showing that Def 2, Def 3, and Def 8 are not equivalent. That Def 3 differs from the other two is a direct consequence of some of our later results, but a simple example illustrates this as well.
Example A.6: Equations: Y = A, A = X. Context: A = 1. Then it is easy to see that X = 1 causes Y = 1 according to all definitions here considered, except for Def 3.
Lastly, I refer the reader to Example 7.2 in Section 7 for an example that shows Def 2 and Def 8 are not equivalent.
Proposition 5.4: If ⃗ X = ⃗ x causes Y = y in (M, ⃗ u) according to a definition that uses minimal necessity, then ⃗ X is a singleton.
Proof: Since we know that Def 7 is unsatisfiable and we have Theorem 5.3, we only need to consider Def 3, Def 8, and Def 10. The following applies to both weak and direct sufficiency (i.e., Def 3 and Def 10.) is not minimal. So let us assume that neither ( ⃗ is sufficient for Y = y. This means we can move ⃗ X 2 to the witness to show that ⃗ X 1 = ⃗ x 1 satisfies AC2 by itself, and likewise for ⃗ X 2 and ⃗ X 1 reversed. From this the result follows. Now we prove that it also holds for strong sufficiency, i.e., for Def 8. Assume So let us assume that neither ( ⃗ If the same is true for all subnetworks ⃗ S ⊆ ⃗ N , then as before, we can move either one of ⃗ X 1 and ⃗ X 2 to the witness to show that the other satisfies AC2 by itself.
So let us assume that there is some subnetwork ⃗ (Obviously the same reasoning applies to ⃗ X 2 .) Since all subnetworks ⃗ S ′′ of ⃗ S ′ are also subnetworks of ⃗ N , it follows from the above that ( ⃗ X 1 = ⃗ x 1 ) satisfies AC2 by itself when taking ⃗ W as witness and ⃗ S ′ as network. From this the result follows.
Theorem 5.5: The only implications (involving either causes or parts of causes) between the remaining five definitions (Def 2, Def 3, Def 4, Def 8, and Def 10) and the three HP definitions are the following ones (and their immediate consequences, of course): Fourth we prove the last implication. Assume X = x causes Y = y with witness ⃗ W according to Def 10. (We know because of Proposition 5.4 that ⃗ X is a singleton.) In other words, (X = x, ⃗ W = ⃗ w * ) is weakly sufficient for Y = y, and ⃗ W = ⃗ w * is not weakly sufficient for Y = y. It remains to be shown that there exists a value x ′ so that ( which is what remained to be shown.
Fifth, we show that none of the remaining implications hold. (Again, we do not consider the relations amongst the HP definitions explicitly and refer the reader to the examples in Section 7. We also do not explicitly consider the remaining implications for parts of causes, but the reader can verify that the following examples suffice to falsify all those implications as well. For the left-hand side of all implications this follows immediately from the fact that the causes in all the following examples are singletons. For the right-hand side of implications, Propositions 5.4, 6.1, and 6.2 come in handy.) Example A.4 shows that Def 4 does not imply Def 10. Example A.3 shows that none of the other definitions imply either Def 4 or Def 10. So there are no remaining implications with either Def 4 or Def 10 on the right-hand side.
Example A.6 shows that Def 3 is not implied by any definition. Example A.5 shows that none of the HP definitions imply Def 2 or Def 8. Note that Def 4 and Def 10 also consider X = 1 a cause of Y = 1 in that example (since X = 1 is weakly sufficient for Y = 1, whereas neither X = 0 nor the empty set is). Further, Example 7.2 shows that Def 8 does not imply Def 2. Therefore there are no remaining implications with Def 2 or Def 8 on the right-hand side.
That leaves us to consider implications with one of the HP definitions on the right-hand side. Given the first two implications of Theorem 5.5, it suffices to show that none of Def 2, Def 4, Def 8, or Def 10 implies Original HP, and that Def 3 does not imply Updated HP.
I refer the reader to Example 7.8 in Section 7 for an example where Def 2, and thus also Def 8, holds and Original HP does not.
The following example shows that neither Def 4 nor Def 10 implies Original HP.
• Def 4: follows from the previous one.
Yet X = 1 is not a cause of Y = 1 according to Original HP. To see why, note that we need to include A = 0 in the witness in order to get AC2(a), and we must exclude Z 1 . Also, we clearly cannot add Z 2 = 1. Therefore the witness has to be A = 0. The actual value of Z 2 is 0. Since we have (M, ⃗ u) ⊧ [X ← 1, A ← 0, Z 2 ← 0]Y = 0, AC2(b) is not satisfied.
Lastly, an example to show that Def 3 does not imply Updated HP.
Example A.8: Equations: Y = (X ∧ D) ∨ A, D = A. Context: A = 1 and X = 1. Then X = 1 is a cause of Y = 1 according to Def 3: (X = 1, D = 1) is directly sufficient for Y = 1, and (X = 0, D = 1) is not. But X = 1 is not a cause of Y = 1 according to Updated HP. To see why, note that we need to include A = 0 in the witness in order to get AC2(a). But (M, ⃗ u) ⊧ [X ← 1, A ← 0]Y = 0, thus falsifying AC2(b) for Updated HP.
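The claims in Example A.8 can be checked by brute force. The following is only a sketch under stated assumptions: it reads direct sufficiency of a partial assignment for Y = y as "Y = y holds for every setting of all remaining non-Y endogenous variables", and it treats A and X as binary variables whose actual values are fixed by the context. The helper names `solve_a8` and `directly_sufficient_a8` are hypothetical and not taken from the paper.

```python
from itertools import product

# Non-Y endogenous variables of Example A.8 (all assumed binary).
NON_Y_VARS = ("A", "X", "D")


def solve_a8(interventions, context):
    """Solve Y = (X and D) or A, D = A under the given interventions.

    `context` supplies the values of A and X when they are not intervened on.
    """
    v = {}
    v["A"] = interventions.get("A", context["A"])
    v["X"] = interventions.get("X", context["X"])
    v["D"] = interventions.get("D", v["A"])  # structural equation D = A
    v["Y"] = int((v["X"] and v["D"]) or v["A"])  # Y = (X and D) or A
    return v


def directly_sufficient_a8(assignment, y, context):
    """Assumed reading of direct sufficiency: Y = y must hold for every
    setting of the non-Y variables that `assignment` leaves open."""
    rest = [name for name in NON_Y_VARS if name not in assignment]
    for values in product((0, 1), repeat=len(rest)):
        setting = {**assignment, **dict(zip(rest, values))}
        if solve_a8(setting, context)["Y"] != y:
            return False
    return True


actual = {"A": 1, "X": 1}  # the context of Example A.8

# Def 3 side: (X = 1, D = 1) is directly sufficient, (X = 0, D = 1) is not.
suff_x1 = directly_sufficient_a8({"X": 1, "D": 1}, 1, actual)
suff_x0 = directly_sufficient_a8({"X": 0, "D": 1}, 1, actual)

# Updated HP side: [X <- 1, A <- 0] yields Y = 0, which falsifies AC2(b).
y_under_witness = solve_a8({"X": 1, "A": 0}, actual)["Y"]
```

Because every non-Y variable is fixed inside the sufficiency check, the context plays no role there; it only matters for the intervention on X and A, where D is left to follow its equation.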
Excluding Def 3 and Def 10
Proposition 6.1: If ⃗ X = ⃗ x causes Y = y in (M, ⃗ u) according to Def 3, then ⃗ X is a singleton, and X is a parent of Y .
Proof: That ⃗ X is always a singleton is a direct consequence of the combination of Proposition 5.4 and Theorem 5.3.
Recall that X is a parent of Y iff there exists a context ⃗ u ′′ , a setting ⃗ z ∈ R(V − {X, Y }), and values x, x ′′ of X so that F Y (⃗ u ′′ , ⃗ z, x) ≠ F Y (⃗ u ′′ , ⃗ z, x ′′ ). This means precisely that for some y ∈ R(Y ), (M, ⃗ If X = x causes Y = y according to Def 3, the existence of values such that the previous holds follows immediately.
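The parent condition recalled in this proof is an existence claim over finitely many settings, so for small models it can be tested by enumeration. The sketch below is illustrative only: it assumes binary variables, reuses the equation Y = (X ∧ D) ∨ A from Example A.8 as a toy F_Y, adds a hypothetical variable B that does not occur in the equation, and omits the context because this toy F_Y does not depend on it directly. The names `is_parent` and `f_y_toy` are not from the paper.

```python
from itertools import product


def is_parent(f_y, x, variables, domain=(0, 1)):
    """Return True iff some setting z of the other variables and two values
    of `x` make f_y differ, i.e. f_y(z, x) != f_y(z, x'')."""
    others = [name for name in variables if name != x]
    for z in product(domain, repeat=len(others)):
        setting = dict(zip(others, z))
        outputs = {f_y({**setting, x: value}) for value in domain}
        if len(outputs) > 1:  # Y's equation responds to x for this z
            return True
    return False


def f_y_toy(v):
    """Toy equation for Y, borrowed from Example A.8: Y = (X and D) or A."""
    return int((v["X"] and v["D"]) or v["A"])


# B is a hypothetical extra variable that F_Y ignores.
TOY_VARIABLES = ("X", "D", "A", "B")

parents_of_y = [name for name in TOY_VARIABLES
                if is_parent(f_y_toy, name, TOY_VARIABLES)]
```

Under these assumptions X qualifies as a parent through the setting D = 1, A = 0, whereas B never changes the output of the equation and is therefore not a parent.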
Proposition 6.2: If X is only a parent of Y , then Def 3, Def 2, and Def 8 are all equivalent for causes X = x.
Proof: Given Theorem 5.5, we only need to prove the implication from Def 8 to Def 3.
Assume X is only a parent of Y , and X = x causes Y = y according to Def 8. Thus, there is a witness ⃗ W and some network ⃗ N such that (X = x, ⃗ W = ⃗ w * ) is strongly sufficient for Y = y along ⃗ N , and ( ⃗ W = ⃗ w * ) is not strongly sufficient for Y = y along any subnetwork of ⃗ N . First consider the case where ⃗ N = ∅. This means that (X = x, ⃗ W = ⃗ w * ) is directly sufficient for Y = y, and ( ⃗ W = ⃗ w * ) is not directly sufficient for Y = y. That means precisely that X = x causes Y = y according to Def 12. The result now follows from Theorem 5.3.
Second consider the case where there exists some N ∈ ⃗ N . If N is not an ancestor of Y , it can be removed from ⃗ N without consequence. If N is an ancestor of Y , then it cannot be a descendant of X. But in that case it does not depend on X, and thus we can remove it from ⃗ N and add it to the witness ⃗ W without consequence. Therefore there always exists a choice of witness so that ⃗ N = ∅, and thus the result follows.
Proof: For the HP definitions this is proven in (Halpern, 2016, p. 26).
Example A.6 shows the result for Def 3. Example A.4 shows the result for Def 10. Therefore it remains to be shown that Dependence implies Def 2, Def 4, and Def 8. This is a direct consequence of the fact that Dependence implies Modified HP, combined with Proposition 7.5.
Def 2, Def 4, and Def 8 vs the HP definitions
Proposition 7.5: If Modified HP with ⃗ X a singleton, then Def 2, Def 4, and Def 8.
Proof: Recall the root variables ⃗ R from Observation 1. Note that for any setting ⃗ r ∈ R( ⃗ R), for any set ⃗ Y ⊆ (V − ⃗ R), there exists some ⃗ y so that ⃗ R = ⃗ r is weakly, actually weakly, and strongly sufficient for ⃗ Y = ⃗ y. Assume X = x causes Y = y according to Modified HP with witness ⃗ W . This means there exists an x ′ so that . First we focus on Def 4. Note that (X = x, ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) is weakly sufficient for Y = y. Furthermore, changing X from x to x ′ obviously has no effect on any of the values in ⃗ R. Therefore (Also, we may assume that ⃗ W ∩ ⃗ R = ∅.) From this it follows that (X = x ′ , ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) is not weakly sufficient for Y = y. So taking ( ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) as witness gives the desired result.
Second, we focus on Def 2 (from which Def 8 follows due to Theorem 5.5). Combining the previous statement about (X = x ′ , ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) with Proposition 4.6, it follows immediately that there does not exist any network ⃗ N so that (X = x ′ , ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) is strongly sufficient for Y = y along ⃗ N . Clearly there exists some ⃗ N so that ⃗ R = ⃗ r * is strongly sufficient for Y = y along ⃗ N . (We can start by picking parents ⃗ A of Y such that ⃗ A = ⃗ a * is directly sufficient for Y = y. Then we can take parents of all elements in ⃗ A, to get a set ⃗ B so that ⃗ B = ⃗ b * is directly sufficient for ⃗ A = ⃗ a * , etc.) But then also (X = x, ⃗ S = ⃗ s * , ⃗ W = ⃗ w * ) is strongly sufficient for Y = y along ⃗ N , from which the result follows.