Appendix
Appendix 1: Constraints for Models T and C
I consider the general constraints on the parameterization that two models with the structures of T and C must satisfy in order to be indistinguishable under passive observation and all surgical interventions on the observed variables. In the following, T and C refer to models with the respective structures in Fig. 1, rather than the specific parameterizations listed in Table 1. I use the notation p(A|B||C) to refer to the probability of the variables in A conditional on the variables in B in the distribution in which the variables in C have been subject to a surgical intervention.
To remain indistinguishable when V = {x, y, z}, model T and model C must be identical for the following seven distributions over the observed variables:
1. the passive observational distribution:
$$ \begin{aligned} P(X, Y, Z) &= \sum_{uv} P(U) P(V) P(X | U) P(Y | V, X) P(Z | U, V, X, Y) \end{aligned} $$
2. the manipulated distribution (see Footnote 9) with an intervention on X
$$ \begin{aligned} P(Y, Z | X || X) &= \sum_{uv} P(U) P(V) P(Y | V, X || X) P(Z | U, V, X, Y || X) \\ &= \sum_{uv} P(U) P(V) P(Y | V, X) P(Z | U, V, X, Y) \\ \end{aligned} $$
To illustrate the implied constraints, I substitute the distribution parameters from Table 1 for this particular case. It can be done analogously for all seven distributions.
$$ \begin{aligned} P(y=1, z=1 | x=1 || x=1) &= t_{1} t_{2} t_{5} t_{9} + (1- t_{1}) t_{2} t_{5} t_{17} + t_{1} (1- t_{2}) t_{7} t_{13} + (1- t_{1}) (1- t_{2}) t_{7} t_{21} \\ P(y=1, z=0 | x=1 || x=1) &= t_{1} t_{2} t_{5} (1- t_{9}) + (1- t_{1}) t_{2} t_{5} (1-t_{17}) \\ & \quad + t_{1} (1- t_{2}) t_{7} (1-t_{13}) + (1- t_{1}) (1- t_{2}) t_{7} (1-t_{21}) \\ P(y=0, z=1 | x=1 || x=1) &= t_{1} t_{2} (1- t_{5}) t_{10} + (1- t_{1}) t_{2} (1- t_{5}) t_{18} \\ & \quad + t_{1} (1- t_{2}) (1- t_{7}) t_{14}+ (1- t_{1}) (1- t_{2}) (1- t_{7}) t_{22} \\ P(y=0, z=0 | x=1 || x=1) &= t_{1} t_{2} (1- t_{5}) (1- t_{10}) + (1- t_{1}) t_{2} (1- t_{5}) (1-t_{18}) \\ & \quad + t_{1} (1- t_{2}) (1- t_{7}) (1- t_{14}) + (1- t_{1}) (1- t_{2}) (1- t_{7}) (1-t_{22}) \\ P(y=1, z=1 | x=0 || x=0) &= t_{1} t_{2} t_{6} t_{11} + (1- t_{1}) t_{2} t_{6} t_{19} \\ & \quad + t_{1} (1- t_{2}) t_{8} t_{15} + (1- t_{1}) (1- t_{2}) t_{8} t_{23} \\ P(y=1, z=0 | x=0 || x=0) &= t_{1} t_{2} t_{6} (1- t_{11}) + (1- t_{1}) t_{2} t_{6} (1-t_{19}) \\ & \quad + t_{1} (1- t_{2}) t_{8} (1- t_{15}) + (1- t_{1}) (1- t_{2}) t_{8} (1-t_{23}) \\ P(y=0, z=1 | x=0 || x=0) &= t_{1} t_{2} (1- t_{6}) t_{12} + (1- t_{1}) t_{2} (1- t_{6}) t_{20} \\ & \quad + t_{1} (1- t_{2}) (1- t_{8}) t_{16} + (1- t_{1}) (1- t_{2}) (1- t_{8}) t_{24} \\ P(y=0, z=0 | x=0 || x=0) &= t_{1} t_{2} (1- t_{6}) (1-t_{12}) + (1- t_{1}) t_{2} (1- t_{6}) (1-t_{20}) \\ & \quad + t_{1} (1- t_{2}) (1- t_{8}) (1- t_{16}) + (1- t_{1}) (1- t_{2}) (1- t_{8}) (1-t_{24}) \\ \end{aligned} $$
3. the manipulated distribution with an intervention on Y
$$ \begin{aligned} P(X, Z | Y || Y) &= \sum_{uv} P(U) P(V) P(X | U, Y || Y) P(Z | U, V, X, Y || Y) \\ &= \sum_{uv} P(U) P(V) P(X | U) P(Z | U, V, X, Y) \\ \end{aligned} $$
4. the manipulated distribution with an intervention on Z (since this distribution does not involve the parameters specifying p(z|u, v, x, y) that distinguish the models, these equations are trivially satisfied by T and C)
$$ \begin{aligned} P(X, Y | Z || Z) = \sum_{uv} P(U) P(V) P(X | U) P(Y | V, X) \end{aligned} $$
5. the manipulated distribution with an intervention on X and Y simultaneously
$$ \begin{aligned} P(Z | X, Y || X, Y) &= \sum_{uv} P(U) P(V) P(Z | U, V, X, Y || X, Y)\\ &= \sum_{uv} P(U) P(V) P(Z | U, V, X, Y) \\ \end{aligned} $$
6. the manipulated distribution with an intervention on X and Z simultaneously (since this distribution does not involve the parameters specifying p(z|u, v, x, y) that distinguish the models, these equations are trivially satisfied by T and C)
$$ \begin{aligned} P(Y | X, Z || X, Z) &= \sum_{uv} P(U) P(V) P(Y | U, V, X, Z || X, Z)\\ &= \sum_{v} P(V) P(Y | V, X) \\ \end{aligned} $$
7. the manipulated distribution with an intervention on Y and Z simultaneously (since this distribution does not involve the parameters specifying p(z|u, v, x, y) that distinguish the models, these equations are trivially satisfied by T and C)
$$ \begin{aligned} P(X | Y, Z || Y, Z) &= \sum_{uv} P(U) P(V) P(X | U, V, Y, Z || Y, Z)\\ &= \sum_{u} P(U) P(X | U) \\ \end{aligned} $$
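As a further worked instance of the parameter substitution shown for distribution 2, the same step can be applied to distribution 5. The following is a sketch that assumes the same parameter indexing as in the expansion above, evaluated at x = y = z = 1:

$$ \begin{aligned} P(z=1 | x=1, y=1 || x=1, y=1) &= t_{1} t_{2} t_{9} + (1- t_{1}) t_{2} t_{17} + t_{1} (1- t_{2}) t_{13} + (1- t_{1}) (1- t_{2}) t_{21} \end{aligned} $$

Only the latent priors and the z-parameters appear here, since the factors for x and y are removed by the intervention.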
In addition, in order to establish the relevant causal effects, both models must satisfy the following inequalities. Bold font indicates at least one way in which the parameterizations of T and C in Table 1 satisfy the inequalities.
1. to make u a cause of x:
$$ {\bf t}_{{\bf 3}} \neq {\bf t}_{{\bf 4}} $$
2. to make x and v causes of y:
$$ (( t_{5} \neq t_{7}) \lor ({\bf t}_{{\bf 6}} \neq {\bf t}_{{\bf 8}}))\land (( t_{5} \neq t_{6}) \lor ({\bf t}_{{\bf 7}} \neq {\bf t}_{{\bf 8}})) $$
3. to make u, v and y causes of z:
$$ \begin{aligned} &(({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 17}})\lor( t_{10} \neq t_{18})\lor( t_{11} \neq t_{19})\lor(t_{12} \neq t_{20})\lor( t_{13} \neq t_{21})\lor( t_{14} \neq t_{22})\lor( t_{15} \neq t_{23})\lor( t_{16} \neq t_{24})) \\ &\land (({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 13}})\lor( t_{10} \neq t_{14})\lor( t_{11} \neq t_{15})\lor(t_{12} \neq t_{16})\lor(t_{17} \neq t_{21})\lor(t_{18} \neq t_{22})\lor(t_{19} \neq t_{23})\lor(t_{20} \neq t_{24}))\\ &\land ( ({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 10}})\lor(t_{11} \neq t_{12})\lor( t_{13} \neq t_{14})\lor( t_{15} \neq t_{16})\lor(t_{17} \neq t_{18})\lor(t_{19} \neq t_{20})\lor(t_{21} \neq t_{22})\lor(t_{23} \neq t_{24})) \end{aligned} $$
Model T must in addition make x a cause of z by satisfying the following inequality:
$$ \begin{aligned} &({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 11}})\lor ( {\bf t}_{{\bf 13}} \neq {\bf t}_{{\bf 15}})\lor(t_{17} \neq t_{19})\lor(t_{21} \neq t_{23}) \\ & \quad \lor( t_{10} \neq t_{12})\lor( t_{14} \neq t_{16})\lor(t_{18} \neq t_{20})\lor(t_{22} \neq t_{24}) \\ \end{aligned} $$
(1)
while model C must satisfy its negation, i.e. all the parameter pairs in Constraint (1) must be equal.
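The inequalities above can be checked mechanically for a concrete parameterization. The following is a minimal sketch, not the author's procedure: it uses the deterministic values tabulated in Appendix 2, and the names `differs`, `causal_checks`, `SHARED`, `T` and `C` are hypothetical helpers introduced only for illustration. It assumes the z-parameters are indexed as in the expansion for distribution 2, so that the pairs compared for each cause differ by 8 (u), 4 (v), 2 (x) and 1 (y).

```python
# Deterministic parameters from the Appendix 2 table, keyed by the index i of t_i.
SHARED = {1: 0.5, 2: 0.5, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0,
          10: 1, 11: 0, 12: 1}
SHARED.update({i: 1 for i in range(14, 25)})
T = {**SHARED, 9: 1, 13: 0}
C = {**SHARED, 9: 0, 13: 1}


def differs(t, pairs):
    """True if at least one listed parameter pair is unequal (one disjunct holds)."""
    return any(t[i] != t[j] for i, j in pairs)


def causal_checks(t):
    """Evaluate each inequality of Appendix 1 for the parameter dict t."""
    return {
        "u->x": t[3] != t[4],
        "x->y": differs(t, [(5, 6), (7, 8)]),
        "v->y": differs(t, [(5, 7), (6, 8)]),
        "u->z": differs(t, [(i, i + 8) for i in range(9, 17)]),
        "v->z": differs(t, [(i, i + 4) for i in (9, 10, 11, 12, 17, 18, 19, 20)]),
        "y->z": differs(t, [(i, i + 1) for i in range(9, 24, 2)]),
        # Constraint (1): T must satisfy it, C must satisfy its negation.
        "x->z": differs(t, [(i, i + 2) for i in (9, 10, 13, 14, 17, 18, 21, 22)]),
    }


checks_T = causal_checks(T)
checks_C = causal_checks(C)
print(checks_T)
print(checks_C)
```

Under these assumptions every check holds for T, while for C every check holds except the one for Constraint (1).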
Since model T must satisfy at least one disjunct of Constraint (1), while C must satisfy its negation, one can easily detect the distributional constraints from the list 1–7 above that will not be trivially satisfied. All such quantities contain either only parameters from the first line or only parameters from the second line of Constraint (1). I will focus only on the satisfaction of disjuncts from the first line; the case for the second line is exactly analogous.
In the most general case model T differs from model C by satisfying every disjunct in Constraint (1), and we can write the parameters as \(t_{9} = t_{11} + d_1\), \(t_{17} = t_{19} + d_2\), \(t_{13} = t_{15} + d_3\), and \(t_{21} = t_{23} + d_4\) for non-zero \(d_1,\ldots,d_4\). There are seven distributional quantities containing the parameters \(t_{9}\), \(t_{13}\), \(t_{17}\) and \(t_{21}\), giving rise to the following four independent constraints if models T and C are to be indistinguishable for a passive observation and all surgical interventions on the observed variables:
$$ \begin{aligned} t_{1} t_{2} t_{3} t_{5}d_1&+(1- t_{1}) t_{2} t_{4} t_{5}d_2+ t_{1}(1- t_{2}) t_{3} t_{7}d_3+(1- t_{1})(1- t_{2}) t_{4} t_{7}d_4 = 0 \\ t_{1} t_{2} t_{5}d_1&+(1- t_{1}) t_{2} t_{5}d_2+ t_{1}(1- t_{2}) t_{7}d_3+(1- t_{1})(1- t_{2}) t_{7}d_4 = 0 \\ t_{1} t_{2} t_{3}d_1&+(1- t_{1}) t_{2} t_{4}d_2+ t_{1}(1- t_{2}) t_{3}d_3+(1- t_{1})(1- t_{2}) t_{4}d_4 = 0 \\ t_{1} t_{2}d_1&+(1- t_{1}) t_{2}d_2+ t_{1}(1- t_{2})d_3+(1- t_{1})(1- t_{2})d_4 = 0 \\ \end{aligned} $$
Solving these equations shows that a model T must satisfy the following constraints on its parameters:
$$ \begin{aligned} t_{5}&= t_{7} \\ t_{11}&= t_{9}-(d_3(-1+ t_{2})/ t_{2})\\ t_{15}&= t_{13}-d_3\\ t_{19}&=t_{17}-(d_4(-1+ t_{2})/ t_{2})\\ t_{23}&=t_{21}-d_4\\ \end{aligned} $$
(2)
where \(d_3\) and \(d_4\) can be chosen freely as long as at least one of them is non-zero and the resulting quantities remain probabilities. An analogous set of constraints results when the difference between models T and C results from disjuncts in the second line of Constraint (1). These are non-trivial algebraic constraints on the parameter space, which, following Meek (1995), implies that their solution space has measure zero compared to arbitrary parameterizations of a model with a structure like T.
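That the solution in (2) satisfies the four linear constraints can be verified with exact arithmetic. The sketch below uses hypothetical parameter values (they are not taken from Table 1), and the helper names `constraint_rows` and `solve_d1_d2` are introduced only for this illustration. It reads (2) as fixing \(d_1 = t_{9} - t_{11} = d_3(-1+t_{2})/t_{2}\) and \(d_2 = t_{17} - t_{19} = d_4(-1+t_{2})/t_{2}\).

```python
from fractions import Fraction as F


def constraint_rows(t1, t2, t3, t4, t5, t7, d1, d2, d3, d4):
    """Left-hand sides of the four linear constraints, in the order listed."""
    # Terms are ordered (u=1,v=1), (u=0,v=1), (u=1,v=0), (u=0,v=0).
    w = [t1 * t2, (1 - t1) * t2, t1 * (1 - t2), (1 - t1) * (1 - t2)]  # P(u)P(v)
    px = [t3, t4, t3, t4]  # p(x=1 | u)
    py = [t5, t5, t7, t7]  # p(y=1 | v, x=1)
    d = [d1, d2, d3, d4]
    return [
        sum(w[k] * px[k] * py[k] * d[k] for k in range(4)),
        sum(w[k] * py[k] * d[k] for k in range(4)),
        sum(w[k] * px[k] * d[k] for k in range(4)),
        sum(w[k] * d[k] for k in range(4)),
    ]


def solve_d1_d2(t2, d3, d4):
    """d1 and d2 as implied by (2) for freely chosen d3 and d4."""
    return d3 * (t2 - 1) / t2, d4 * (t2 - 1) / t2


# Hypothetical values, chosen only for illustration.
t1, t2, t3, t4 = F(2, 5), F(3, 10), F(7, 10), F(1, 5)
d3, d4 = F(1, 10), F(1, 20)
d1, d2 = solve_d1_d2(t2, d3, d4)

# With t5 = t7 all four rows vanish; with t5 != t7 the first two do not.
rows_ok = constraint_rows(t1, t2, t3, t4, F(4, 5), F(4, 5), d1, d2, d3, d4)
rows_bad = constraint_rows(t1, t2, t3, t4, F(17, 20), F(4, 5), d1, d2, d3, d4)
print(rows_ok)
print(rows_bad)
```

Using `Fraction` rather than floats avoids mistaking rounding noise for a violated constraint. The second call illustrates why \(t_{5} = t_{7}\) is part of the solution: the last two rows do not involve \(t_{5}\) or \(t_{7}\) and stay at zero, while the first two do not.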
Similarly, these constraints can be used to construct a parameterization for a model T that is indistinguishable from a parameterized model C, as long as the parameterization of C also respects the \(t_{5} = t_{7}\) constraint (or \(t_{6} = t_{8}\)). In particular, the parameterization of T in Table 1 is constructed from the parameterization of C in that table using \(d_3 = 0.1\) and \(d_4 = 0\).
Appendix 2: Deterministic Parameterizations of T and C
Deterministic parameterizations of the two models in Fig. 1 that are indistinguishable for a passive observation and any surgical intervention on the observed variables.
| Parameter | Conditional probability terms | T | C |
|---|---|---|---|
| \(t_{1}\) | \(p(u = 1)\) | 0.5 | 0.5 |
| \(t_{2}\) | \(p(v = 1)\) | 0.5 | 0.5 |
| \(t_{3}\) | \(p(x = 1 \mid u = 1)\) | 0 | 0 |
| \(t_{4}\) | \(p(x = 1 \mid u = 0)\) | 1 | 1 |
| \(t_{5}\) | \(p(y = 1 \mid v = 1, x = 1)\) | 1 | 1 |
| \(t_{6}\) | \(p(y = 1 \mid v = 1, x = 0)\) | 1 | 1 |
| \(t_{7}\) | \(p(y = 1 \mid v = 0, x = 1)\) | 1 | 1 |
| \(t_{8}\) | \(p(y = 1 \mid v = 0, x = 0)\) | 0 | 0 |
| \(t_{9}\) | \(p(z = 1 \mid u = 1, v = 1, x = 1, y = 1)\) | 1 | 0 |
| \(t_{10}\) | \(p(z = 1 \mid u = 1, v = 1, x = 1, y = 0)\) | 1 | 1 |
| \(t_{11}\) | \(p(z = 1 \mid u = 1, v = 1, x = 0, y = 1)\) | 0 | 0 |
| \(t_{12}\) | \(p(z = 1 \mid u = 1, v = 1, x = 0, y = 0)\) | 1 | 1 |
| \(t_{13}\) | \(p(z = 1 \mid u = 1, v = 0, x = 1, y = 1)\) | 0 | 1 |
| \(t_{14}\) | \(p(z = 1 \mid u = 1, v = 0, x = 1, y = 0)\) | 1 | 1 |
| \(t_{15}\) | \(p(z = 1 \mid u = 1, v = 0, x = 0, y = 1)\) | 1 | 1 |
| \(t_{16}\) | \(p(z = 1 \mid u = 1, v = 0, x = 0, y = 0)\) | 1 | 1 |
| \(t_{17}\) | \(p(z = 1 \mid u = 0, v = 1, x = 1, y = 1)\) | 1 | 1 |
| \(t_{18}\) | \(p(z = 1 \mid u = 0, v = 1, x = 1, y = 0)\) | 1 | 1 |
| \(t_{19}\) | \(p(z = 1 \mid u = 0, v = 1, x = 0, y = 1)\) | 1 | 1 |
| \(t_{20}\) | \(p(z = 1 \mid u = 0, v = 1, x = 0, y = 0)\) | 1 | 1 |
| \(t_{21}\) | \(p(z = 1 \mid u = 0, v = 0, x = 1, y = 1)\) | 1 | 1 |
| \(t_{22}\) | \(p(z = 1 \mid u = 0, v = 0, x = 1, y = 0)\) | 1 | 1 |
| \(t_{23}\) | \(p(z = 1 \mid u = 0, v = 0, x = 0, y = 1)\) | 1 | 1 |
| \(t_{24}\) | \(p(z = 1 \mid u = 0, v = 0, x = 0, y = 0)\) | 1 | 1 |
Note that if the distributions of the latent variables u and v are supposed to be non-extreme, then p(u = 1) = p(v = 1) = 0.5 are the only possible values.
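The indistinguishability of the two deterministic parameterizations can be checked by brute force over the seven distributions listed in Appendix 1. The following is a minimal sketch under the factorization given there. The function names `bern`, `prob` and `max_gap`, and the perturbed model `T_broken`, are hypothetical and introduced only for illustration.

```python
from itertools import combinations, product

# Deterministic parameters from the table above, keyed by the index i of t_i.
SHARED = {1: 0.5, 2: 0.5, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0,
          10: 1, 11: 0, 12: 1}
SHARED.update({i: 1 for i in range(14, 25)})
T = {**SHARED, 9: 1, 13: 0}
C = {**SHARED, 9: 0, 13: 1}


def bern(p, value):
    """Probability that a binary variable with p(1) = p takes `value`."""
    return p if value == 1 else 1 - p


def prob(t, x, y, z, do=frozenset()):
    """Sum over the latents u, v of the model factorization, dropping the
    factor of every surgically intervened variable named in `do`."""
    total = 0
    for u, v in product((0, 1), repeat=2):
        term = bern(t[1], u) * bern(t[2], v)
        if "x" not in do:
            term *= bern(t[3] if u else t[4], x)
        if "y" not in do:
            term *= bern(t[5 + 2 * (1 - v) + (1 - x)], y)
        if "z" not in do:
            term *= bern(t[9 + 8 * (1 - u) + 4 * (1 - v) + 2 * (1 - x) + (1 - y)], z)
        total += term
    return total


# Passive observation plus all single and double interventions: seven regimes.
REGIMES = [frozenset(s) for r in range(3) for s in combinations("xyz", r)]


def max_gap(a, b):
    """Largest absolute difference between two models over all regimes and cells."""
    return max(abs(prob(a, x, y, z, do) - prob(b, x, y, z, do))
               for do in REGIMES for x, y, z in product((0, 1), repeat=3))


gap_TC = max_gap(T, C)

# Hypothetical perturbation of T (not from the paper), used as a sanity check.
T_broken = {**T, 13: 1}
gap_broken = max_gap(T_broken, C)
print(gap_TC, gap_broken)
```

The perturbed model serves as a sanity check that the comparison can detect a difference at all: changing a single z-parameter of T away from the tabulated value produces a non-zero gap.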
I do not find the deterministic case particularly enlightening. Moreover, it is well known that deterministic causal relations are often more difficult to discover than probabilistic ones. In that sense I think that the example parameterizations for models T and C in Table 1, with purely positive distributions, provide a much stronger case.
Appendix 3: Soft Interventions
Note that the constraints in (2) do not contain the parameters \(t_{3}\) or \(t_{4}\), which could be influenced by a soft intervention on x; hence a soft intervention on x is not going to distinguish between models T and C.
A soft intervention on y that changes \(t_{5}\) will break the first equality in (2), thus the models become distinguishable. In particular, if \(t_{5}\) is changed from 0.8 to 0.85 by a soft intervention on y in both models T and C, then in the resulting manipulated distribution, we will have
$$ \begin{aligned} p_T^{*}(x=y=z=1) &= 0.24856 \\ {\rm vs.}\quad p_C^{*}(x=y=z=1) &= 0.24928 \\ \end{aligned} $$
which is not a rounding error.
Note that \(t_{6}\) does not feature in the constraints in (2), so a soft intervention that changes it in both T and C will not distinguish between the two models.
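The effect of soft interventions on the single quantity p(x = y = z = 1) can be illustrated numerically. The sketch below does not use the Table 1 values, which are not reproduced in this appendix, so its numbers will not match those reported above. All parameter values are hypothetical, and T is built from C by one reading of (2): shifting the four z-parameters for x = 1, y = 1 by \(d_1,\ldots,d_4\) with \(d_3 = 0.1\) and \(d_4 = 0\). The names `p111` and `gap` are introduced only for this illustration.

```python
from fractions import Fraction as F


def p111(t1, t2, t3, t4, t5, t7, z11):
    """Passive P(x=1, y=1, z=1); z11[(u, v)] stands for p(z=1 | u, v, x=1, y=1)."""
    total = 0
    for u in (0, 1):
        for v in (0, 1):
            p_u = t1 if u else 1 - t1
            p_v = t2 if v else 1 - t2
            p_x = t3 if u else t4  # p(x=1 | u)
            p_y = t5 if v else t7  # p(y=1 | v, x=1)
            total += p_u * p_v * p_x * p_y * z11[(u, v)]
    return total


# Hypothetical values, NOT those of Table 1; note t5 = t7 as required by (2).
base = dict(t1=F(2, 5), t2=F(1, 2), t3=F(7, 10), t4=F(1, 5),
            t5=F(4, 5), t7=F(4, 5))
z_C = {(1, 1): F(3, 5), (1, 0): F(1, 2), (0, 1): F(2, 5), (0, 0): F(3, 10)}

# Build T from C via (2): d1 and d2 follow from the free choices d3 and d4.
d3, d4 = F(1, 10), F(0)
t2 = base["t2"]
d1, d2 = d3 * (t2 - 1) / t2, d4 * (t2 - 1) / t2
z_T = {(1, 1): z_C[(1, 1)] + d1, (0, 1): z_C[(0, 1)] + d2,
       (1, 0): z_C[(1, 0)] + d3, (0, 0): z_C[(0, 0)] + d4}


def gap(**changed):
    """p_T - p_C for P(x=y=z=1) after overriding parameters in both models."""
    params = {**base, **changed}
    return p111(z11=z_T, **params) - p111(z11=z_C, **params)


gap_passive = gap()              # no intervention
gap_soft_x = gap(t3=F(9, 10))    # soft intervention on x changes t3
gap_soft_y = gap(t5=F(17, 20))   # soft intervention on y changes t5 only
print(gap_passive, gap_soft_x, gap_soft_y)
```

In this sketch the gap is zero without intervention and after the change to \(t_{3}\), and becomes non-zero once \(t_{5}\) no longer equals \(t_{7}\). The check covers only this one cell of the passive distribution, not all seven distributions.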
Lastly, as Meek (1995) showed, violations of faithfulness occur for particular constellations of the parameters that make up the distribution. A violation of faithfulness always constitutes a non-trivial algebraic constraint on these parameters. Since soft interventions influence individual parameters, they can be (in principle, leaving the mentioned concerns of implementation aside) used to break these algebraic constraints. Thus, soft interventions are in general sufficient to make unfaithful models faithful. Once faithfulness is achieved, the causal relations can be detected as usual. In the case of canceling pathways with unobserved intermediary variables (Fig. 2), the soft intervention must occur on the final variable.