Appendix
Here we prove Theorem 3 for linear recursive path models with imperfect interventions. The proof assumes familiarity with path modeling and Bayesian networks.
Theorem 3
(Distinguishability under intervention) For any \(M_1(\theta_1)\), \(M_2(\theta_2)\) (which model the same space and are positive)\(^{13}\) such that \(M_1\neq M_2\), if \(P_{M_1(\theta_1)} = P_{M_2(\theta_2)}\), then under interventions \(P_{M^{\prime}_1(\theta_1)}\neq P_{M^{\prime}_2(\theta_2)}\).
Proof
Since \(M_1\) and \(M_2\) differ structurally, there is some pair of nodes X and Y such that one model contains \(X\rightarrow Y\) while the other contains \(X\leftarrow Y\) or else X:Y (i.e., X and Y are not directly connected). In either case, one model induces a probabilistic dependency that must fail in the other. To fix ideas, suppose \(M_1\) contains \(X\rightarrow Y\); the two cases below treat the alternatives for \(M_2\).
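Both cases rely on the following fact from Wright's path analysis, recalled here in the form used below (a gloss, not a quotation from the original): if, given \({\bf Z}\), a linear path model has exactly one d-connecting path \(\pi\) between \(A\) and \(B\), then

\[ r_{AB\cdot{\bf Z}}\neq 0 \iff \prod_{(U\rightarrow V)\in\pi} p_{VU}\neq 0, \]

and if there is no d-connecting path between \(A\) and \(B\) given \({\bf Z}\), then \(r_{AB\cdot{\bf Z}} = 0\).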
Case 1
\(M_2\) contains X:Y. Let \({\bf Z}\) be the set of all variables in \(M^{\prime}_1\) (equivalently, \(M^{\prime}_2\)) except for \(\{X,I_Y,I_X\}\) (and so \(Y\in {\bf Z}\)). Then \(r_{I_{Y} X\cdot {\bf Z}}\neq 0\) in \(M^{\prime}_1(\theta_1)\) and \(r_{I_{Y} X\cdot {\bf Z}} = 0\) in \(M^{\prime}_2(\theta_2)\). This follows from Wright's theory of path models, given that \(p_{YI_Y}p_{YX}\neq 0\) in \(M^{\prime}_1(\theta_1)\) and \(p_{YI_Y}p_{YX} = 0\) in \(M^{\prime}_2(\theta_2)\). (Given \({\bf Z}\), there is exactly one d-connecting path between X and \(I_Y\) in \(M^{\prime}_1\) and none in \(M^{\prime}_2\).)
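As a numerical sanity check of Case 1 (not part of the original proof), the following sketch simulates the two intervened models for a two-variable example, treating the imperfect intervention \(I_Y\) as an exogenous additive input to Y; the coefficients 0.8 and 0.5 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def partial_corr(a, b, z):
    """Sample partial correlation of a and b given z, via residuals of
    least-squares regressions of a and b on z (plus an intercept)."""
    Z = np.column_stack([np.ones(len(a)), z])
    res_a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    res_b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

I_Y = rng.normal(size=n)  # imperfect intervention: exogenous input to Y

# M'_1: X -> Y, plus the intervention edge I_Y -> Y.
X1 = rng.normal(size=n)
Y1 = 0.8 * X1 + 0.5 * I_Y + rng.normal(size=n)

# M'_2: X:Y (no edge between X and Y), with I_Y -> Y as before.
X2 = rng.normal(size=n)
Y2 = 0.5 * I_Y + rng.normal(size=n)

# Z = {Y}: conditioning on the collider Y opens X -> Y <- I_Y in M'_1.
print(partial_corr(I_Y, X1, Y1))  # clearly nonzero in M'_1
print(partial_corr(I_Y, X2, Y2))  # approximately zero in M'_2
```

With these coefficients the first estimate comes out around \(-0.3\) while the second vanishes up to sampling noise, matching \(r_{I_Y X\cdot{\bf Z}}\neq 0\) versus \(r_{I_Y X\cdot{\bf Z}}=0\).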
Case 2
\(M_2\) contains \(X\leftarrow Y\). Let \({\bf Z}\) be the set of all variables in \(M^{\prime}_1\) except for \(\{X,I_{Y},I_{X}\}\) (and so \(Y\in {\bf Z}\)). Then \(r_{I_Y X\cdot {\bf Z}}\neq 0 \) in \(M^{\prime}_1(\theta_1)\) and \(r_{I_Y X\cdot {\bf Z}}= 0\) in \(M^{\prime}_2(\theta_2)\). This follows from Wright's theory of path models, given that \(p_{YI_Y}p_{YX}\neq 0\) in \(M^{\prime}_1(\theta_1)\). (As in Case 1, given \({\bf Z}\), there is exactly one d-connecting path between X and \(I_Y\) in \(M^{\prime}_1\) and none in \(M^{\prime}_2\).) Similarly, letting \({\bf Z}\) be the set of all variables in \(M^{\prime}_1\) except for \(\{Y,I_Y,I_X\}\) (and so \(X\in{\bf Z}\)), \(r_{I_X Y\cdot {\bf Z}}\neq 0\) in \(M^{\prime}_2(\theta_2)\) and \(r_{I_X Y\cdot {\bf Z}} = 0\) in \(M^{\prime}_1(\theta_1)\). □
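Case 2 can be checked the same way. This sketch continues the Case 1 snippet (reusing rng, n, and partial_corr) and illustrates the second pair of tests, where \({\bf Z}=\{X\}\); the coefficients are again arbitrary:

```python
# Continues the Case 1 sketch (same imports, rng, n, and partial_corr).
I_X = rng.normal(size=n)  # intervention input to X

# M'_1: X -> Y, with intervention edge I_X -> X.
X1 = 0.5 * I_X + rng.normal(size=n)
Y1 = 0.8 * X1 + rng.normal(size=n)

# M'_2: X <- Y, with intervention edge I_X -> X.
Y2 = rng.normal(size=n)
X2 = 0.8 * Y2 + 0.5 * I_X + rng.normal(size=n)

# Z = {X}: the chain I_X -> X -> Y is blocked in M'_1, while the
# collider X on I_X -> X <- Y is opened in M'_2.
print(partial_corr(I_X, Y1, X1))  # approximately zero in M'_1
print(partial_corr(I_X, Y2, X2))  # clearly nonzero in M'_2
```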
This implies, given perfect data, that distinguishability between any two distinct models of size n can be achieved with n independent interventions and \(O(n^2)\) conditional independence tests. Thus, the computational cost of finding such empirical distinctions is modest.\(^{14}\)
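To make the count explicit, here is one hypothetical enumeration of such a test schedule (the helper name and the exact pairing scheme are ours, extrapolated from the proof, not given in the paper): for each ordered pair of variables \((V,W)\), one test of \(I_V\) against \(W\) given everything else except \(\{W,I_V,I_W\}\), which yields \(n(n-1)=O(n^2)\) tests over \(n\) interventions:

```python
def distinguishing_tests(variables):
    """Enumerate the O(n^2) conditional-independence tests suggested by
    the proof: for each ordered pair (V, W), test I_V against W given
    all remaining variables except {W, I_V, I_W}."""
    all_vars = list(variables) + [f"I_{v}" for v in variables]
    tests = []
    for v in variables:          # one intervention I_V per variable
        for w in variables:      # n - 1 partner variables per intervention
            if w == v:
                continue
            z = tuple(u for u in all_vars if u not in (w, f"I_{v}", f"I_{w}"))
            tests.append((f"I_{v}", w, z))
    return tests

print(len(distinguishing_tests(["A", "B", "C", "D"])))  # 12 = 4 * 3 tests
```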