Bayesian Networks and Causal Ecumenism


Proponents of various causal exclusion arguments claim that for any given event, there is often a unique level of granularity at which that event is caused. Against these causal exclusion arguments, causal ecumenists argue that the same event or phenomenon can be caused at multiple levels of granularity. This paper argues that the Bayesian network approach to representing the causal structure of target systems is consistent with causal ecumenism. Given the ubiquity of Bayesian networks as a tool for representing causal structure in both philosophy of science and science itself, this result speaks in favor of the ecumenical view, and against rival exclusionary accounts. Gebharter’s (Philos Phenomenol Res 95(2):353–375, 2017) argument that the Bayes nets formalism is consistent with causal exclusion is considered and rebutted.

Figs. 1–4 (images not available)

Notes

  1. Thus, I leave to one side versions of the causal exclusion argument in an interventionist setting that rely on proportionality constraints on causation, e.g. List and Menzies (2009) or Hoffmann-Kolss (2014).

  2. Many readers will be familiar with the Faithfulness condition for Bayes nets, which is strictly stronger than Minimality. I choose Minimality over Faithfulness as an adequacy condition for the causal interpretation of Bayes nets, since there is a case to be made that Bayes nets that satisfy Minimality but not Faithfulness are accurate representations of some causal systems. For a perspicuous comparison of the two conditions, see Zhang (2012).

  3. See Spirtes (2007) and Eberhardt (2016) for rigorous demonstrations of this point.

  4. Note that, since we assume that the value of X is \(x_{i}\), the interventional conditional probability \(p(y_{j}|do(x_{i}))\) denotes the probability that \(Y=y_{j}\) on the supposition that X has been set to \(x_{i}\) via an intervention, as opposed to the probability that \(Y=y_{j}\) conditional on the observation that \(X=x_{i}\).

  5. Note that this is distinct from the notion of “stability” in causal models discussed in Woodward (2010).

  6. I am grateful to an anonymous reviewer for suggesting this example.

  7. Here I have translated Stern and Eva’s thesis into my terminology.

References

  1. Davidson, D. (1970). Mental events. In L. Foster & J. W. Swanson (Eds.), Experience and theory (pp. 79–101). London: Humanities Press.

  2. Eberhardt, F. (2016). Green and grue causal variables. Synthese, 193(4), 1029–1046.

  3. Eronen, M. (2012). Pluralistic physicalism and the causal exclusion argument. European Journal for Philosophy of Science, 2(2), 219–232.

  4. Fenton-Glynn, L. (2017). A proposed probabilistic extension of the Halpern and Pearl definition of ‘actual cause’. British Journal for the Philosophy of Science, 68(4), 1061–1124.

  5. Gebharter, A. (2017). Causal exclusion and causal Bayes nets. Philosophy and Phenomenological Research, 95(2), 353–375.

  6. Glymour, C. (2004). Review of Woodward (2003). British Journal for the Philosophy of Science, 55, 779–790.

  7. Glymour, C., Danks, D., Glymour, B., Eberhardt, F., Ramsey, J., Scheines, R., et al. (2010). Actual causation: A stone soup essay. Synthese, 175(2), 169–192.

  8. Hausman, D., & Woodward, J. (1999). Independence, invariance and the causal Markov condition. British Journal for the Philosophy of Science, 50(4), 521–583.

  9. Hitchcock, C. (2012). Theories of causation and the causal exclusion argument. Journal of Consciousness Studies, 19(5–6), 40–56.

  10. Hoffmann-Kolss, V. (2014). Interventionism and higher-level causation. International Studies in the Philosophy of Science, 28(1), 49–64.

  11. Huang, Y., & Valtorta, M. (2006). Pearl’s calculus of intervention is complete. In Proceedings of the twenty-second conference on uncertainty in artificial intelligence (pp. 217–224).

  12. Jackson, F., & Pettit, P. (1988). Functionalism and broad content. Mind, 97(387), 381–400.

  13. Jackson, F., & Pettit, P. (1990a). Causation and the philosophy of mind. Philosophy and Phenomenological Research, 50, 195–214.

  14. Jackson, F., & Pettit, P. (1990b). Program explanation: A general perspective. Analysis, 50(2), 107–117.

  15. Jackson, F., & Pettit, P. (1992). In defense of explanatory ecumenism. Economics and Philosophy, 8(1), 1–21.

  16. Janzing, D., & Schölkopf, B. (2010). Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10), 5168–5194.

  17. Kim, J. (1989). Mechanism, purpose, and explanatory exclusion. Philosophical Perspectives, 3, 77–108.

  18. Kim, J. (2000). Mind in a physical world: An essay on the mind-body problem and mental causation. Cambridge: MIT Press.

  19. List, C., & Menzies, P. (2009). Nonreductive physicalism and the limits of the exclusion principle. Journal of Philosophy, 106(9), 475–502.

  20. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.

  21. Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. New York: Wiley.

  22. Pettit, P. (2017). The program model, difference-makers, and the exclusion problem. In H. Price, H. Beebee, & C. Hitchcock (Eds.), Making a difference: Essays on the philosophy of causation (pp. 232–250). Oxford: Oxford University Press.

  23. Polger, T. W., Shapiro, L. A., & Stern, R. (2018). In defense of interventionist solutions to exclusion. Studies in History and Philosophy of Science Part A, 68, 51–57.

  24. Raatikainen, P. (2010). Causation, exclusion, and the special sciences. Erkenntnis, 73(3), 349–363.

  25. Schaffer, J. (2016). Grounding in the image of causation. Philosophical Studies, 173(1), 49–100.

  26. Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(Oct), 2003–2030.

  27. Spirtes, P. (2007). Variable definition and causal inference. In Proceedings of the international congress for logic, methodology and philosophy of science.

  28. Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search. Cambridge: MIT Press.

  29. Spirtes, P., & Scheines, R. (2004). Causal inference of ambiguous manipulations. Philosophy of Science, 71(5), 833–845.

  30. Stern, R., & Eva, B. (forthcoming). Antireductionist interventionism. British Journal for the Philosophy of Science.

  31. Weslake, B. (2010). Explanatory depth. Philosophy of Science, 77(2), 273–294.

  32. Wilson, A. (2018). Metaphysical causation. Noûs, 52(4), 723–751.

  33. Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.

  34. Woodward, J. (2010). Causation in biology: Stability, specificity, and the choice of levels of explanation. Biology and Philosophy, 25(3), 287–318.

  35. Woodward, J. (2015). Interventionism and causal exclusion. Philosophy and Phenomenological Research, 91(2), 303–347.

  36. Zhang, J. (2012). A comparison of three Occam’s razors for Markovian causal models. The British Journal for the Philosophy of Science, 64(2), 423–448.

  37. Zhong, L. (2014). Sophisticated exclusion and sophisticated causation. The Journal of Philosophy, 111(7), 341–360.

Acknowledgements

I am grateful to Sander Beckers, Jonathan Birch, Luc Bovens, Paul Daniell, Chris Dorst, Christopher Hitchcock, Christian List, Philip Pettit, Katie Steele, David Watson, an audience at the 2018 meeting of the APA Central Division in Chicago, and several anonymous reviewers for feedback on various versions of this paper.

Author information



Corresponding author

Correspondence to David Kinney.

Appendix

Calculation of Inequality (4)

For the purpose of these calculations, yes, no, low, medium, and high are shortened to y, n, l, m, and h, respectively. We begin with the left-hand term of the inequality in (4). Since \(S^{\prime }\notin \{A^{\prime }\}\) and \(A^{\prime }\) is the sole parent of \(S^{\prime }\), (2) and (3) jointly imply:

$$\begin{aligned} p(S^{\prime }=y|do(A^{\prime }=l))=p(S^{\prime }=y|A^{\prime }=l) \end{aligned}$$

This can be calculated as follows:

$$\begin{aligned} p(S^{\prime }= & {} y|do(A^{\prime }=l))=\frac{p(A^{\prime }=l,B^{\prime }=h,S^{\prime }=y)+p(A^{\prime }=l,B^{\prime }=l,S^{\prime }=y)}{p(A^{\prime }=l)} \end{aligned}$$
$$\begin{aligned} p(S^{\prime }= & {} y|do(A^{\prime }=l))=\frac{.042+.267}{p(A^{\prime }=l)} \end{aligned}$$

We can calculate \(p(A^{\prime }=l)\) as follows:

$$\begin{aligned} p(A^{\prime }=l)= \,& {} p(A^{\prime }=l,B^{\prime }=h,S^{\prime }=y)+p(A^{\prime }=l,B^{\prime }=h,S^{\prime }=n) \nonumber \\&+ p(A^{\prime }=l,B^{\prime }=l,S^{\prime }=y)+p(A^{\prime }=l,B^{\prime }=l,S^{\prime }=n) \end{aligned}$$
$$\begin{aligned} p(A^{\prime }=l)= & {} .042+.003+.267+.021=.333 \end{aligned}$$

This yields the result:

$$\begin{aligned} p(S^{\prime }=y|do(A^{\prime }=l))=\frac{.042+.267}{.333}\approx .93 \end{aligned}$$

Next, we calculate the right-hand term:

$$\begin{aligned} p(S^{\prime }=y)= \,& {} p(A^{\prime }=h,B^{\prime }=h,S^{\prime }=y)+p(A^{\prime }=h,B^{\prime }=l,S^{\prime }=y) \nonumber \\&+ p(A^{\prime }=m,B^{\prime }=h,S^{\prime }=y)+p(A^{\prime }=m,B^{\prime }=l,S^{\prime }=y) \nonumber \\&+ p(A^{\prime }=l,B^{\prime }=h,S^{\prime }=y)+p(A^{\prime }=l,B^{\prime }=l,S^{\prime }=y) \end{aligned}$$
$$\begin{aligned} p(S^{\prime }=y)= & {} .042 + .003 + .1 + .067 + .042 + .267=.521\approx .52 \end{aligned}$$
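The arithmetic above can be checked mechanically. The following Python sketch hard-codes only the joint probabilities quoted in this calculation (the remaining entries of the distribution are not given here, so the sketch does not reconstruct it) and recomputes both sides of Inequality (4). The helper name `marginal` is illustrative, not part of the paper.

```python
# Joint probabilities p(A'=a, B'=b, S'=s) quoted in the calculation above.
# Only the entries that the calculation uses are listed.
joint = {
    ("h", "h", "y"): 0.042, ("h", "l", "y"): 0.003,
    ("m", "h", "y"): 0.100, ("m", "l", "y"): 0.067,
    ("l", "h", "y"): 0.042, ("l", "l", "y"): 0.267,
    ("l", "h", "n"): 0.003, ("l", "l", "n"): 0.021,
}


def marginal(pred):
    """Sum the listed joint entries whose (a, b, s) key satisfies pred."""
    return sum(p for key, p in joint.items() if pred(*key))


# A' is the sole parent of S' and has no parents of its own, so
# p(S'=y | do(A'=l)) coincides with the observational p(S'=y | A'=l).
p_a_low = marginal(lambda a, b, s: a == "l")                     # p(A'=l)
p_y_and_a_low = marginal(lambda a, b, s: a == "l" and s == "y")  # p(A'=l, S'=y)
p_y_do_a_low = p_y_and_a_low / p_a_low

# Right-hand term: p(S'=y), summing every S'=y entry over A' and B'.
p_y = marginal(lambda a, b, s: s == "y")

print(f"p(S'=y | do(A'=l)) = {p_y_do_a_low:.3f}")
print(f"p(S'=y)            = {p_y:.3f}")
print("Inequality (4) holds:", p_y_do_a_low > p_y)
```

With the quoted entries this prints about 0.928 and 0.521, in line with the rounded values .93 and .52 used in the text.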

Calculation of Eq. (5)

As we have already calculated the right-hand term, we focus on the left-hand term. From the law of total probability, we have:

$$\begin{aligned}&p(S^{\prime }= y|do(B^{\prime }=l))=p(S^{\prime }=y,A^{\prime }=l|do(B^{\prime }=l)) \nonumber \\&\quad + \ p(S^{\prime }=y,A^{\prime }=m|do(B^{\prime }=l)) + p(S^{\prime }=y,A^{\prime }=h|do(B^{\prime }=l)) \end{aligned}$$

Since \(S^{\prime }\notin \{B^{\prime }\}\), \(A^{\prime }\notin \{B^{\prime }\}\), and \(A^{\prime }\) is a parent of \(B^{\prime }\), the equation above together with (2) and (3) jointly imply:

$$\begin{aligned}&p(S^{\prime }=y|do(B^{\prime }=l))=p(S^{\prime }=y|A^{\prime }=l)p(A^{\prime }=l) \nonumber \\&\quad + \ p(S^{\prime }=y|A^{\prime }=m)p(A^{\prime }=m) + p(S^{\prime }=y|A^{\prime }=h)p(A^{\prime }=h) \end{aligned}$$
$$\begin{aligned}&p(S^{\prime }=y|do(B^{\prime }=l))=p(A^{\prime }=l,S^{\prime }=y) + p(A^{\prime }=m,S^{\prime }=y) + p(A^{\prime }=h,S^{\prime }=y) \end{aligned}$$

To see this implication clearly, note that, since \(S^{\prime }\notin \{B^{\prime }\}\) and \(A^{\prime }\notin \{B^{\prime }\}\), Eqs. (2) and (3) imply that \(p(S^{\prime }=y,A^{\prime }=l|do(B^{\prime }=l))=p(S^{\prime }=y|\mathbf {pa}_{S^{\prime }})p(A^{\prime }=l|\mathbf {pa}_{A^{\prime }})\). Since \(A^{\prime }\) is the sole parent of \(S^{\prime }\) and \(A^{\prime }\) has no parents, \(p(S^{\prime }=y|\mathbf {pa}_{S^{\prime }})p(A^{\prime }=l|\mathbf {pa}_{A^{\prime }})=p(S^{\prime }=y|A^{\prime }=l)p(A^{\prime }=l)\). We repeat these steps to obtain the other summands of (18). By the law of total probability, we obtain:

$$\begin{aligned} p(S^{\prime }=y|do(B^{\prime }=l))=p(S^{\prime }=y)\approx .52 \end{aligned}$$
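The structural reason behind this result is that intervening on \(B^{\prime }\) deletes only \(B^{\prime }\)'s own factor from the factorization, which cannot affect \(S^{\prime }\) because \(B^{\prime }\) is not an ancestor of \(S^{\prime }\). The Python sketch below checks this by brute force for the graph \(B^{\prime }\leftarrow A^{\prime }\rightarrow S^{\prime }\). Its conditional probability tables are hypothetical illustrative numbers, not the paper's distribution, and the truncated factorization is implemented by replacing each intervened variable's factor with an indicator of its forced value. The names `joint` and `prob_s` are illustrative.

```python
from itertools import product

# Hypothetical CPTs for the graph B' <- A' -> S' (illustrative numbers only,
# not the distribution used in the paper).
A_VALS, B_VALS = ("l", "m", "h"), ("l", "h")
p_A = {"l": 0.3, "m": 0.3, "h": 0.4}
p_B_given_A = {"l": {"l": 0.9, "h": 0.1},
               "m": {"l": 0.5, "h": 0.5},
               "h": {"l": 0.2, "h": 0.8}}
p_S_given_A = {"l": {"y": 0.9, "n": 0.1},
               "m": {"y": 0.5, "n": 0.5},
               "h": {"y": 0.1, "n": 0.9}}


def joint(a, b, s, do=None):
    """Truncated factorization: an intervened variable's factor is replaced
    by an indicator of its forced value; all other factors are kept."""
    do = do or {}
    f_a = float(a == do["A"]) if "A" in do else p_A[a]
    f_b = float(b == do["B"]) if "B" in do else p_B_given_A[a][b]
    f_s = float(s == do["S"]) if "S" in do else p_S_given_A[a][s]
    return f_a * f_b * f_s


def prob_s(s_val, do=None):
    """p(S'=s_val), possibly under an intervention given as a dict."""
    return sum(joint(a, b, s_val, do) for a, b in product(A_VALS, B_VALS))


p_y = prob_s("y")
for b in B_VALS:
    # Only B''s own factor is removed, so the distribution of S' is unchanged.
    value = prob_s("y", {"B": b})
    print(f"p(S'=y | do(B'={b})) = {value:.3f}   p(S'=y) = {p_y:.3f}")
for a in A_VALS:
    # A' is a root, so intervening on it yields p(S'=y | A'=a).
    value = prob_s("y", {"A": a})
    print(f"p(S'=y | do(A'={a})) = {value:.3f}")
```

With these numbers every intervention on \(B^{\prime }\) returns the marginal 0.460, while intervening on the root \(A^{\prime }\) returns the observational conditionals 0.900, 0.500, and 0.100, mirroring the two patterns used in the calculations of (4) and (5).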

Calculation of Inequality (6)

We begin with the left-hand term of the inequality in (6). Since \(S^{\prime }\notin \{A\}\) and A is the sole parent of \(S^{\prime }\), (2) and (3) jointly imply:

$$\begin{aligned} p(S^{\prime }=y|do(A=l/m))=p(S^{\prime }=y|A=l/m) \end{aligned}$$

This can be calculated as follows:

$$\begin{aligned}&p(S^{\prime }=y|do(A=l/m))\nonumber \\&\quad =\frac{p(A=l/m,B^{\prime }=h,S^{\prime }=y)+p(A=l/m,B^{\prime }=l,S^{\prime }=y)}{p(A=l/m)} \end{aligned}$$
$$\begin{aligned}&p(S^{\prime }=y|do(A=l/m))=\frac{.142+.334}{p(A=l/m)} \end{aligned}$$

We can calculate \(p(A=l/m)\) as follows:

$$\begin{aligned} p(A=l/m)= & \,{} p(A=l/m,B^{\prime }=h,S^{\prime }=y)+p(A=l/m,B^{\prime }=h,S^{\prime }=n) \nonumber \\&+ p(A=l/m,B^{\prime }=l,S^{\prime }=y)+p(A=l/m,B^{\prime }=l,S^{\prime }=n) \end{aligned}$$
$$\begin{aligned} p(A=l/m)= & {} .142+.103+.334 +.088=.667 \end{aligned}$$

This yields the result:

$$\begin{aligned} p(S^{\prime }=y|do(A=l/m))=\frac{.142+.334}{.667}\approx .714 \end{aligned}$$

Finally, we recall the value of the right-hand term, \(p(S^{\prime }=y)\approx .52\).
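The coarse-grained numbers can be reproduced from the fine-grained entries quoted in the calculation of Inequality (4). The Python sketch below merges the values l and m of \(A^{\prime }\) by summing the corresponding joint probabilities, takes the coarse \(S^{\prime }=n\) entries as given above, and recomputes \(p(A=l/m)\) directly from the four coarse entries. The value 0.521 for \(p(S^{\prime }=y)\) is the unrounded sum from that earlier calculation; the variable names are illustrative.

```python
# Fine-grained entries p(A'=a, B'=b, S'=y) for a in {l, m}, as quoted in the
# calculation of Inequality (4).
fine_y = {("l", "h"): 0.042, ("l", "l"): 0.267,
          ("m", "h"): 0.100, ("m", "l"): 0.067}

# Coarsening A' into A merges l and m into the single value l/m, so each
# coarse joint probability is a sum of fine-grained ones.
coarse_y = {b: fine_y[("l", b)] + fine_y[("m", b)] for b in ("h", "l")}
# Coarse entries p(A=l/m, B'=b, S'=n), as given in the calculation above.
coarse_n = {"h": 0.103, "l": 0.088}

p_coarse = sum(coarse_y.values()) + sum(coarse_n.values())  # p(A=l/m)
# A is the sole parent of S' and has no parents, so do(A=l/m) matches
# conditioning on A=l/m.
p_y_do_coarse = sum(coarse_y.values()) / p_coarse
# p(S'=y): the sum .042 + .003 + .1 + .067 + .042 + .267 from Inequality (4).
p_y = 0.521

print(f"p(A=l/m, B'=h, S'=y) = {coarse_y['h']:.3f}")
print(f"p(A=l/m, B'=l, S'=y) = {coarse_y['l']:.3f}")
print(f"p(A=l/m)             = {p_coarse:.3f}")
print(f"p(S'=y | do(A=l/m))  = {p_y_do_coarse:.3f} (vs p(S'=y) = {p_y})")
```

The sketch reproduces the coarse entries .142 and .334 and gives a left-hand term of about 0.714, well above 0.521.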

Proof of Proposition 1


Assume \(p(y_{j}|do({\mathbf {x}}_{i}))>p(y_{j})\), where \({\mathbf {x}}_{i}\) is a vector of values that contains both \(x_{i}\) and the actual values taken by all parents of Y not on some directed path from X to Y. Let \(\varphi\) be the set of values of Y other than \(y_{j}\). This implies that \(p(\varphi |do({\mathbf {x}}_{i}))=1-p(y_{j}|do({\mathbf {x}}_{i}))<1-p(y_{j})=p(\varphi )\), which implies in turn that there is a value \(y_{l}\in \varphi\) such that \(p(y_{l}|do({\mathbf {x}}_{i}))<p(y_{l})\). Suppose that the set \(\mathcal {PA}_{X}\) containing all variables that are parents of X and all parents of Y not on the stipulated path from X to Y has the set of possible vectors of values \(\{\mathbf {pa_{X1}},\mathbf {pa_{X2}},\ldots ,\mathbf {pa_{Xq}}\}\). It is well known (see Pearl et al. 2016, p. 59) that we can derive the following:

$$\begin{aligned} p(y_{l}|do({\mathbf {x}}_{i}))=\sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{i},\mathbf {pa_{Xt}})p(\mathbf {pa_{Xt}})<p(y_{l}) \end{aligned}$$

The law of total probability implies the following, where X has n values:

$$\begin{aligned} p(y_{l})=\sum _{k=1}^{n}\sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{k},\mathbf {pa_{Xt}})p({\mathbf {x}}_{k},\mathbf {pa_{Xt}}) \end{aligned}$$

This implies that there exists a vector of values \({\mathbf {x}}_{k}\) such that \(({\mathbf {x}}_{i}\cup {\mathbf {x}}_{k})\setminus ({\mathbf {x}}_{i}\cap {\mathbf {x}}_{k})=\{x_{i},x_{k}\}\), and:

$$\begin{aligned} \sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{i},\mathbf {pa_{Xt}})p(\mathbf {pa_{Xt}})<p(y_{l})<\sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{k},\mathbf {pa_{Xt}})p({\mathbf {x}}_{k},\mathbf {pa_{Xt}}) \end{aligned}$$

The fact that \(p({\mathbf {x}}_{i})>0\) implies that \(p({\mathbf {x}}_{k})<1\), which implies in turn that \(p({\mathbf {x}}_{k},\mathbf {pa_{Xt}})<p(\mathbf {pa_{Xt}})\) for all \(\mathbf {pa_{Xt}}\) with positive probability. It follows from this that:

$$\begin{aligned} p(y_{l})<\sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{k},\mathbf {pa_{Xt}})p({\mathbf {x}}_{k},\mathbf {pa_{Xt}}) < \sum _{t=1}^{q}p(y_{l}|{\mathbf {x}}_{k},\mathbf {pa_{Xt}})p(\mathbf {pa_{Xt}})=p(y_{l}|do({\mathbf {x}}_{k})) \end{aligned}$$

This immediately implies that \(p(y_{l}|do(x_{k}))>p(y_{l})\) when we hold fixed all parents of Y not on the stipulated path from X to Y. \(\square\)
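Two ingredients of this proof can be made concrete on a small example: the adjustment formula \(p(y|do(x))=\sum _{t}p(y|x,\mathbf {pa_{t}})p(\mathbf {pa_{t}})\), and its contrast with ordinary conditioning drawn in note 4. The Python sketch below assumes a hypothetical binary model (Z → X, Z → Y, W → Y, X → Y, with made-up parameters) in which Z and W are the parents of Y that are not on the path from X. It computes the interventional probability by summing over parent vectors, compares it with the observational \(p(y|x)\) and the marginal \(p(y)\), and checks the proof's first step: because both distributions over Y sum to one, an intervention that raises one value of Y must lower another. It covers only these ingredients, not the full proposition.

```python
from itertools import product

# Hypothetical binary model, illustrative numbers only (not from the paper):
#   Z -> X, Z -> Y, W -> Y, X -> Y.
# Z and W are independent roots, so p(z, w) = p(z) p(w).
VALS = (0, 1)
P_Z = {0: 0.6, 1: 0.4}
P_W = {0: 0.7, 1: 0.3}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.9}


def p_x(x, z):
    """p(X=x | Z=z)."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def p_y(y, x, z, w):
    """p(Y=y | X=x, Z=z, W=w), with p(Y=1 | ...) linear in the parents."""
    q = 0.1 + 0.4 * x + 0.3 * z + 0.15 * w
    return q if y == 1 else 1 - q


def joint(x, y, z, w):
    """Markov factorization of the joint distribution."""
    return P_Z[z] * P_W[w] * p_x(x, z) * p_y(y, x, z, w)


def marginal_y(y):
    return sum(joint(x, y, z, w) for x, z, w in product(VALS, VALS, VALS))


def conditional_y_given_x(y, x):
    """Observational p(Y=y | X=x)."""
    num = sum(joint(x, y, z, w) for z, w in product(VALS, VALS))
    den = sum(joint(x, yy, z, w) for yy, z, w in product(VALS, VALS, VALS))
    return num / den


def do_y_given_x(y, x):
    """Adjustment formula: sum over parent vectors (z, w) of p(y | x, z, w) p(z, w)."""
    return sum(p_y(y, x, z, w) * P_Z[z] * P_W[w] for z, w in product(VALS, VALS))


x_i, y_j = 1, 1
print(f"p(Y=1 | X=1)     = {conditional_y_given_x(y_j, x_i):.3f}")  # observation
print(f"p(Y=1 | do(X=1)) = {do_y_given_x(y_j, x_i):.3f}")           # intervention
print(f"p(Y=1)           = {marginal_y(y_j):.3f}")

# First step of the proof: if do(X=x_i) raises y_j above its marginal,
# some other value y_l must fall below its marginal.
if do_y_given_x(y_j, x_i) > marginal_y(y_j):
    lowered = [y for y in VALS if do_y_given_x(y, x_i) < marginal_y(y)]
    print("values of Y lowered by do(X=1):", lowered)
```

With these numbers the sketch prints 0.770 for the observational probability, 0.665 for the interventional one, and 0.457 for the marginal, and reports that Y=0 is lowered; the gap between the first two figures is due to the confounder Z.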

Proof of Proposition 2


Consider a Bayes net \({\mathcal {N}}^{\prime }=\langle {\mathcal {V}}^{\prime },{\mathcal {E}}^{\prime },p(\cdot )\rangle\), with \(C^{\prime }\in {\mathcal {V}}^{\prime }\), where \(C^{\prime }=c^{\prime }_{l}\) causes \(E^{\prime }=e^{\prime }_{s}\). This fact implies that \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))>p(e^{\prime }_{s})\), where \({\mathbf {z}}\) is the actual setting of values for all parents of \(E^{\prime }\) not on some directed path from \(C^{\prime }\) to \(E^{\prime }\). Define an equivalence relation \(\sim\) over the range of \(C^{\prime }\) such that \(c^{\prime }_{l}\sim c^{\prime }_{u}\) if and only if either:

  (a) \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))>p(e^{\prime }_{s})\) and \(p(e^{\prime }_{s}|do(c^{\prime }_{u},{\mathbf {z}}))>p(e^{\prime }_{s})\), or

  (b) \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))\le p(e^{\prime }_{s})\) and \(p(e^{\prime }_{s}|do(c^{\prime }_{u},{\mathbf {z}}))\le p(e^{\prime }_{s})\).

Let the range of C be the quotient set of the range of \(C^{\prime }\) under this equivalence relation. Define a Bayes net \({\mathcal {N}}=\langle {\mathcal {V}},{\mathcal {E}},p(\cdot )\rangle\) such that \({\mathcal {V}}=({\mathcal {V}}^{\prime }\setminus \{C^{\prime }\})\cup \{C\}\) and such that, for any edge \(\langle V^{\prime },C^{\prime }\rangle \in {\mathcal {E}}^{\prime }\) (respectively \(\langle C^{\prime },V^{\prime }\rangle \in {\mathcal {E}}^{\prime }\)), there is an edge \(\langle V^{\prime },C\rangle \in {\mathcal {E}}\) (respectively \(\langle C,V^{\prime }\rangle \in {\mathcal {E}}\)); all other elements of \({\mathcal {E}}^{\prime }\) are included in \({\mathcal {E}}\). The net \({\mathcal {N}}\) trivially satisfies (1), (2) and (4); we now show that it satisfies (3) and (5) as well.
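The construction of C's range can be sketched in a few lines of Python. The interventional probabilities and the marginal below are hypothetical illustrative numbers for a four-valued \(C^{\prime }\); the function `coarsen` is a hypothetical helper that groups values according to clauses (a) and (b) and also returns the coarsening function from the range of \(C^{\prime }\) to the range of C.

```python
# Hypothetical values of p(e'_s | do(c'_l, z)) for a four-valued C', and a
# hypothetical marginal p(e'_s); none of these numbers come from the paper.
p_e_do = {"c1": 0.70, "c2": 0.55, "c3": 0.30, "c4": 0.40}
p_e = 0.50


def coarsen(p_e_do, p_e):
    """Build the quotient of the range of C' under ~: two values are related
    iff both raise p(e'_s) above its marginal (clause (a)) or neither does
    (clause (b))."""
    classes = {True: [], False: []}
    for value, prob in p_e_do.items():
        classes[prob > p_e].append(value)
    # Each non-empty equivalence class becomes one value of the coarse C.
    coarse_values = [tuple(vals) for vals in classes.values() if vals]
    # Coarsening function from the range of C' to the range of C.
    mapping = {v: cls for cls in coarse_values for v in cls}
    return coarse_values, mapping


coarse_values, mapping = coarsen(p_e_do, p_e)
print(coarse_values)   # [('c1', 'c2'), ('c3', 'c4')]
print(mapping["c2"])   # ('c1', 'c2')
```

Because ~ has exactly two possible classes, the coarse variable C has at most two values; a value is dropped only if no value of \(C^{\prime }\) falls into the corresponding clause.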

Let us begin with condition (3). To show that \({\mathcal {N}}\) satisfies CMC, let \(X^{\prime }\in {\mathcal {V}}^{\prime }\) and \(Y^{\prime }\in {\mathcal {V}}^{\prime }\) be two variables such that \(Y^{\prime }\) is not a descendant of \(X^{\prime }\) in \({\mathcal {N}}^{\prime }\). If \(X^{\prime }\ne C^{\prime }\), \(Y^{\prime }\ne C^{\prime }\), and \(C^{\prime }\notin \mathcal {PA}_{X^{\prime }}\), then the supposition that \({\mathcal {N}}^{\prime }\) satisfies CMC, along with the truth of conditions (1) and (2), implies that \(X^{\prime }\) and \(Y^{\prime }\) are independent, given \(\mathcal {PA}_{X^{\prime }}\), in \({\mathcal {N}}\).

If \(C^{\prime }=X^{\prime }\), then the fact that \({\mathcal {N}}^{\prime }\) satisfies CMC implies that for any values \(c^{\prime }_{l}\), \(y^{\prime }_{o}\), and vector of values \({\mathbf {pa}}_{\mathbf {C}^{\prime }}\) of the variables in \(\mathcal {PA}_{C^{\prime }}\), \(p(c^{\prime }_{l}|y^{\prime }_{o},{\mathbf {pa}}_{\mathbf {C}^{\prime}})=p(c^{\prime }_{l}|{\mathbf {pa}}_{\mathbf {C}^{\prime}})\). If C is a coarsening of \(C^{\prime }\), then for each value \(c_{j}\), each conditional probability \(p(c_{j}|y^{\prime }_{o},\mathbf {pa_{C}})\) and \(p(c_{j}|\mathbf {pa_{C}})\) is a sum of terms of the form \(p(c^{\prime }_{l}|y^{\prime }_{o},{\mathbf {pa}}_{\mathbf {C}^{\prime }})\) and \(p(c^{\prime }_{l}|{\mathbf {pa}}_{\mathbf {C}^{\prime }})\), respectively. Thus, if the latter pair of terms are equal for all triples \((c^{\prime }_{l},y^{\prime }_{o},{\mathbf {pa}}_{\mathbf {C}^{\prime }})\), then the former pair of terms are equal for all triples \((c_{j}, y^{\prime }_{o},\mathbf {pa_{C}})\). Hence C is independent of its non-descendants, given its parents, in \({\mathcal {N}}\).

If \(C^{\prime }=Y^{\prime }\), then the fact that \({\mathcal {N}}^{\prime }\) satisfies CMC implies that for any values \(c^{\prime }_{l}\), \(x^{\prime }_{o}\), and set of values \({\mathbf {pa}}_{\mathbf {X}^{\prime }}\) of the variables in \(\mathcal {PA}_{X^{\prime }}\), \(p(x^{\prime }_{o}|c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})=p(x^{\prime }_{o}|{\mathbf {pa}}_{\mathbf {X}^{\prime }})\). These conditional probabilities can be expressed as the following ratios:

$$\begin{aligned}&p(x^{\prime }_{o}|c^{\prime }_{l},\mathbf {pa_{X^{\prime }}})=\frac{p(x^{\prime }_{o},c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})}{p(c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})} \end{aligned}$$
$$\begin{aligned}&p(x^{\prime }_{o}|{\mathbf {pa}}_{\mathbf {X}^{\prime }})=\frac{p(x^{\prime }_{o},\mathbf {pa_{X^{\prime }}})}{p({\mathbf {pa}}_{\mathbf {X}^{\prime }})} \end{aligned}$$

If C is a coarsening of \(C^{\prime }\), then for each value \(c_{j}\), each joint probability \(p(x^{\prime }_{o},c_{j},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\) and \(p(c_{j},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\) is equal to a sum of joint probabilities of the form \(p(x^{\prime }_{o},c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\) and \(p(c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\), respectively. Thus, if \(p(x^{\prime }_{o}|c^{\prime }_{l},{\mathbf {pa}}_{\mathbf {X}^{\prime }})=p(x^{\prime }_{o}|{\mathbf {pa}}_{\mathbf {X}^{\prime }})\) for all triples \((c^{\prime }_{l}, x^{\prime }_{o},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\), then \(p(x^{\prime }_{o}|c_{j},{\mathbf {pa}}_{\mathbf {X}^{\prime }})=p(x^{\prime }_{o}|{\mathbf {pa}}_{\mathbf {X}^{\prime }})\) for all triples \((c_{j},x^{\prime }_{o},{\mathbf {pa}}_{\mathbf {X}^{\prime }})\). Thus, if C is a non-descendant of \(X^{\prime }\) in \({\mathcal {N}}\), then \(X^{\prime }\) is independent of C, given \(X^{\prime }\)’s parents, in \({\mathcal {N}}\). The immediately preceding analysis could be repeated if \(C\in \mathcal {PA}_{X^{\prime }}\), to show that any variable \(X^{\prime }\) is independent of its non-descendants in \({\mathcal {N}}\), given its parents, when those parents include C. Together, these results show that \({\mathcal {N}}\) satisfies CMC. Minimality can be achieved by stipulation, by simply removing any edges that are not necessary for \({\mathcal {N}}\) to satisfy CMC.
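The core of both cases is that sums of equal terms are equal. The Python sketch below checks this numerically for one hypothetical distribution in which a four-valued \(C^{\prime }\) and a variable \(Y^{\prime }\) are independent given a common parent PA. It merges the values of \(C^{\prime }\) with a hypothetical coarsening function and confirms, using exact rational arithmetic from the standard `fractions` module, that the conditional independence survives the coarsening. It illustrates the argument on one example rather than proving it.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model PA -> C' and PA -> Y', so that C' and Y' are independent
# given PA (exact fractions, illustrative numbers only).
p_pa = {0: F(1, 3), 1: F(2, 3)}
p_c_given_pa = {0: [F(1, 10), F(2, 10), F(3, 10), F(4, 10)],
                1: [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]}
p_y_given_pa = {0: [F(1, 4), F(3, 4)], 1: [F(2, 3), F(1, 3)]}

# Fine-grained joint, keyed by (pa, c', y').
fine_joint = {(pa, c, y): p_pa[pa] * p_c_given_pa[pa][c] * p_y_given_pa[pa][y]
              for pa, c, y in product((0, 1), range(4), (0, 1))}

# Hypothetical coarsening function: {0, 1} -> "a" and {2, 3} -> "b".
COARSEN = {0: "a", 1: "a", 2: "b", 3: "b"}
coarse_joint = {}
for (pa, c, y), p in fine_joint.items():
    key = (pa, COARSEN[c], y)
    # Each coarse joint probability is a sum of fine-grained ones.
    coarse_joint[key] = coarse_joint.get(key, 0) + p


def prob(joint, pred):
    """Total probability of the keys satisfying pred."""
    return sum(p for key, p in joint.items() if pred(key))


def independent_given_pa(joint, c_vals):
    """Check p(c | y', pa) == p(c | pa) exactly, for every (c, y', pa)."""
    for pa, c, y in product((0, 1), c_vals, (0, 1)):
        lhs = (prob(joint, lambda k: k == (pa, c, y))
               / prob(joint, lambda k: (k[0], k[2]) == (pa, y)))
        rhs = (prob(joint, lambda k: k[:2] == (pa, c))
               / prob(joint, lambda k: k[0] == pa))
        if lhs != rhs:
            return False
    return True


print(independent_given_pa(fine_joint, range(4)))  # True
print(independent_given_pa(coarse_joint, "ab"))    # True
```

Exact fractions are used so that the equality test is not muddied by floating-point rounding; since conditional independence is symmetric, the same check also covers the case in which \(C^{\prime }\) plays the role of \(Y^{\prime }\).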

Finally, we can show that (5) is true. Suppose that \(C^{\prime }=c^{\prime }_{l}\) implies that \(C=c_{j}\). In other words, \(c^{\prime }_{l}\) is mapped to \(c_{j}\) by the coarsening function from the range of \(C^{\prime }\) to the range of C. We know that \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))>p(e^{\prime }_{s})\), and \(c^{\prime }_{l}\) is \(\sim\)-related to all and only those values of \(C^{\prime }\) such that conditioning on an intervention bringing about those values increases the probability that \(E^{\prime }=e^{\prime }_{s}\), relative to its marginal probability. Thus, the conditional probability \(p(e^{\prime }_{s}|do(c_{j},{\mathbf {z}}))\) is a weighted average, with weights summing to one, of terms of the form \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))\), each of which is such that \(p(e^{\prime }_{s}|do(c^{\prime }_{l},{\mathbf {z}}))>p(e^{\prime }_{s})\). This implies that \(p(e^{\prime }_{s}|do(c_{j},{\mathbf {z}}))>p(e^{\prime }_{s})\), and therefore that \(C=c_{j}\) causes \(E^{\prime }=e^{\prime }_{s}\). \(\square\)
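This last step relies on every fine-grained value grouped into \(c_{j}\) raising the probability of \(e^{\prime }_{s}\). The Python sketch below assumes, as an illustration rather than as the paper's own definition, that an intervention setting \(C=c_{j}\) acts as a mixture of interventions on the fine-grained values in \(c_{j}\)'s class, with some weights summing to one. The probabilities and the helper `p_e_do_coarse` are hypothetical; the point is that any such mixture of probabilities that all exceed \(p(e^{\prime }_{s})\) also exceeds it.

```python
# Hypothetical numbers: the class c_j = {c1, c2} groups the fine values of C'
# that raise p(e'_s) above its marginal p(e'_s) = 0.5.
p_e = 0.50
p_e_do_fine = {"c1": 0.70, "c2": 0.55}


def p_e_do_coarse(weights):
    """p(e'_s | do(c_j, z)) modelled (assumption) as a mixture of the
    fine-grained interventional probabilities, with weights summing to one."""
    assert abs(sum(weights.values()) - 1) < 1e-12
    return sum(w * p_e_do_fine[v] for v, w in weights.items())


# Whatever the mixing weights, the result lies between the smallest and the
# largest fine-grained value, and hence above p(e'_s).
for w1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    value = p_e_do_coarse({"c1": w1, "c2": 1 - w1})
    print(f"weight on c1 = {w1:.2f}: p(e'_s | do(c_j, z)) = {value:.4f}, above {p_e}: {value > p_e}")
```

The smallest mixture value here is 0.55, attained when all weight falls on c2, which is still above the marginal.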

Cite this article

Kinney, D. Bayesian Networks and Causal Ecumenism. Erkenn (2020).
