
Causality as a theoretical concept: explanatory warrant and empirical content of the theory of causal nets

Published in Synthese.

An Erratum to this article was published on 26 January 2016

Abstract

We start this paper by arguing that causality should, in analogy with force in Newtonian physics, be understood as a theoretical concept that is not explicated by a single definition, but by the axioms of a theory. Such an understanding of causality implicitly underlies the well-known theory of causal (Bayes) nets (TCN) and has been explicitly promoted by Glymour (Br J Philos Sci 55:779–790, 2004). In this paper we investigate the explanatory warrant and empirical content of TCN. We sketch how the assumption of directed cause–effect relations can be philosophically justified by an inference to the best explanation. We then ask whether the explanations provided by TCN are merely post-facto or have independently testable empirical content. To answer this question we develop a fine-grained axiomatization of TCN, including a distinction of different kinds of faithfulness. A number of theorems show that although the core axioms of TCN are empirically empty, extended versions of TCN have successively increasing empirical content.


[Figures 1–8: not included in this version.]


Notes

  1. For renewals of Hume’s challenge cf. Psillos (2009) and Norton (2009). For supporters of causality as something real see Beebee et al. (2009, parts II and III).

  2. \(\hbox {P}(\hbox {X}_{1})\) is defined from \(\hbox {P}(\hbox {X}_{1},\ldots ,\hbox {X}_{\mathrm{n}})\) by the usual projection postulate. \(\upalpha \) in \(\hbox {P}(\hbox {X}(\upalpha ) = \hbox {x})\) is an individual variable that is bound by the probability operator P (we use the letter “\(\upalpha \)” because “x” is reserved for X-values). In the statistical interpretation, P(x) is the limiting frequency of result x in an infinite sequence of random drawings of individuals \(\upalpha \) from D. This also covers the generic propensity interpretation, in which one interprets P(x) as the limiting frequency of result x in an infinite sequence of performances of a random experiment; here D consists of the individual performances of the experiment. In the single propensity interpretation, in contrast, P is attached to individual events (such as this throwing of this coin); here P is assumed as a primitively given function over \(\hbox {AL}(\Pi _{1\le \mathrm{i}\le \mathrm{n}}\hbox {Val}(\hbox {X}_{\mathrm{i}}))\).
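The limiting-frequency reading in this footnote can be illustrated by simulation; a stdlib-only Python sketch (the bias 0.3 and the sample sizes are hypothetical choices):

```python
import random
random.seed(2)

# Sketch of the statistical (limiting frequency) reading of P: the relative
# frequency of result x over n random draws approaches P(x) as n grows.
def rel_freq(n, p=0.3):
    hits = sum(1 for _ in range(n) if random.random() < p)
    return hits / n

freqs = {n: rel_freq(n) for n in (100, 10_000, 1_000_000)}
print({n: round(f, 3) for n, f in freqs.items()})
```

As expected on the frequency interpretation, the larger samples track the underlying probability more closely.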

  3. \(\hbox {P(y)} > 0\) is required because \(\hbox {P}(\hbox {x}{\vert }\hbox {y})\) is defined as P(x,y)/P(y). If one wants to cover the case P(y) = 0, one may assume independently axiomatized conditional probabilities (cf. Pearl 2000, p. 11; Carnap 1971, 38f).

  4. Cf. Carnap (1956), Lewis (1970), Sneed (1971), Balzer et al. (1987), Papineau (1996), French (2008).

  5. The claim that the causal connection between X and Z in Fig. 1 is “direct” is relative to the set of variables {X,Y,Z}.

  6. In the sun-tower-shadow example we can infer every Y-value from every X-value for every Z-value by the equation Y = Z/tan(X). In other examples, X and Y become correlated only when conditionalizing on certain values of the common effect Z.
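The general point of this footnote — that conditionalizing on values of a common effect induces a correlation between otherwise independent causes — can be illustrated by a schematic simulation (a hypothetical linear collider, not the sun-tower-shadow mechanics themselves):

```python
import random
random.seed(0)

def corr(xs, ys):
    # Pearson correlation from scratch (stdlib only)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

# X and Y: independent causes; Z: their common effect (a collider)
data = []
for _ in range(20000):
    x = random.gauss(0, 1)
    y = random.gauss(0, 1)
    z = x + y + random.gauss(0, 0.1)
    data.append((x, y, z))

# Unconditionally, X and Y are (nearly) uncorrelated ...
r_all = corr([d[0] for d in data], [d[1] for d in data])

# ... but restricting attention to a narrow slice of Z-values
# induces a strong negative dependence between X and Y.
sliced = [d for d in data if abs(d[2]) < 0.2]
r_cond = corr([d[0] for d in sliced], [d[1] for d in sliced])

print(round(r_all, 3), round(r_cond, 3))
```

Within the Z-slice, knowing X makes Y nearly determined (x + y must be close to the conditioned Z-value), which is what the strongly negative conditional correlation reflects.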

  7. More precisely, we must deactivate all other causal connections between X and Y and other causal influences on Y; see Sect. 2.3, (9).

  8. Proof By probability theory we have (a) \(\hbox {P}(\hbox {y}{\vert }\hbox {x}) = \Sigma _{\mathrm{z}}\hbox {P}(\hbox {y}{\vert }\hbox {x,z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {x})\) and (b) \(\hbox {P(y)} = \Sigma _{\mathrm{z}}\hbox {P}(\hbox {y}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z})\). The sum in (a) equals (c) \(\Sigma _{\mathrm{z}}\hbox {P}(\hbox {y}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {x})\) by condition (C) of Sect. 2.3, since Y is not d-connected with X given Z. It follows that \(\hbox {P}(\hbox {y}{\vert }\hbox {x}) \ne \hbox {P(y)}\) holds exactly if the two sums in (c) and (b) are unequal. \(\square \)
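The two probability-theoretic identities (a) and (b) used in this proof can be checked on any discrete joint distribution; a stdlib-only sketch with hypothetical weights over binary X, Y, Z:

```python
from itertools import product

# An arbitrary strictly positive joint over binary X, Y, Z
# (the weights are hypothetical numbers).
w = {k: v for k, v in zip(product([0, 1], repeat=3), [3, 1, 4, 1, 5, 9, 2, 6])}
total = sum(w.values())
joint = {k: v / total for k, v in w.items()}

def P(pred):
    # Probability of the event picked out by pred(x, y, z)
    return sum(p for v, p in joint.items() if pred(*v))

# (a) P(y|x) = sum_z P(y|x,z) * P(z|x)
lhs_a = P(lambda x, y, z: y == 1 and x == 1) / P(lambda x, y, z: x == 1)
rhs_a = sum((P(lambda x, y, z: y == 1 and x == 1 and z == z0)
             / P(lambda x, y, z: x == 1 and z == z0))
            * (P(lambda x, y, z: z == z0 and x == 1) / P(lambda x, y, z: x == 1))
            for z0 in [0, 1])

# (b) P(y) = sum_z P(y|z) * P(z)
lhs_b = P(lambda x, y, z: y == 1)
rhs_b = sum((P(lambda x, y, z: y == 1 and z == z0) / P(lambda x, y, z: z == z0))
            * P(lambda x, y, z: z == z0) for z0 in [0, 1])

print(abs(lhs_a - rhs_a) < 1e-12, abs(lhs_b - rhs_b) < 1e-12)
```

Both identities hold for every joint distribution (they are instances of the law of total probability), so the proof's work lies entirely in showing when the two sums differ.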

  9. Proof By probability theory, (a) \(\hbox {P}(\hbox {x}{\vert }\hbox {y}) = \hbox {P}(\hbox {x}{\vert }\hbox {y,z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {y}) +\hbox {P}(\hbox {x}{\vert }\hbox {y}{,}\lnot \hbox {z})\cdot \hbox {P}(\lnot \hbox {z}{\vert }\hbox {y})\) and (b) \(\hbox {P(x)} = \hbox {P}(\hbox {x}{\vert }\hbox {z})\cdot \hbox {P(z)}+\hbox {P}(\hbox {x}{\vert }\lnot \hbox {z})\cdot \hbox {P}(\lnot \hbox {z})\). By INDEP(X,Y) we have \(\hbox {P}(\hbox {y}{\vert }\hbox {x}) = \hbox {P(y)}\). So the sums in (a) and (b) must be equal. These sums are weighted averages, with the weights in the sum in (a) being \(\hbox {P}(\hbox {z}{\vert }\hbox {y})\) and \(\hbox {P}(\lnot \hbox {z}{\vert }\hbox {y}) = 1{-}\hbox {P}(\hbox {z}{\vert }\hbox {y})\), and the weights in the sum in (b) being P(z) and \(\hbox {P}(\lnot \hbox {z}) = 1{-}\hbox {P(z)}\). By (DO) we have (i) \(\hbox {P}(\hbox {z}{\vert }\hbox {y}) \ne \hbox {P}(\hbox {z}{\vert }\lnot \hbox {y})\) and (ii) \(\hbox {P}(\hbox {x}{\vert }\hbox {z}) \ne \hbox {P}(\hbox {x}{\vert }\lnot \hbox {z})\). It follows from (i), (ii), and the laws of weighted averages that the two sums in (a) and (b) would have to be different if \(\hbox {INDEP}(\hbox {x,y}{\vert }\hbox {Z})\), i.e. \(\hbox {P}(\hbox {x}{\vert }\hbox {y,z}) = \hbox {P}(\hbox {x}{\vert }\hbox {z})\) and \(\hbox {P}(\hbox {x}{\vert }\hbox {y}{,}\lnot \hbox {z}) = \hbox {P}(\hbox {x}{\vert }\lnot \hbox {z})\), held. For if a \(\ne \) b and w \(\ne \) w\(^{\prime }\), then \(\hbox {a}\cdot \hbox {w}+\hbox {b}\cdot (1{-}\hbox {w}) = (\hbox {a}{-}\hbox {b})\cdot \hbox {w}+\hbox {b} \ne \hbox {a}\cdot \hbox {w}^{\prime }+\hbox {b}\cdot (1{-}\hbox {w}^{\prime }) = (\hbox {a}{-}\hbox {b})\cdot \hbox {w}^{\prime }+\hbox {b}\). 
Thus either \(\hbox {P}(\hbox {x}{\vert }\hbox {y,z}) \ne \hbox {P}(\hbox {x}{\vert }\hbox {z})\) or \(\hbox {P}(\hbox {x}{\vert }\hbox {y}{,}\lnot \hbox {z}) \ne \hbox {P}(\hbox {x}{\vert }\lnot \hbox {z})\) must hold, which gives us \(\hbox {DEP}(\hbox {X,Y}{\vert }\hbox {Z})\). \(\square \)
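A minimal discrete example instantiating this proof's assumptions and conclusion — INDEP(X,Y), condition (DO), and DEP(X,Y|Z) — using a hypothetical collider mechanism Z := X or Y:

```python
from itertools import product

# A minimal collider with Z := X or Y (a hypothetical toy mechanism):
# X and Y are fair independent coins; Z is their common effect.
joint = {(x, y, x | y): 0.25 for x, y in product([0, 1], repeat=2)}

def P(pred):
    return sum(p for v, p in joint.items() if pred(*v))

# Unconditional independence INDEP(X,Y): P(x,y) = P(x)P(y)
assert P(lambda x, y, z: x == 1 and y == 1) == \
       P(lambda x, y, z: x == 1) * P(lambda x, y, z: y == 1)

# Condition (DO): P(z|y) != P(z|not-y) and P(x|z) != P(x|not-z)
p_z_y  = P(lambda x, y, z: z == 1 and y == 1) / P(lambda x, y, z: y == 1)
p_z_ny = P(lambda x, y, z: z == 1 and y == 0) / P(lambda x, y, z: y == 0)
p_x_z  = P(lambda x, y, z: x == 1 and z == 1) / P(lambda x, y, z: z == 1)
p_x_nz = P(lambda x, y, z: x == 1 and z == 0) / P(lambda x, y, z: z == 0)

# Conclusion DEP(X,Y|Z): P(x|y,z) != P(x|z)
p_x_yz = P(lambda x, y, z: x == 1 and y == 1 and z == 1) / \
         P(lambda x, y, z: y == 1 and z == 1)

print(p_z_y != p_z_ny, p_x_z != p_x_nz, p_x_yz, p_x_z)
```

Here P(x=1|y=1,z=1) = 1/2 while P(x=1|z=1) = 2/3: given that the common effect occurred, learning that Y occurred "explains it away" and lowers the probability of X.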

  10. Condition (DO) is sufficient but not necessary for \(\hbox {DEP}(\hbox {Y,X}{\vert }\hbox {Z})\) in linking up cases (it can be shown that a necessary condition for \(\hbox {DEP}(\hbox {Y,X}{\vert }\hbox {Z})\) is \(\exists \hbox {x,y,z}: \hbox {DEP}(\hbox {y,z}{\vert }\hbox {x})\wedge \hbox { DEP}(\hbox {z,x}{\vert }\hbox {y}))\). On the other hand, condition (DO) is not sufficient but necessary for DEP(Y,X) in screening off cases (we are indebted to an anonymous reviewer for pointing this out to us). Here is a proof of the necessity-claim by contraposition \((\lnot (\hbox {DO}) \Rightarrow \hbox {INDEP(X,Y)})\). Assume that (DO) fails (in one of the causal structures \(\hbox {X}\rightarrow \hbox {Z}\rightarrow \hbox {Y},\, \hbox {X}\leftarrow \hbox {Z}\leftarrow \hbox {Y}\), or \(\hbox {X}\leftarrow \hbox {Z}\rightarrow \hbox {Y}\)). Thus \(\hbox {P}(\hbox {z}{\vert }\hbox {y}) = \hbox {P(z)}\) holds for every z with \(\hbox {P}(\hbox {x}{\vert }\hbox {z}) \ne \hbox {P}(\hbox {x}{\vert }\lnot \hbox {z})\). Let Zy be the set of Z’s values that are Y-independent but X-dependent, and Zx the set of Z’s values that are X-independent but Y-dependent. 
Then \(\hbox {P}(\hbox {x}{\vert }\hbox {y}) = \Sigma _{\mathrm{z}}\hbox {P}(\hbox {x}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {y}) = \Sigma _{\mathrm{z}{\in }\mathrm{Zy}}\hbox {P}(\hbox {x}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {y}) + \Sigma _{\mathrm{z}{\in } \mathrm{Zx}}\hbox {P}(\hbox {x}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z}{\vert }\hbox {y}) = \hbox {(by our assumptions)} \Sigma _{\mathrm{z}{\in }\mathrm{Zy}}\hbox {P}(\hbox {x}{\vert }\hbox {z})\cdot \hbox {P}(\hbox {z}) + \Sigma _{\mathrm{z}{\in } \mathrm{Zx}}\hbox {P(x)}\cdot \hbox {P}(\hbox {z}{\vert }\hbox {y}) = \Sigma _{\mathrm{z}{\in }\mathrm{Zy}}\hbox {P(x,z)} + \hbox {P(x)}\Sigma _{\mathrm{z}{\in }\mathrm{Zx}}\hbox {P}(\hbox {z}{\vert }\hbox {y}) = (*) \hbox {P}(\hbox {x, z}\in \hbox {Zy}) + \hbox {P(x)}\cdot \hbox {P}(\hbox {z}\in \hbox {Zx}{\vert }\hbox {y})\). By our assumption, \(\hbox {P}(\hbox {y}{\vert }\hbox {z}\in \hbox {Zy)} = \hbox {P(y)}\) holds, which implies \(\hbox {P}(\hbox {z}\in \hbox {Zy}{\vert }\hbox {y}) = \hbox {P}(\hbox {z}\in \hbox {Zy})\). This implies \(\hbox {P}(\hbox {z}\in \hbox {Zx}{\vert }\hbox {y}) = \hbox {P}(\hbox {z}\in \hbox {Zx})\) (via \(\hbox {P}(\hbox {z}\in \hbox {Zx}{\vert }\hbox {y}) = 1{-}\hbox {P}(\hbox {z}\in \hbox {Zy}{\vert }\hbox {y}) = 1{-}\hbox {P}(\hbox {z}\in \hbox {Zy}) = \hbox {P}(\hbox {z}\in \hbox {Zx})\)), which in turn implies \(\hbox {P(x)}\cdot \hbox {P}(\hbox {z}\in \hbox {Zx}{\vert }\hbox {y}) = \hbox {P(x)}\cdot \hbox {P}(\hbox {z}\in \hbox {Zx}) = \hbox {P}(\hbox {x, z}\in \hbox {Zx})\). So we continue as follows \(\hbox {P}(\hbox {x}{\vert }\hbox {y}) = \ldots (*) = \hbox {P}(\hbox {x, z}\in \hbox {Zy}) + \hbox {P}(\hbox {x, z}\in \hbox {Zx}) = \hbox {P(x)}\). Thus, INDEP(x,y).

  11. An example of a cyclic CG violating (C) \(\Leftrightarrow \) (M) is found in Spirtes et al. (1993, 359f).

  12. These parameters can equivalently be formulated as functions X = \(\hbox {f(par(X))} + \hbox {U}_{\mathrm{X}}\) together with a random distribution P over mutually independent error variables \(\hbox {U}_{\mathrm{X}}\) (Pearl 2000, p. 44). If the causal influences are non-interactive and linear, one can factorize the parents’ influences in the form of a structural equation model, \(\hbox {X} = \Sigma _{\mathrm{P_\mathrm{i}}{\in }\mathrm{par(X)}}\hbox {c}_{\mathrm{i}}\cdot \hbox {P}_{\mathrm{i}}+\hbox {U}_{\mathrm{X}}\) (SGS, 14f).
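Such a structural equation model can be simulated directly; a sketch for a hypothetical linear chain Z -> X -> Y with made-up coefficients, checking via partial correlation the screening-off that the Markov condition predicts:

```python
import random
random.seed(1)

# A linear structural equation model for the chain Z -> X -> Y,
# with hypothetical coefficients and independent Gaussian error terms U.
n = 50000
zs, xs, ys = [], [], []
for _ in range(n):
    z = random.gauss(0, 1)                 # Z := U_Z
    x = 0.8 * z + random.gauss(0, 1)       # X := 0.8*Z + U_X
    y = 0.5 * x + random.gauss(0, 1)       # Y := 0.5*X + U_Y
    zs.append(z)
    xs.append(x)
    ys.append(y)

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# The Markov condition predicts Z independent of Y given X: the partial
# correlation of Z and Y controlling for X should be near zero.
r_zy, r_zx, r_xy = corr(zs, ys), corr(zs, xs), corr(xs, ys)
partial = (r_zy - r_zx * r_xy) / ((1 - r_zx**2) * (1 - r_xy**2)) ** 0.5
print(round(r_zy, 2), round(partial, 2))
```

Z and Y are clearly correlated overall, but the partial correlation given X is statistically indistinguishable from zero, as the factorized model requires.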

  13. This follows from (MC).

  14. Pearl (2000, preface) and SGS (Chaps. 3.4–3.5) also support a realistic understanding of causal relations.

  15. Empirical submodels correspond to what is called “partial (potential) models” in structuralist philosophy of science (cf. Balzer et al. 1987; Sneed 1971, Chap. 3).

  16. If \(\hbox {INDEP}(\hbox {Z}_{\mathrm{i}-1},\hbox {Z}_{\mathrm{i}+1})\) holds, then either \(\hbox {Z}_{\mathrm{i}}\) is a common effect and must be in E (the “intended case”), or \(\hbox {Z}_{\mathrm{i}}\) is a common or intermediate cause which violates dependence-transitivity (in which case the inclusion of \(\hbox {Z}_{\mathrm{i}}\) in E does no harm), or the dependence between \(\hbox {Z}_{\mathrm{i}-1}\) and \(\hbox {Z}_{\mathrm{i}+1}\) is canceled by a compensating path, which is made improbable by assumption (EN).

  17. A universe in a high entropy state could admit temporally inverted causal processes (cf. Reichenbach 1956, 136ff; Savitt 1996, p. 353).

  18. What we prove here is slightly stronger than what follows from Dawid’s axiom of intersection (Pearl 1988, p. 84) applied to \(\hbox {INDEP}(\mathbf{P},\hbox {Z}{\vert }\hbox {X})\) and \(\hbox {INDEP}(\mathbf{P},\hbox {X}{\vert }\hbox {Z})\).

References

  • Armstrong, D. M. (1983). What is a law of nature? Cambridge: Cambridge University Press.

  • Balzer, W., Moulines, C. U., & Sneed, J. D. (1987). An architectonic for science. Dordrecht: Reidel.

  • Beebee, H., Hitchcock, C., & Menzies, P. (Eds.). (2009). The Oxford handbook of causation. Oxford: Oxford University Press.

  • Blalock, H. (1961). Correlation and causality: The multivariate case. Social Forces, 39, 246–251.

  • Carnap, R. (1956). The methodological character of theoretical concepts. In H. Feigl & M. Scriven (Eds.), The foundations of science (pp. 38–76). Minneapolis: University of Minnesota Press.

  • Carnap, R. (1971). A basic system of inductive logic, Part I. In R. Carnap & R. Jeffrey (Eds.), Studies in inductive logic and probability (pp. 33–166). Berkeley: University of California Press.

  • Cartwright, N. (1999). The dappled world: A study of the boundaries of science. Cambridge: Cambridge University Press.

  • Cartwright, N. (2007). Hunting causes and using them. Cambridge: Cambridge University Press.

  • Eberhardt, F., & Scheines, R. (2007). Interventions and causal inference. Philosophy of Science, 74, 981–995.

  • Fales, E. (1990). Causation and universals. London: Routledge.

  • French, S. (2008). The structure of theories. In S. Psillos & M. Curd (Eds.), The Routledge companion to philosophy of science (pp. 269–280). London: Routledge.

  • Friedman, M. (1974). Explanation and scientific understanding. Journal of Philosophy, 71, 5–19.

  • Glymour, C. (2004). Critical notice. British Journal for the Philosophy of Science, 55, 779–790.

  • Hausman, D. (1998). Causal asymmetries. Cambridge: Cambridge University Press.

  • Healey, R. (2009). Causation in quantum mechanics. In H. Beebee et al. (Eds.), The Oxford handbook of causation (pp. 673–686). Oxford: Oxford University Press.

  • Hitchcock, C. (2010). Probabilistic causation. In E. N. Zalta (Ed.), Stanford encyclopedia of philosophy (Winter 2011 ed.). http://plato.stanford.edu/archives/win2011/entries/causation-probabilistic/.

  • Hoover, K. (2001). Causality in macroeconomics. Cambridge: Cambridge University Press.

  • Kitcher, P. (1989). Explanatory unification and the causal structure of the world. In P. Kitcher & W. Salmon (Eds.), Scientific explanation (pp. 410–505). Minneapolis: University of Minnesota Press.

  • Lauritzen, S. L., Dawid, A. P., Larsen, B. N., & Leimer, H.-G. (1990). Independence properties of directed Markov fields. Networks, 20, 491–505.

  • Lewis, D. (1970). How to define theoretical terms. Journal of Philosophy, 67, 427–446.

  • McDermott, M. (1995). Redundant causation. British Journal for the Philosophy of Science, 40, 523–544.

  • Norton, J. D. (2009). Is there an independent principle of causality in physics? British Journal for the Philosophy of Science, 60, 475–486.

  • Papineau, D. (1992). Can we reduce causal direction to probabilities? In PSA: Proceedings of the biennial meeting of the Philosophy of Science Association, Vol. 2: Symposia and invited papers (pp. 238–252).

  • Papineau, D. (1996). Theory-dependent terms. Philosophy of Science, 63, 1–20.

  • Pearl, J. (1988, 1997). Probabilistic reasoning in intelligent systems. San Francisco: Morgan Kaufmann.

  • Pearl, J. (2000, 2009). Causality. Cambridge: Cambridge University Press.

  • Psillos, S. (2009). Regularity theories. In H. Beebee et al. (Eds.), The Oxford handbook of causation (pp. 131–157). Oxford: Oxford University Press.

  • Reichenbach, H. (1956). The direction of time. Berkeley: University of California Press.

  • Savitt, S. F. (1996). The direction of time. British Journal for the Philosophy of Science, 47, 347–370.

  • Sneed, J. D. (1971). The logical structure of mathematical physics. Dordrecht: Reidel.

  • Spirtes, P., Glymour, C., & Scheines, R. (1993, 2000). Causation, prediction, and search. Cambridge: MIT Press.

  • Steel, D. (2006). Homogeneity, selection, and the faithfulness condition. Minds and Machines, 16, 303–317.

  • Suppes, P. (1970). A probabilistic theory of causality. Amsterdam: North-Holland.

  • Tomasello, M. (1999). The cultural origins of human cognition. Cambridge: Harvard University Press.

  • Uhler, C., Raskutti, G., Bühlmann, P., & Yu, B. (2013). Geometry of the faithfulness assumption in causal inference. Annals of Statistics, 41, 436–463.

  • Verma, T. S. (1986). Causal networks: Semantics and expressiveness. Technical Report R-65. Cognitive Systems Laboratory, University of California, Los Angeles.

  • Woodward, J. (2003). Making things happen. Oxford: Oxford University Press.

  • Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.

  • Zhang, J., & Spirtes, P. (2003). Strong faithfulness and uniform consistency in causal inference. In Proceedings of the 19th conference on uncertainty in artificial intelligence (pp. 632–639). San Francisco: Morgan Kaufmann.

  • Zhang, J., & Spirtes, P. (2008). Detection of unfaithfulness and robust causal inference. Minds and Machines, 18, 239–271.

  • Zhang, J., & Spirtes, P. (2011). Intervention, determinism, and the causal minimality condition. Synthese, 182, 335–347.


Acknowledgments

This work was supported by Deutsche Forschungsgemeinschaft, research unit “Causation \({\vert }\) Laws \({\vert }\) Dispositions \({\vert }\) Explanation” (FOR 1063). For important discussions we are indebted to Clark Glymour, Paul Näger, Jon Williamson, Peter Spirtes, Mathias Frisch, Michael Baumgartner, Andreas Hüttemann, Oliver Scholz, Markus Schrenk, Stathis Psillos and Marie Kaiser.

Author information


Corresponding author

Correspondence to Gerhard Schurz.

Appendix: Proofs of lemmata and theorems

Proof of Lemma 1

For (1.1): Assume \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}})\) is acyclic and \(\hbox {X}\rightarrow \hbox {Y}\) in \({{\varvec{\mathcal {E}}}}\). For reductio, assume \({\uppi }\) is a path in \({{\varvec{\mathcal {E}}}}{-}\{\hbox {X}{\rightarrow }\hbox {Y}\}\) that connects X with Y and is activated by \(\hbox {par(Y)}{-}\{\hbox {X}\}\). So \({\uppi }\) must have the form \(\hbox {X}--\,\hbox {Z}\leftarrow \hbox {Y}\). Thus \({\uppi }\) must carry at least one common effect Z*; otherwise \({\uppi }\) would have the form \(\hbox {X}\leftarrow \leftarrow \hbox {Y}\) and \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}})\) would be cyclic. But since \(\hbox {Z}^{*}\notin \hbox {par(Y)}{-}\{\hbox {X}\}, \,{\uppi }\) is blocked by \(\hbox {par(Y)}{-}\{\hbox {X}\}\), contradicting our assumption.

For (1.2): Assume \(\mathbf{U} \supset \hbox {par(Y)}{-}\{\hbox {X}\}\). Both \(\hbox {par(Y)}{-}\{\hbox {X}\}\) and U d-separate X from Y in \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{-}\{\hbox {X}\rightarrow \hbox {Y}\})\) (by assumption and lemma 1.1, respectively). Direction \(\Leftarrow \) is trivial. For direction \(\Rightarrow \) we have to prove that \(\hbox {INDEP}(\hbox {X,Y}{\vert }\hbox {par(Y)}{-}\{\hbox {X}\})\) implies \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{U})\). We proceed by induction on the number of elements in \(\mathbf{U}-(\hbox {par(Y)}{-}\{\hbox {X}\})\). Assume the claim has been proved for some \(\mathbf{U} \supseteq \hbox {par(Y)}{-}\{\hbox {X}\}\) and let \(\mathbf{U}^{\prime } = \mathbf{U}\cup \{\hbox {Z}\}\), where U and U \(^{\prime }\) d-separate X from Y in \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{-}\{\hbox {X}\rightarrow \hbox {Y}\})\). We show that

(*) Either Z is d-separated from X by \(\mathbf{U}\cup \{\hbox {Y}\}\), or Z is d-separated from Y by \(\mathbf{U}\cup \{\hbox {X}\}\).

For reductio, assume (*) does not hold. We distinguish two cases. Case (A): Z is d-connected with X given \(\mathbf{U}\cup \{\hbox {Y}\}\), and with Y given \(\mathbf{U}\cup \{\hbox {X}\}\), by two paths \({\uppi }_{\mathrm{X}}\): Z \(-\) X and \({\uppi }_{\mathrm{Y}}{:}\;\hbox {Z}-\hbox {Y}\), respectively, which both do not carry X \(\rightarrow \) Y. In this case, Z is d-connected with X and with Y by the respective paths given U alone. Let \({\uppi }\) be the concatenation of \({\uppi }_{\mathrm{X}}\) and \({\uppi }_{\mathrm{Y}}\). If Z is a common effect on \({\uppi }\), then U \(^{\prime }\) would activate a new X–Y-connecting path, which is excluded, and if Z is not a common effect on \({\uppi }\), then \({\uppi }\) would d-connect X and Y given U, which is excluded. So case (A) is impossible. The other possible case is (B): Z is d-connected with Y and with X only by paths \({\uppi }\) that contain \(\hbox {X}\rightarrow \hbox {Y}\) as a subpath; these paths must either have the form (i) \(\hbox {X}\rightarrow \hbox {Y}-\hbox {Z}\) or (ii) \(\hbox {Z}-\hbox {X}\rightarrow \hbox {Y}\), but not both (else we would have case (A)). But this is impossible since in case (i) X is d-separated from Z by \(\mathbf{U}\cup \{\hbox {Y}\}\) (since if \(\hbox {Y}\leftarrow \hbox {Y}^{\prime }\) is on \({\uppi }\), then \(\hbox {Y}^{\prime } \in \mathbf{U}\)), and in case (ii) Z is d-separated from Y by \(\mathbf{U}\cup \{\hbox {X}\}\).

Dawid’s axioms of probabilistic independence (cf. Pearl 1988, p. 84) include the following (probabilistically valid) axioms:

  • Contraction: \(\hbox {INDEP}(\hbox {X,Y}{\vert }\{\hbox {Z}\}\cup \mathbf{U})\wedge \hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{U}) \Rightarrow \hbox {INDEP}(\hbox {X},\{\hbox {Y,Z}\}{\vert }\mathbf{U})\).

  • Decomposition: \(\hbox {INDEP}(\hbox {X},\{\hbox {Y,Z}\}{\vert }\mathbf{U}) \Rightarrow \hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{U})\wedge \hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{U})\).

  • Weak union: \(\hbox {INDEP}(\hbox {X},\{\hbox {Y,Z}\}{\vert }\mathbf{U}) \Rightarrow \hbox {INDEP}(\hbox {X,Y}{\vert }\{\hbox {Z}\}\cup \mathbf{U})\).

Assume by (*) that Z is d-separated from X by \(\mathbf{U}\cup \{\hbox {Y}\}\). (The other possibility is that Z is d-separated from Y by \(\mathbf{U}\cup \{\hbox {X}\}\); the proof proceeds in exactly the same way.) Since \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) satisfies (C), (a) \(\hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{U}\cup \{\hbox {Y}\})\) follows. From the induction hypothesis \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{U})\) and (a) \(\hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{U}\cup \{\hbox {Y}\})\) we get (b) \(\hbox {INDEP}(\hbox {X},\{\hbox {Y,Z}\}{\vert }\mathbf{U})\) by contraction, and from (b) we get \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{U}\cup \{\hbox {Z}\})\), i.e. \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{U}^{\prime })\) by weak union.\(\square \)
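Dawid's contraction, decomposition, and weak-union axioms can be sanity-checked numerically; a stdlib-only sketch on a hypothetical distribution in which X is jointly independent of {Y,Z}, so that the antecedents of decomposition and weak union hold by construction:

```python
from itertools import product

# X jointly independent of (Y,Z): P(x,y,z) = Px(x) * Qyz(y,z),
# with hypothetical marginals chosen so that Y and Z are dependent.
Px = {0: 0.3, 1: 0.7}
Qyz = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
joint = {(x, y, z): Px[x] * Qyz[(y, z)] for x in Px for (y, z) in Qyz}

def P(pred):
    return sum(p for v, p in joint.items() if pred(*v))

def indep(pred_a, pred_b, pred_c=lambda *v: True):
    # Checks P(a,b|c) = P(a|c) * P(b|c) for the given events
    pc = P(pred_c)
    pab = P(lambda *v: pred_a(*v) and pred_b(*v) and pred_c(*v)) / pc
    pa = P(lambda *v: pred_a(*v) and pred_c(*v)) / pc
    pb = P(lambda *v: pred_b(*v) and pred_c(*v)) / pc
    return abs(pab - pa * pb) < 1e-12

# Decomposition: INDEP(X,{Y,Z}) entails INDEP(X,Y)
dec = all(indep(lambda x, y, z, a=a: x == a, lambda x, y, z, b=b: y == b)
          for a in [0, 1] for b in [0, 1])

# Weak union: INDEP(X,{Y,Z}) entails INDEP(X,Y|Z)
wu = all(indep(lambda x, y, z, a=a: x == a, lambda x, y, z, b=b: y == b,
               lambda x, y, z, c=c: z == c)
         for a in [0, 1] for b in [0, 1] for c in [0, 1])

print(dec, wu)
```

This is only a consistency check on one concrete distribution, of course; the axioms themselves are probabilistically valid for all P, which is what the proof above relies on.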

Proof of Theorem 2

Proof of \((P) \Rightarrow (Min)\): Assume \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) is not minimal. So there exists an \(\hbox {X}{\rightarrow }\hbox {Y}\) in \({{\varvec{\mathcal {E}}}}\) such that \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}^{-},\hbox {P})\) satisfies (C), where \({{\varvec{\mathcal {E}}}}^{-} := {{\varvec{\mathcal {E}}}}{-}\{\hbox {X}{\rightarrow }\hbox {Y}\}\). Since \(\hbox {par(Y)}{-}\{\hbox {X}\}\) d-separates X from Y in \(({\varvec{\mathcal {V}}},{\varvec{\mathcal {E}}}^{-})\) (by lemma 1.1), \(\hbox {INDEP}(\hbox {X,Y}{\vert }\hbox {par(Y)}{-}\{\hbox {X}\})\) holds because of (C). So (P) is violated.

Proof of (Min) \(\Rightarrow (P)\): Assume that \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) satisfies (Min), which means that there is no \(\hbox {X,Y}\in {{\varvec{\mathcal {V}}}}\) with \(\hbox {X}{\rightarrow }\hbox {Y}\in {{\varvec{\mathcal {E}}}}\) such that \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{-}\{\hbox {X}{\rightarrow }\hbox {Y}\},\hbox {P})\) still satisfies (C). The latter is the case iff

(*) the parent set par(Y) of every \(\hbox {Y}\in {{\varvec{\mathcal {V}}}}\) (with \(\hbox {par(Y)} \ne \varnothing \)) is minimal in the sense that removing one of Y’s parents X from par(Y) would make a difference for Y, meaning that \(\hbox {P}(\hbox {y}{\vert }\hbox {x,par(Y)}{-}\{\hbox {X}\}) \ne \hbox {P}(\hbox {y}{\vert }\hbox {par(Y)}{-}\{\hbox {X}\})\) holds for some X-value x, Y-value y, and some instantiations of \(\hbox {par(Y)}{-}\{\hbox {X}\}\).

For otherwise P would admit the Markov factorization according to (8.2) both relative to \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) and relative to \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{-}\{\hbox {X}{\rightarrow }\hbox {Y}\},\hbox {P})\). This implies by Theorem 1 that \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) and \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{-}\{\hbox {X}{\rightarrow }\hbox {Y}\},\hbox {P})\) satisfy (C), i.e. \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) is not minimal, which contradicts our assumption. Now, (*) entails that \(\hbox {Dep}(\hbox {X,Y}{\vert }\hbox {par(Y)}{-}\{\hbox {X}\})\) holds for all \(\hbox {X,Y}\in \mathbf{V}\) with \(\hbox {X}{\rightarrow }\hbox {Y}\), i.e., that \(({{\varvec{\mathcal {V}}}}{,}{{\varvec{\mathcal {E}}}}{,}\hbox {P})\) satisfies (P). \(\square \)
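The Markov factorization appealed to here (via (8.2), which is not reproduced in this excerpt and is assumed to be the usual product-of-parents formula) can be checked concretely; a sketch for a three-node chain with hypothetical conditional probability tables:

```python
from itertools import product

# A chain X -> Z -> Y with hypothetical conditional probability tables.
Px = {0: 0.4, 1: 0.6}
Pz_x = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # P(z|x)
Py_z = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}   # P(y|z)

# Markov factorization: P(x,z,y) = P(x) * P(z|x) * P(y|z)
joint = {(x, z, y): Px[x] * Pz_x[(z, x)] * Py_z[(y, z)]
         for x, z, y in product([0, 1], repeat=3)}
assert abs(sum(joint.values()) - 1.0) < 1e-12

def P(pred):
    return sum(p for v, p in joint.items() if pred(*v))

# The factorized P screens off X from Y given Z, as the Markov
# condition predicts for the chain: P(y|x,z) = P(y|z) for all values.
ok = True
for x0, z0, y0 in product([0, 1], repeat=3):
    lhs = P(lambda x, z, y: (x, z, y) == (x0, z0, y0)) / \
          P(lambda x, z, y: x == x0 and z == z0)
    rhs = P(lambda x, z, y: z == z0 and y == y0) / P(lambda x, z, y: z == z0)
    ok = ok and abs(lhs - rhs) < 1e-12
print(ok)
```

Removing the edge X -> Z from this graph would force a different factorization (with Z unconditioned on X), which generally fails to reproduce this joint P; that asymmetry is what the minimality argument turns on.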

Proof of Theorem 5

For (5.1): Recall Dawid’s axioms of probabilistic independence from the proof of lemma (1.2). By switching Y with Z and setting \(\mathbf{U} = \varnothing \), the contraposed forms of decomposition and contraction give us \(\hbox {DEP(X,Z)} \Rightarrow \hbox {DEP}(\hbox {X}{,}\{\hbox {Y,Z}\})\) and \(\hbox {DEP}(\hbox {X}{,}\{\hbox {Y,Z}\}) \wedge \hbox {INDEP}(\hbox {X,Z}{\vert }\hbox {Y}) \Rightarrow \hbox {DEP}(\hbox {X,Y})\). In the same way, switching X with Y and setting \(\mathbf{U} = \varnothing \) gives us \(\hbox {DEP(Y,Z)} \Rightarrow \hbox {DEP}(\hbox {Y}{,}\{\hbox {X,Z}\})\) and \(\hbox {DEP}(\hbox {Y}{,}\{\hbox {X,Z}\}) \wedge \hbox { INDEP}(\hbox {X,Y}{\vert }\hbox {Z}) \Rightarrow \hbox {DEP}(\hbox {Y,Z})\).

By these considerations, the assumptions DEP(X,Z) and \(\hbox {INDEP}(\hbox {X,Z}{\vert }\hbox {Y})\) of Theorem 5.1 entail DEP(X,Y) and DEP(Y,Z). So (by (C)) X and Y as well as Y and Z are d-connected given \(\varnothing \) by two paths \(\hbox {X} --\,\hbox {Y}\) and \(\hbox {Y}--\,\hbox {Z}\), whence (a) these two paths don’t carry a common effect. (F) implies that X and Z are d-separated by Y which together with (a) implies (b) that \(\hbox {X}-\cdots \rightarrow \hbox {Y}\leftarrow \cdots -\hbox {Z}\) is impossible. (a) + (b) entail that either \(\hbox {X}\leftarrow \leftarrow \hbox {Y}\) or \(\hbox {Y}\rightarrow \rightarrow \hbox {Z}\). But both possibilities are excluded by condition (T).

For (5.2): Because of (F) and INDEP(X,Z), X and Z are d-separated (by \(\varnothing \)). Thus and because of (C) and \(\hbox {DEP}(\hbox {X,Z}{\vert }\hbox {Y})\), X and Z are d-connected by a path \({\uppi }\) that carries a common effect Y\(^{\prime }\) that is either identical with Y or has Y as an effect. So \({\uppi }{:}\;\hbox {X}-{\cdots }\rightarrow \hbox {Y}^{\prime } \leftarrow {\cdots }-\hbox {Z}\), where \({\uppi }\) contains no colliders except Y\(^{\prime }\). By condition (T) and assumption (*), this path cannot carry a common cause of Y\(^{\prime }\) and X and one of Y\(^{\prime }\) and Z. So either (a) \(\hbox {X}\rightarrow \rightarrow \hbox {Y}^{\prime }\) or (b) \(\hbox {Y}^{\prime } \leftarrow \leftarrow \hbox {Z}\) must be a subpath of \({\uppi }\). Both cases are excluded by (T): In case (a), t(Y) \(<\) t(X) holds by assumption (*), whence also t(Y\(^{\prime }\)\(<\) t(X) must hold (since either Y\(^{\prime }\) = Y or \(\hbox {Y}^{\prime } \rightarrow \rightarrow \hbox {Y}\), which implies by (T) that \(\hbox {t}(\hbox {Y}^{\prime }) < \hbox {t(Y)}\)). The same argument applies to case (b). \(\square \)

Proof of Theorem 6

For (6.1): By DEP(X,Y) and (C), X and Y are d-connected (by \(\varnothing \)), and by (T) and t(X) = t(Y), X and Y can only be d-connected by common causes Z lying in their past. By (C), conditionalization on the set U of all such common causes must screen off X from Y.

For (6.2): Assume Z is in the future of X and Y and screens off X from Y, whence DEP(X,Y) and \(\hbox {INDEP}(\hbox {X,Y}{\vert }\hbox {Z})\). It follows from the axioms of decomposition and contraction (similar as in the proof of Theorem 5.1) that DEP(X,Z) and DEP(Y,Z) hold. Thus, X and Y, X and Z, and also Y and Z must be d-connected given \(\varnothing \). From this together with (C), (T) and \(\hbox {t(Z)} > \hbox {t(X)} = \hbox {t(Y)}\) it follows that X and Y must be d-connected by a common cause path \({\uppi }_{\mathbf{U}}{:}\; \hbox {X}\leftarrow \leftarrow \mathbf{U}\rightarrow \rightarrow \hbox {Y}\) (where U is the set of all common causes of X and Y), and that X and Z must be d-connected by a cause–effect or a common cause path; the same holds for Y instead of X. So there will also be a path \(\hbox {X}-{\cdots }\rightarrow \hbox {Z}^{\prime } \leftarrow {\cdots }-\hbox {Y}\), where Z\(^{\prime }\) is the only collider on this path and either (i) Z\(^{\prime }\) = Z or (ii) \(\hbox {Z}^{\prime } \rightarrow \rightarrow \hbox {Z}\) holds. In what follows we write Z for \(Z^{\prime }.\)

Let \(\mathbf{P} = \hbox {par(X)}\cup \hbox {par(Y)}\) be the set of all parents of X and of Y. By the Markov-condition (M) and (T), P is a past screen-off set for {X,Y} (though a redundant one: par(X) or par(Y) alone is one, too); so \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{P})\) holds. Note also that for some p, DEP(X,p) and DEP(Y,p) must hold, since otherwise, by the proof in footnote 10, the path \(\pi _{\mathbf{U}}\) that d-connects X and Y could not transmit dependence between X and Y.

There are two possible cases: either (A) \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{P}\cup \{\hbox {Z}\})\) or (B) \(\hbox {DEP}(\hbox {X,Y}{\vert }\mathbf{P}\cup \{\hbox {Z}\})\). Assume (B) is the case. Then we have \(\hbox {INDEP}(\hbox {X,Y}{\vert }{Z})\) and \(\hbox {DEP}(\hbox {X,Y}{\vert }\mathbf{P}\cup \{\hbox {Z}\})\), i.e. X and Y are linked up and thus d-connected given Z conditional on P, but not unconditionally. In other words, conditionalizing on P isolates the common effect path \(\hbox {X}-\cdots \rightarrow \hbox {Z}\leftarrow \cdots -\hbox {Y}\) between X and Y which was exactly canceled by \(\pi _\mathbf{U}\) before. This would be a case of cancelation unfaithfulness, which is, according to (RF), highly improbable.

Now assume case (A), \(\hbox {INDEP}(\hbox {X,Y}{\vert }\mathbf{P}\cup \{\hbox {Z}\})\). Since X and Y are d-connected given \(\mathbf{P}\cup \{\hbox {Z}\}\), this constitutes, again, a case of unfaithfulness. We will show that it is either (a) a case of cancelation, hence made improbable by axiom (RF), or (b) a case of deterministic dependence of some X- or P-value on every \(\hbox {z}\in \hbox {Val(Z)}\). Let us assume that (b) is false, i.e.:

(*) \(\forall \mathbf{p}\forall \hbox {x}\exists \hbox {z}{:}\;\hbox {P}(\mathbf{p},\!\hbox {x,z}) \ne 0,1\).

Without loss of generality we can assume that the prior probabilities of all values of P, X, and Z are positive, simply by removing all zero-probability values from the value space.

We compute, attaching indices to the identity signs for easy reference:

[Displayed derivation (“figure b” in the original): a chain of identities \(=_{1}\) to \(=_{5}\), not reproduced in this version.]

Identity “\(=_{3}\)” holds by assumption \(\hbox {INDEP}(\hbox {X,Y}{\vert }\hbox {Z})\), given (*). (Note that by the definition of “\(\hbox {INDEP}(\hbox {X,Y}{\vert }\hbox {Z})\)\(\forall \hbox {x,y,z}{:}\;\hbox {P}(\hbox {y}{\vert }\hbox {x,z}) = \hbox {P}(\hbox {y}{\vert }\hbox {z})\) or \(\hbox {P}(\hbox {x}{\vert }\hbox {z}) = 0\) or \(\hbox {P(z)} = 0\) holds; the latter case is excluded by our assumption of positive prior probabilities.) The identities “\(=_{1}\)” and “\(=_{5}\)” hold by probability theory, and “\(=_{2}\)” follows from case (A) (plus assumption (*) and positive priors). This gives identity “\(=_{4}\)”. The identity “\(=_{3}\)” can only hold in two cases:

  • Case (A.1): \(\hbox {DEP}(\hbox {Y},\!\mathbf{P}{\vert }\hbox {Z})\), i.e. at least two of the values \(\hbox {P}(\hbox {Y}{\vert }\mathbf{p},\!\hbox {Z})\) in the sums on the right hand side are distinct, say \(\hbox {P}(\hbox {Y}{\vert }\mathbf{p}_{1},\!\hbox {Z})\ne \hbox {P}(\hbox {Y}{\vert }\mathbf{p}_{2},\!\hbox {Z})\). Assume \(\hbox {P}(\mathbf{p}{\vert }\hbox {X,Z})\) differs from \(\hbox {P}(\mathbf{p}{\vert }\hbox {Z})\) for some P-values p. The values of \(\hbox {P}(\mathbf{p}{\vert }\hbox {X,Z})\) and \(\hbox {P}(\mathbf{p}{\vert }\hbox {Z})\) are weights that always add up to one. Since the differences in these weights do not change the resulting sums \((\Sigma _{\mathrm{p}}\hbox {P}(\hbox {Y}{\vert }\mathbf{p},\hbox {Z}) \cdot \hbox { weight}_{\mathrm{p}})\), these differences must be exactly canceled by the differences in the \(\hbox {P}(\hbox {Y}{\vert }\mathbf{p}{,}\hbox {Z})\)-values. This would constitute a case of unfaithfulness due to internal canceling paths (in the sense of Näger, see Sect. 3.3), which is made improbable by axiom (RF). So we infer \(\hbox {P}(\mathbf{p}{\vert }\hbox {X,Z}) = \hbox {P}(\mathbf{p}{\vert }\hbox {Z})\) for all P-values p, i.e. \(\hbox {INDEP}(\hbox {X},\!\mathbf{P}{\vert }\hbox {Z})\).

  • Case (A.2): \(\hbox {INDEP}(\hbox {Y},\!\mathbf{P}{\vert }\hbox {Z})\). From the two cases (A.1+2) we conclude:

  • Case (A*): \(\hbox {INDEP}(\hbox {X},\!\mathbf{P}{\vert }\hbox {Z}) \vee \hbox {INDEP}(\hbox {Y},\!\mathbf{P}{\vert }\hbox {Z})\), i.e. Z screens off either X or Y from P.
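How the weight differences in case (A.1) can cancel exactly is easy to see with hypothetical numbers. Note that at least three P-values are needed: with only two values, equal weighted sums would force equal weights.

```python
# Hypothetical P(Y|p,Z)-values for three P-values p, and two distinct
# weight vectors standing in for P(p|X,Z) and P(p|Z): the weight
# differences cancel exactly against the spread of the values, so the
# two weighted sums coincide - a case of cancelation unfaithfulness.
values = [0.2, 0.5, 0.8]      # P(Y=y | p_i, Z=z), i = 1, 2, 3
w_xz = [0.3, 0.4, 0.3]        # stand-in for P(p_i | X=x, Z=z)
w_z = [0.2, 0.6, 0.2]         # stand-in for P(p_i | Z=z)

s_xz = sum(w * v for w, v in zip(w_xz, values))
s_z = sum(w * v for w, v in zip(w_z, values))
print(s_xz, s_z)  # both approx 0.5, although the weight vectors differ
```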

For the rest of the proof we assume

  • (a) \(\hbox {INDEP}(\hbox {X},\!\mathbf{P}{\vert }\hbox {Z})\).

If \(\hbox {INDEP}(\hbox {Y},\!\mathbf{P}{\vert }\hbox {Z})\), X and Y change their roles and the proof works in exactly the same way.

By the remarks above, the causal structure between X, P and Z has one of the following three forms:

[Figure: the three possible causal structures (i), (ii), and (iii) between X, P and Z]

Note that the path(s) from P to Z in (i) and (ii) may or may not include a common cause path (that may go through Y); this is indicated by “\(- \rightarrow \)”.

Assume causal structure (i): We assume (+) \(\hbox {DEP}(\hbox {X,Z}{\vert }\mathbf{P})\)—otherwise the case is treated exactly as in the proof for structure (ii), which rests on condition \(\hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{P})\). Likewise we assume (++) \(\hbox {DEP}(\mathbf{P},\!\hbox {Z}{\vert }\hbox {X})\)—otherwise the case is treated exactly as in the proof for structure (iii), which rests on \(\hbox {INDEP}(\mathbf{P},\!\hbox {Z}{\vert }\hbox {X})\).

From (+) and (++) it follows that DEP(Z,P) must hold (otherwise, INDEP(P,Z) and \(\hbox {INDEP}(\mathbf{P}{,}\hbox {X}{\vert }\hbox {Z})\) would imply by the axioms of contraction and decomposition INDEP(P,X), which contradicts DEP(P,X)). This means that we have a case of cancelation unfaithfulness: though both paths \(\mathbf{P}\rightarrow \hbox {X}\) and \(\mathbf{P}-\cdots \rightarrow \hbox {Z}\leftarrow \hbox {X}\) transmit probabilistic P–X-dependence when conditionalizing on Z, they exactly compensate each other. This case is made improbable by axiom (RF).
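The graphoid step just used can be checked on a concrete instance. The sketch below (a hypothetical construction) builds a joint of the form P(p)·P(z)·P(x|z), for which INDEP(P,Z) and \(\hbox {INDEP}(\mathbf{P},\!\hbox {X}{\vert }\hbox {Z})\) hold by construction, and verifies numerically that INDEP(P,X) follows, as contraction and decomposition require.

```python
import itertools
import random

random.seed(1)

def norm(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

V = (0, 1)
# Joint P(p) * P(z) * P(x|z): P is independent of Z, and P is
# independent of X given Z, so the premises of the graphoid step hold.
p_p = norm({p: random.random() for p in V})
p_z = norm({z: random.random() for z in V})
p_x_z = {z: norm({x: random.random() for x in V}) for z in V}

joint = {(p, x, z): p_p[p] * p_z[z] * p_x_z[z][x]
         for p, x, z in itertools.product(V, repeat=3)}

def pr(**c):
    return sum(v for (p, x, z), v in joint.items()
               if all(dict(p=p, x=x, z=z)[k] == w for k, w in c.items()))

# Conclusion of contraction + decomposition: INDEP(P,X).
indep_px = all(abs(pr(p=p, x=x) - pr(p=p) * pr(x=x)) < 1e-12
               for p in V for x in V)
print(indep_px)  # True
```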

Assume causal structure (ii): Here we have \(\hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{P})\) by the causal Markov condition (M) (see footnote 18). From \(\hbox {INDEP}(\hbox {X,Z}{\vert }\mathbf{P}), \hbox {INDEP}(\hbox {X},\!\mathbf{P}{\vert }\hbox {Z})\) and our assumption of positive priors we get:

  • (b) \(\forall \hbox {x,z},\!\mathbf{p}{:}\;\hbox {P}(\hbox {x}{\vert }\hbox {z},\!\mathbf{p}) = \hbox {P}(\hbox {x}{\vert }\hbox {z})\vee \hbox {P}(\mathbf{p}{\vert }\hbox {z}) = 0\), and

  • (c) \(\forall \hbox {x,z},\!\mathbf{p}{:}\;\hbox {P}(\hbox {x}{\vert }\hbox {z},\!\mathbf{p}) = \hbox {P}(\hbox {x}{\vert }\mathbf{p}) \vee \hbox {P}(\mathbf{p}{\vert }\hbox {z}) = 0\).

  • (d) By DEP(X,P) there exist \(\mathbf{p}_{1},\!\mathbf{p}_{2}\) and x such that \(\hbox {P}(\hbox {x}{\vert }\mathbf{p}_{1}) \ne \hbox {P}(\hbox {x}{\vert }\mathbf{p}_{2})\).

Thus either \(\forall \hbox {z}{:}\;\hbox {P}(\mathbf{p}_{1}{\vert }\hbox {z}) = 0 \,\vee \, \hbox {P}(\mathbf{p}_{2}{\vert }\hbox {z}) = 0\); or for some z, \(\hbox {P}(\hbox {x}{\vert }\mathbf{p}_{\mathrm{i}}) = \hbox {P}(\hbox {x}{\vert }\hbox {z})\) holds by (b) + (c) for i = 1 and i = 2, which contradicts (d). So \(\forall \hbox {z}{:}\;\hbox {P}(\mathbf{p}_{1}{\vert }\hbox {z}) = 0\, \vee \,\hbox {P}(\mathbf{p}_{2}{\vert }\hbox {z}) = 0\), i.e. every Z-value has some P-value that depends deterministically on it.
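The deterministic alternative can be realized by a simple toy model (hypothetical numbers): let P be a deterministic copy of Z and let X depend on P. Then the two conditional independencies hold degenerately, DEP(X,P) holds, and every Z-value has a P-value with conditional probability zero.

```python
# Toy model (hypothetical numbers): P is a deterministic copy of Z,
# X depends on P. DEP(X,P) holds, and for every Z-value z there is a
# P-value p with P(p|z) = 0, i.e. a deterministic dependence on z.
vals = (0, 1)
pz = {0: 0.4, 1: 0.6}                               # prior over Z
pp_z = {z: {p: 1.0 if p == z else 0.0 for p in vals}
        for z in vals}                              # P = Z deterministically
px_p = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # X depends on P

joint = {(p, x, z): pz[z] * pp_z[z][p] * px_p[p][x]
         for p in vals for x in vals for z in vals}

def pr(**c):
    return sum(v for (p, x, z), v in joint.items()
               if all(dict(p=p, x=x, z=z)[k] == w for k, w in c.items()))

# DEP(X,P): P(x|p) differs across the P-values
dep_xp = abs(pr(x=1, p=0) / pr(p=0) - pr(x=1, p=1) / pr(p=1)) > 0.1
# every Z-value has a P-value with conditional probability zero
zero_p = all(any(pr(p=p, z=z) == 0.0 for p in vals) for z in vals)
print(dep_xp, zero_p)  # True True
```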

Finally assume causal structure (iii): Here we have \(\hbox {INDEP}(\mathbf{P},\!\hbox {Z}{\vert }\hbox {X})\) since P and Z are d-separated by X. \(\hbox {INDEP}(\mathbf{P}{,}\hbox {Z}{\vert }\hbox {X})\) and \(\hbox {INDEP}(\mathbf{P}{,}\hbox {X}{\vert }\hbox {Z})\) plus positive priors give us (similarly as in case (ii))

  • (b\({^\prime }\))\(\,\,\forall \hbox {x,z},\!\mathbf{p}{:}\;\hbox {P}(\mathbf{p}{\vert }\hbox {z,x}) = \hbox {P}(\mathbf{p}{\vert }\hbox {x}) \vee \hbox { P}(\hbox {x}{\vert }\hbox {z}) = 0\), and

  • (c\({^\prime }\)) \(\forall \hbox {x,z},\!\mathbf{p}{:}\;\hbox {P}(\mathbf{p}{\vert }\hbox {z,x}) = \hbox {P}(\mathbf{p}{\vert }\hbox {z}) \vee \hbox { P}(\hbox {x}{\vert }\hbox {z}) = 0\).

  • (d\({^\prime }\)) By DEP(X,P) there exist \(\hbox {x}_{1},\hbox {x}_{2}\) and \(\mathbf{p}\) such that \(\hbox {P}(\mathbf{p}{\vert }\hbox {x}_{1}) \ne \hbox {P}(\mathbf{p}{\vert }\hbox {x}_{2})\).

Thus either \(\forall \hbox {z}{:}\;\hbox {P}(\hbox {x}_{1}{\vert }\hbox {z}) = 0 \vee \hbox {P}(\hbox {x}_{2}{\vert }\hbox {z}) = 0\); or for some z: \(\hbox {P}(\mathbf{p}{\vert }\hbox {x}_{\mathrm{i}}) = \hbox {P}(\mathbf{p}{\vert }\hbox {z})\) holds by (b\(^{\prime }\)) + (c\(^{\prime }\)) for i = 1 and i = 2, which contradicts (d\(^{\prime }\)). So every Z-value has some X-value that depends deterministically on it.

Thus, assumption (*) must be false, which concludes our proof.\(\square \)

Cite this article

Schurz, G., Gebharter, A. Causality as a theoretical concept: explanatory warrant and empirical content of the theory of causal nets. Synthese 193, 1073–1103 (2016). https://doi.org/10.1007/s11229-014-0630-z