
Direct Causes and the Trouble with Soft Interventions

Abstract

An interventionist account of causation characterizes causal relations in terms of changes resulting from particular interventions. I provide a new example of a causal relation for which there does not exist an intervention satisfying the common interventionist standard. I consider adaptations that would save this standard and describe their implications for an interventionist account of causation. No adaptation preserves all the aspects that make the interventionist account appealing. Part of the fallout is a clearer account of the difficulties in characterizing so-called “soft” interventions.



Notes

  1. Note that I have exchanged y for z from the original formulation to reduce confusion in the application of the definition in the subsequent discussion.

  2. For the close reader, I literally mean “all” here, i.e. even noise terms. As will be seen in Fig. 3 below, cases similar to model T are possible when particular unobserved noise terms are permitted.

  3. In its standard form, faithfulness only refers to the passive observational distribution, and since the models in Fig. 1 do not exhibit any (conditional) independencies in the passive observational distribution, they do not violate the standard formulation of faithfulness. However, it is natural to extend faithfulness to apply to all manipulated graphs and their interventional distributions as well. Model T clearly violates this stronger version of faithfulness, since it leaves x and z independent in the distribution where x and y are simultaneously subject to an intervention, even though x is a direct cause of z (as determined by the intervention on the full causal graph including u and v). As with violations of standard faithfulness, model T exhibits a particular constellation of parameters that can be characterized by an algebraic constraint on the parameters. Meek’s measure-theoretic argument that such constellations only occur with measure zero (see Theorem 7 in Meek 1995) can be similarly applied here (see “Appendix 1”).

  4. I am grateful to an anonymous reviewer for making this proposal. I hope this clarifies why such a move will not work, at least not with the definition of ‘contributing cause’ that Woodward gives in Woodward (2008, p. 209).

  5. I am grateful to Dominik Janzing for pointing this out. The example is similar to the “matching pennies” game, only with one coin flip unobserved.

  6. I am grateful to a reviewer for alerting me to this route and at the same time supplying the reasons why it does not sound promising.

  7. Standard discussions of actual causation do not consider unobserved variables, so the problem exhibited by model T does not arise.

  8. The interventionist may want to be cautious not to throw out the baby with the bathwater, since instrumental variables have formally the same structure as soft interventions and are widely used as a causal discovery tool.

  9. I condition on the intervened variable(s) in order to avoid having to specify a particular intervention distribution.

References

  • Baumgartner, M. (2012). The logical form of interventionism. Philosophia, 40(4), 751–761.


  • Eberhardt, F. (2013). Experimental indistinguishability of causal structures. In Proceedings of the 2012 Philosophy of Science Association Meeting, forthcoming. http://philsci-archive.pitt.edu/9511/.

  • Fisher, R. A. (1935). The design of experiments. New York: Hafner.


  • Glymour, C. (2004). Review of James Woodward, Making Things Happen: A theory of causal explanation. British Journal for Philosophy of Science, 55, 779–790.


  • Glymour, C. (2007). Learning the structure of deterministic systems. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, computation. Oxford: Oxford University Press.


  • Hyttinen, A., Eberhardt, F., & Hoyer, P.O. (2012). Learning linear cyclic causal models with latent variables. Journal of Machine Learning Research, 13, 3387–3439.


  • McDermott, M. (1995). Redundant causation. British Journal for Philosophy of Science, 46, 523–544.


  • Meek, C. (1995). Strong completeness and faithfulness in Bayesian networks. In P. Besnard & S. Hanks (Eds.), Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 411–418). San Francisco, CA: Morgan Kaufmann.

  • Menzies, P., & Price, H. (1993). Causation as a secondary quality. British Journal for Philosophy of Science, 44, 187–203.


  • Pearl, J. (2000). Causality. Oxford: Oxford University Press.


  • Richardson, T., Schulz, L., & Gopnik, A. (2007). Data-mining probabilists or experimental determinists? A dialogue on the principles underlying causal learning in children. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, computation (pp. 208–230). Oxford: Oxford University Press.


  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction and search (2nd ed.). Cambridge, MA: MIT Press.


  • Strevens, M. (2007). Review of Woodward, Making Things Happen. Philosophy and Phenomenological Research, 74(1), 233–249.


  • Strevens, M. (2008). Comments on Woodward, Making Things Happen. Philosophy and Phenomenological Research, 77(1), 171–192.


  • Woodward, J. (2003). Making Things Happen. Oxford: Oxford University Press.


  • Woodward, J. (2008). Response to Strevens. Philosophy and Phenomenological Research, 77(1), 193–212.



Acknowledgments

Many people, including many who are not proponents of the interventionist account of causation, have reacted with discomfort or at least some surprise to my presentation of models T and C. I have benefitted enormously from their reactions and discussions with them. In particular, I would like to thank (in alphabetical order) Clark Glymour, Dominik Janzing, Conor Mayo-Wilson, Richard Scheines, Peter Spirtes, Jim Woodward and Jiji Zhang. The models are a development of ones that are indistinguishable by single interventions only, which I worked on in a different context with Antti Hyttinen and Patrik Hoyer. I would also like to thank five anonymous reviewers who pressed me to distinguish more explicitly the case presented here from traditional violations of faithfulness discussed in the literature. This research was supported by a grant from the James S. McDonnell Foundation.

Author information


Correspondence to Frederick Eberhardt.

Appendix


Appendix 1: Constraints for Models T and C

I consider the general constraints on the parameterization that two models of the structure of T and C must satisfy in order to be indistinguishable for a passive observation and all surgical interventions on the observed variables. In the following T and C refer to models with the respective structures in Fig. 1, rather than the specific parameterizations listed in Table 1, and I use the notation p(A|B||C) to refer to the probability of the variables in A conditional on the variables in B in the distribution in which the variables in C have been subject to a surgical intervention.

To remain indistinguishable when the set of observed variables is V = {x, y, z}, models T and C must agree on the following seven distributions over the observed variables:

  1.

    the passive observational distribution:

    $$ \begin{aligned} P(X, Y, Z) &= \sum_{uv} P(U) P(V) P(X | U) P(Y | V, X) P(Z | U, V, X, Y) \end{aligned} $$
  2.

    the manipulated distribution (see Note 9) with an intervention on X

    $$ \begin{aligned} P(Y, Z | X || X) &= \sum_{uv} P(U) P(V) P(Y | V, X || X) P(Z | U, V, X, Y || X) \\ &= \sum_{uv} P(U) P(V) P(Y | V, X) P(Z | U, V, X, Y) \\ \end{aligned} $$

    To illustrate the implied constraints, I substitute the distribution parameters from Table 1 for this particular case. It can be done analogously for all seven distributions.

    $$ \begin{aligned} P(y=1, z=1 | x=1 || x=1) &= t_{1} t_{2} t_{5} t_{9} + (1- t_{1}) t_{2} t_{5} t_{17} + t_{1} (1- t_{2}) t_{7} t_{13} + (1- t_{1}) (1- t_{2}) t_{7} t_{21} \\ P(y=1, z=0 | x=1 || x=1) &= t_{1} t_{2} t_{5} (1- t_{9}) + (1- t_{1}) t_{2} t_{5} (1-t_{17}) \\ & \quad + t_{1} (1- t_{2}) t_{7} (1-t_{13}) + (1- t_{1}) (1- t_{2}) t_{7} (1-t_{21}) \\ P(y=0, z=1 | x=1 || x=1) &= t_{1} t_{2} (1- t_{5}) t_{10} + (1- t_{1}) t_{2} (1- t_{5}) t_{18} \\ & \quad + t_{1} (1- t_{2}) (1- t_{7}) t_{14}+ (1- t_{1}) (1- t_{2}) (1- t_{7}) t_{22} \\ P(y=0, z=0 | x=1 || x=1) &= t_{1} t_{2} (1- t_{5}) (1- t_{10}) + (1- t_{1}) t_{2} (1- t_{5}) (1-t_{18}) \\ & \quad + t_{1} (1- t_{2}) (1- t_{7}) (1- t_{14}) + (1- t_{1}) (1- t_{2}) (1- t_{7}) (1-t_{22}) \\ P(y=1, z=1 | x=0 || x=0) &= t_{1} t_{2} t_{6} t_{11} + (1- t_{1}) t_{2} t_{6} t_{19} \\ & \quad + t_{1} (1- t_{2}) t_{8} t_{15} + (1- t_{1}) (1- t_{2}) t_{8} t_{23} \\ P(y=1, z=0 | x=0 || x=0) &= t_{1} t_{2} t_{6} (1- t_{11}) + (1- t_{1}) t_{2} t_{6} (1-t_{19}) \\ & \quad + t_{1} (1- t_{2}) t_{8} (1- t_{15}) + (1- t_{1}) (1- t_{2}) t_{8} (1-t_{23}) \\ P(y=0, z=1 | x=0 || x=0) &= t_{1} t_{2} (1- t_{6}) t_{12} + (1- t_{1}) t_{2} (1- t_{6}) t_{20} \\ & \quad + t_{1} (1- t_{2}) (1- t_{8}) t_{16} + (1- t_{1}) (1- t_{2}) (1- t_{8}) t_{24} \\ P(y=0, z=0 | x=0 || x=0) &= t_{1} t_{2} (1- t_{6}) (1-t_{12}) + (1- t_{1}) t_{2} (1- t_{6}) (1-t_{20}) \\ & \quad + t_{1} (1- t_{2}) (1- t_{8}) (1- t_{16}) + (1- t_{1}) (1- t_{2}) (1- t_{8}) (1-t_{24}) \\ \end{aligned} $$
  3.

    the manipulated distribution with an intervention on Y

    $$ \begin{aligned} P(X, Z | Y || Y) &= \sum_{uv} P(U) P(V) P(X | U, Y || Y) P(Z | U, V, X, Y || Y) \\ &= \sum_{uv} P(U) P(V) P(X | U) P(Z | U, V, X, Y) \\ \end{aligned} $$
  4.

    the manipulated distribution with an intervention on Z (since this distribution does not involve the parameters specifying p(z|u,v,x,y) that distinguish the models, these equations are trivially satisfied by T and C)

    $$ \begin{aligned} P(X, Y | Z || Z) = \sum_{uv} P(U) P(V) P(X | U) P(Y | V, X) \end{aligned} $$
  5.

    the manipulated distribution with an intervention on X and Y simultaneously

    $$ \begin{aligned} P(Z | X, Y || X, Y) &= \sum_{uv} P(U) P(V) P(Z | U, V, X, Y || X, Y)\\ &= \sum_{uv} P(U) P(V) P(Z | U, V, X, Y) \\ \end{aligned} $$
  6.

    the manipulated distribution with an intervention on X and Z simultaneously (since this distribution does not involve the parameters specifying p(z|u,v,x,y) that distinguish the models, these equations are trivially satisfied by T and C)

    $$ \begin{aligned} P(Y | X, Z || X, Z) &= \sum_{uv} P(U) P(V) P(Y | U, V, X, Z || X, Z)\\ &= \sum_{v} P(V) P(Y | V, X) \\ \end{aligned} $$
  7.

    the manipulated distribution with an intervention on Y and Z simultaneously (since this distribution does not involve the parameters specifying p(z|u,v,x,y) that distinguish the models, these equations are trivially satisfied by T and C)

    $$ \begin{aligned} P(X | Y, Z || Y, Z) &= \sum_{uv} P(U) P(V) P(X | U, V, Y, Z || Y, Z)\\ &= \sum_{u} P(U) P(X | U) \\ \end{aligned} $$

    In addition, in order to establish the relevant causal effects, both models must satisfy the following inequalities. Bold font marks one way (there may be others) in which the parameterizations of T and C in Table 1 satisfy the inequalities.

  1.

    to make u a cause of x:

    $$ {\bf t}_{{\bf 3}} \neq {\bf t}_{{\bf 4}} $$
  2.

    to make x and v causes of y:

    $$ (( t_{5} \neq t_{7}) \lor ({\bf t}_{{\bf 6}} \neq {\bf t}_{{\bf 8}}))\land (( t_{5} \neq t_{6}) \lor ({\bf t}_{{\bf 7}} \neq {\bf t}_{{\bf 8}})) $$
  3.

    to make u, v and y causes of z:

    $$ \begin{aligned} &(({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 17}})\lor( t_{10} \neq t_{18})\lor( t_{11} \neq t_{19})\lor(t_{12} \neq t_{20})\lor( t_{13} \neq t_{21})\lor( t_{14} \neq t_{22})\lor( t_{15} \neq t_{23})\lor( t_{16} \neq t_{24})) \\ &\land (({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 13}})\lor( t_{10} \neq t_{14})\lor( t_{11} \neq t_{15})\lor(t_{12} \neq t_{16})\lor(t_{17} \neq t_{21})\lor(t_{18} \neq t_{22})\lor(t_{19} \neq t_{23})\lor(t_{20} \neq t_{24}))\\ &\land ( ({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 10}})\lor(t_{11} \neq t_{12})\lor( t_{13} \neq t_{14})\lor( t_{15} \neq t_{16})\lor(t_{17} \neq t_{18})\lor(t_{19} \neq t_{20})\lor(t_{21} \neq t_{22})\lor(t_{23} \neq t_{24})) \end{aligned} $$

Model T must in addition make x a cause of z by satisfying the following inequality:

$$ \begin{aligned} &({\bf t}_{{\bf 9}} \neq {\bf t}_{{\bf 11}})\lor ( {\bf t}_{{\bf 13}} \neq {\bf t}_{{\bf 15}})\lor(t_{17} \neq t_{19})\lor(t_{21} \neq t_{23}) \\ & \quad \lor( t_{10} \neq t_{12})\lor( t_{14} \neq t_{16})\lor(t_{18} \neq t_{20})\lor(t_{22} \neq t_{24}) \\ \end{aligned} $$
(1)

while model C must satisfy its negation, i.e. all the parameter pairs must be equal.

Since model T must satisfy at least one disjunct of Constraint (1), while C must satisfy its negation, one can easily detect the distributional constraints from the list 1–7 above that will not be trivially satisfied. All such quantities contain either only parameters from the first line, or only parameters from the second line of Constraint (1). I will focus only on the satisfaction of disjuncts from the first line; the case for the second line is exactly analogous.

In the most general case model T differs from model C by satisfying every disjunct in Constraint (1), and we can write the parameters as \(t_{9} = t_{11} + d_1\), \(t_{17} = t_{19} + d_2\), \(t_{13} = t_{15} + d_3\), and \(t_{21} = t_{23} + d_4\) for non-zero \(d_1,\ldots,d_4.\) There are seven distributional quantities containing the parameters \(t_{9}\), \(t_{13}\), \(t_{17}\) and \(t_{21}\), giving rise to the following four independent constraints if models T and C are to be indistinguishable for a passive observation and all surgical interventions on the observed variables:

$$ \begin{aligned} t_{1} t_{2} t_{3} t_{5}d_1&+(1- t_{1}) t_{2} t_{4} t_{5}d_2+ t_{1}(1- t_{2}) t_{3} t_{7}d_3+(1- t_{1})(1- t_{2}) t_{4} t_{7}d_4 = 0 \\ t_{1} t_{2} t_{5}d_1&+(1- t_{1}) t_{2} t_{5}d_2+ t_{1}(1- t_{2}) t_{7}d_3+(1- t_{1})(1- t_{2}) t_{7}d_4 = 0 \\ t_{1} t_{2} t_{3}d_1&+(1- t_{1}) t_{2} t_{4}d_2+ t_{1}(1- t_{2}) t_{3}d_3+(1- t_{1})(1- t_{2}) t_{4}d_4 = 0 \\ t_{1} t_{2}d_1&+(1- t_{1}) t_{2}d_2+ t_{1}(1- t_{2})d_3+(1- t_{1})(1- t_{2})d_4 = 0 \\ \end{aligned} $$

Solving these constraints shows that a model T must satisfy the following conditions on its parameters:

$$ \begin{aligned} t_{5}&= t_{7} \\ t_{11}&= t_{9}-(d_3(-1+ t_{2})/ t_{2})\\ t_{15}&= t_{13}-d_3\\ t_{19}&=t_{17}-(d_4(-1+ t_{2})/ t_{2})\\ t_{23}&=t_{21}-d_4\\ \end{aligned} $$
(2)

where \(d_3\) and \(d_4\) can be chosen freely as long as at least one of them is non-zero and the resulting quantities remain probabilities. An analogous set of constraints arises when the difference between models T and C stems from disjuncts in the second line of Constraint (1). These are non-trivial algebraic constraints on the parameter space, which, following Meek (1995), implies that their solution space has measure zero within the space of all parameterizations of a model with a structure like T.
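One way to see where the constraints in (2) come from is sketched here. The sketch assumes, beyond what is stated above, that \(0 < t_1 < 1\) and \(0 < t_2 < 1\), and it uses \(t_3 \neq t_4\) from the inequality that makes u a cause of x. The abbreviations \(A, B, A', B'\) are introduced only for this sketch. With

$$ \begin{aligned} A &= t_{1}\,[\,t_{2} d_1 + (1- t_{2}) d_3\,], & B &= (1- t_{1})\,[\,t_{2} d_2 + (1- t_{2}) d_4\,], \\ A' &= t_{1}\,[\,t_{2} t_{5} d_1 + (1- t_{2}) t_{7} d_3\,], & B' &= (1- t_{1})\,[\,t_{2} t_{5} d_2 + (1- t_{2}) t_{7} d_4\,], \end{aligned} $$

the four constraints read \(t_3 A' + t_4 B' = 0\), \(A' + B' = 0\), \(t_3 A + t_4 B = 0\) and \(A + B = 0\). Since \(t_3 \neq t_4\), each pair forces \(A = B = 0\) and \(A' = B' = 0\), so that

$$ \begin{aligned} d_1 &= -\frac{1- t_{2}}{t_{2}}\, d_3, \qquad d_2 = -\frac{1- t_{2}}{t_{2}}\, d_4, \\ 0 &= t_{2} t_{5} d_1 + (1- t_{2}) t_{7} d_3 = (1- t_{2})( t_{7} - t_{5})\, d_3, \\ 0 &= t_{2} t_{5} d_2 + (1- t_{2}) t_{7} d_4 = (1- t_{2})( t_{7} - t_{5})\, d_4. \end{aligned} $$

Because at least one of \(d_3\) and \(d_4\) is non-zero, the last two lines give \(t_5 = t_7\). Substituting \(d_1 = t_9 - t_{11}\), \(d_2 = t_{17} - t_{19}\), \(d_3 = t_{13} - t_{15}\) and \(d_4 = t_{21} - t_{23}\) into the first line recovers the remaining equalities in (2).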

Similarly, these constraints can be used to construct a parameterization for a model T that is indistinguishable from a parameterized model C, as long as the parameterization of C also respects the \(t_{5} = t_{7}\) constraint (or \(t_{6} = t_{8}\)). In particular, the parameterization of T in Table 1 is constructed from the parameterization of C in that table using \(d_3 = 0.1\) and \(d_4 = 0\).
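The construction can also be checked numerically. The following Python sketch uses exact fractions and parameter values that are purely hypothetical: apart from \(d_3 = 0.1\) and \(d_4 = 0\) they are not the Table 1 values, and the function names are made up for illustration. It derives \(d_1\) and \(d_2\) from (2) and evaluates the left-hand sides of the four constraints.

```python
from fractions import Fraction as F


def differences_from_constraint_2(t2, d3, d4):
    """Return (d1, d2) implied by (2), where d1 = t9 - t11 and d2 = t17 - t19."""
    d1 = d3 * (t2 - 1) / t2
    d2 = d4 * (t2 - 1) / t2
    return d1, d2


def constraint_residuals(t1, t2, t3, t4, t5, t7, d1, d2, d3, d4):
    """Left-hand sides of the four indistinguishability constraints."""
    # One entry per latent state (u, v) = (1,1), (0,1), (1,0), (0,0).
    weight = [t1 * t2, (1 - t1) * t2, t1 * (1 - t2), (1 - t1) * (1 - t2)]
    p_x1 = [t3, t4, t3, t4]   # p(x = 1 | u)
    p_y1 = [t5, t5, t7, t7]   # p(y = 1 | v, x = 1)
    d = [d1, d2, d3, d4]
    rows = range(4)
    return (
        sum(weight[i] * p_x1[i] * p_y1[i] * d[i] for i in rows),
        sum(weight[i] * p_y1[i] * d[i] for i in rows),
        sum(weight[i] * p_x1[i] * d[i] for i in rows),
        sum(weight[i] * d[i] for i in rows),
    )


# Hypothetical parameter values, chosen only so that t5 == t7.
t1, t2, t3, t4 = F(3, 10), F(2, 5), F(7, 10), F(1, 5)
t5 = t7 = F(4, 5)
d3, d4 = F(1, 10), F(0)
d1, d2 = differences_from_constraint_2(t2, d3, d4)

residuals = constraint_residuals(t1, t2, t3, t4, t5, t7, d1, d2, d3, d4)
print(residuals)  # all four residuals are exactly zero
```

Exact fractions are used so that a zero residual is not an artifact of floating-point rounding. Replacing `t5` by a value different from `t7` in the call leaves the third and fourth residuals at zero but makes the first two non-zero.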

Appendix 2: Deterministic Parameterizations of T and C

Deterministic parameterizations of the two models in Fig. 1 that are indistinguishable for a passive observation and any surgical intervention on the observed variables.

| Parameter | Conditional probability term | T | C |
|---|---|---|---|
| \(t_{1}\) | p(u = 1) | 0.5 | 0.5 |
| \(t_{2}\) | p(v = 1) | 0.5 | 0.5 |
| \(t_{3}\) | p(x = 1 \| u = 1) | 0 | 0 |
| \(t_{4}\) | p(x = 1 \| u = 0) | 1 | 1 |
| \(t_{5}\) | p(y = 1 \| v = 1, x = 1) | 1 | 1 |
| \(t_{6}\) | p(y = 1 \| v = 1, x = 0) | 1 | 1 |
| \(t_{7}\) | p(y = 1 \| v = 0, x = 1) | 1 | 1 |
| \(t_{8}\) | p(y = 1 \| v = 0, x = 0) | 0 | 0 |
| \(t_{9}\) | p(z = 1 \| u = 1, v = 1, x = 1, y = 1) | 1 | 0 |
| \(t_{10}\) | p(z = 1 \| u = 1, v = 1, x = 1, y = 0) | 1 | 1 |
| \(t_{11}\) | p(z = 1 \| u = 1, v = 1, x = 0, y = 1) | 0 | 0 |
| \(t_{12}\) | p(z = 1 \| u = 1, v = 1, x = 0, y = 0) | 1 | 1 |
| \(t_{13}\) | p(z = 1 \| u = 1, v = 0, x = 1, y = 1) | 0 | 1 |
| \(t_{14}\) | p(z = 1 \| u = 1, v = 0, x = 1, y = 0) | 1 | 1 |
| \(t_{15}\) | p(z = 1 \| u = 1, v = 0, x = 0, y = 1) | 1 | 1 |
| \(t_{16}\) | p(z = 1 \| u = 1, v = 0, x = 0, y = 0) | 1 | 1 |
| \(t_{17}\) | p(z = 1 \| u = 0, v = 1, x = 1, y = 1) | 1 | 1 |
| \(t_{18}\) | p(z = 1 \| u = 0, v = 1, x = 1, y = 0) | 1 | 1 |
| \(t_{19}\) | p(z = 1 \| u = 0, v = 1, x = 0, y = 1) | 1 | 1 |
| \(t_{20}\) | p(z = 1 \| u = 0, v = 1, x = 0, y = 0) | 1 | 1 |
| \(t_{21}\) | p(z = 1 \| u = 0, v = 0, x = 1, y = 1) | 1 | 1 |
| \(t_{22}\) | p(z = 1 \| u = 0, v = 0, x = 1, y = 0) | 1 | 1 |
| \(t_{23}\) | p(z = 1 \| u = 0, v = 0, x = 0, y = 1) | 1 | 1 |
| \(t_{24}\) | p(z = 1 \| u = 0, v = 0, x = 0, y = 0) | 1 | 1 |

Note that if the distributions of the latent variables u and v are supposed to be non-extreme, then p(u = 1) = p(v = 1) = 0.5 is the only possible choice.

I do not find the deterministic case particularly enlightening. Moreover, it is well known that deterministic causal relations are often more difficult to discover than probabilistic ones. In that sense I think that the examples of parameterizations for models T and C in Table 1 with purely positive distributions provide a much stronger case.
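The indistinguishability claim for the deterministic parameterizations listed in this appendix can be checked by brute force. The Python sketch below copies the parameter values from the table and compares the distribution over (x, y, z) in T and C for the passive observation and for every surgical intervention on one or two observed variables. The function and variable names are illustrative choices, and a surgical intervention is modelled as clamping the variable to a fixed value, which corresponds to conditioning on the intervened variable as in Note 9.

```python
from itertools import product

# Shared parameters t1..t8 from the deterministic table.
P_U1, P_V1 = 0.5, 0.5
P_X1_GIVEN_U = {1: 0.0, 0: 1.0}
P_Y1_GIVEN_VX = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 1.0, (0, 0): 0.0}

# t9..t24 in table order: keys (u, v, x, y) run from (1,1,1,1) to (0,0,0,0).
KEYS = list(product([1, 0], repeat=4))
Z1_T = dict(zip(KEYS, [1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]))
Z1_C = dict(zip(KEYS, [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]))


def bern(p_one, value):
    """Probability that a binary variable with p(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def observed_distribution(z1, do):
    """Joint over (x, y, z) when the variables in `do` are clamped surgically."""
    dist = {}
    for u, v, x, y, z in product([0, 1], repeat=5):
        p = bern(P_U1, u) * bern(P_V1, v)
        p *= float(x == do["x"]) if "x" in do else bern(P_X1_GIVEN_U[u], x)
        p *= float(y == do["y"]) if "y" in do else bern(P_Y1_GIVEN_VX[(v, x)], y)
        p *= float(z == do["z"]) if "z" in do else bern(z1[(u, v, x, y)], z)
        dist[(x, y, z)] = dist.get((x, y, z), 0.0) + p
    return dist


def all_interventions():
    """Passive observation plus every intervention on one or two of x, y, z."""
    for targets in [(), ("x",), ("y",), ("z",), ("x", "y"), ("x", "z"), ("y", "z")]:
        for values in product([0, 1], repeat=len(targets)):
            yield dict(zip(targets, values))


def x_matters(z1):
    """True if p(z | u, v, x, y) changes with x for some fixed u, v, y."""
    return any(z1[(u, v, 1, y)] != z1[(u, v, 0, y)]
               for u, v, y in product([0, 1], repeat=3))


indistinguishable = all(
    observed_distribution(Z1_T, do) == observed_distribution(Z1_C, do)
    for do in all_interventions()
)
print(indistinguishable, x_matters(Z1_T), x_matters(Z1_C))  # True True False
```

All probabilities involved are multiples of 0.25, so the floating-point comparison is exact here. For a non-deterministic parameterization a tolerance or exact fractions would be the safer choice.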

Appendix 3: Soft Interventions

Note that the constraints in (2) do not contain the parameters \(t_3\) or \(t_4\), which could be influenced by a soft intervention on x; hence a soft intervention on x is not going to distinguish between models T and C.

A soft intervention on y that changes \(t_5\) will break the first equality in (2), so the models become distinguishable. In particular, if \(t_5\) is changed from 0.8 to 0.85 by a soft intervention on y in both models T and C, then in the resulting manipulated distribution we will have

$$ \begin{aligned} p_T^{*}(x=y=z=1) &= 0.24856 \\ {\rm vs.}\quad p_C^{*}(x=y=z=1) &= 0.24928 \\ \end{aligned} $$

which is not a rounding error.
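The effect of such soft interventions can be illustrated with a small Python sketch. Only \(t_5 = t_7 = 0.8\), the new value 0.85, \(d_3 = 0.1\) and \(d_4 = 0\) are taken from the text; every other parameter value below is hypothetical, so the output will not reproduce the two figures quoted above. The function names are likewise illustrative. Model T is built from model C by shifting the x = 1 entries of p(z = 1 | u, v, x, y = 1) by \(d_1, \ldots, d_4\), with \(d_1\) and \(d_2\) fixed by (2).

```python
from fractions import Fraction as F
from itertools import product


def p_all_ones(t1, t2, t3, t4, t5, t7, z1):
    """p(x = 1, y = 1, z = 1), summing out the latent variables u and v.

    z1[(u, v)] stands for p(z = 1 | u, v, x = 1, y = 1).
    """
    total = F(0)
    for u, v in product([1, 0], repeat=2):
        p_u = t1 if u == 1 else 1 - t1
        p_v = t2 if v == 1 else 1 - t2
        p_x1 = t3 if u == 1 else t4
        p_y1 = t5 if v == 1 else t7
        total += p_u * p_v * p_x1 * p_y1 * z1[(u, v)]
    return total


# Hypothetical values for everything except t5 = t7 = 0.8, d3 = 0.1, d4 = 0.
t1, t2, t3, t4 = F(1, 2), F(1, 2), F(3, 5), F(3, 10)
t5 = t7 = F(4, 5)
d3, d4 = F(1, 10), F(0)
d1, d2 = d3 * (t2 - 1) / t2, d4 * (t2 - 1) / t2   # from (2)

# Model C: z does not depend on x, so these are also its x = 0 values.
z1_C = {(1, 1): F(1, 2), (0, 1): F(2, 5), (1, 0): F(3, 10), (0, 0): F(7, 10)}
# Model T: shift the x = 1 entries by d1..d4.
z1_T = {(1, 1): z1_C[(1, 1)] + d1, (0, 1): z1_C[(0, 1)] + d2,
        (1, 0): z1_C[(1, 0)] + d3, (0, 0): z1_C[(0, 0)] + d4}


def gap(t3_new, t5_new):
    """p_T - p_C for x = y = z = 1 after replacing t3 and t5 in both models."""
    return (p_all_ones(t1, t2, t3_new, t4, t5_new, t7, z1_T)
            - p_all_ones(t1, t2, t3_new, t4, t5_new, t7, z1_C))


print(gap(t3, t5))           # no intervention
print(gap(F(9, 10), t5))     # soft intervention on x changes t3
print(gap(t3, F(17, 20)))    # soft intervention on y changes t5 to 0.85
```

With these hypothetical values the first two gaps are zero and the third is not. A zero gap in this single probability does not by itself establish indistinguishability, whereas a non-zero gap in any one probability is enough to tell the models apart.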

Note that \(t_6\) does not feature in the constraints in (2), so a soft intervention that changes it in both T and C will not distinguish between the two models.

Lastly, as Meek (1995) showed, violations of faithfulness occur for particular constellations of the parameters that make up the distribution. A violation of faithfulness always constitutes a non-trivial algebraic constraint on these parameters. Since soft interventions influence individual parameters, they can be (in principle, leaving the mentioned concerns of implementation aside) used to break these algebraic constraints. Thus, soft interventions are in general sufficient to make unfaithful models faithful. Once faithfulness is achieved, the causal relations can be detected as usual. In the case of canceling pathways with unobserved intermediary variables (Fig. 2), the soft intervention must occur on the final variable.


About this article

Cite this article

Eberhardt, F. Direct Causes and the Trouble with Soft Interventions. Erkenn 79, 755–777 (2014). https://doi.org/10.1007/s10670-013-9552-2


  • DOI: https://doi.org/10.1007/s10670-013-9552-2

Keywords

  • Causal Relation
  • Unobserved Variable
  • Causal Influence
  • Conditional Probability Distribution
  • Markov Condition