
Causal concepts and temporal ordering

  • S.I.: Decision Theory and the Future of Artificial Intelligence (Synthese)

Abstract

Though common sense says that causes must temporally precede their effects, the hugely influential interventionist account of causation makes no reference to temporal precedence. Does common sense lead us astray? In this paper, I evaluate the power of the commonsense assumption from within the interventionist approach to causal modeling. I first argue that if causes temporally precede their effects, then one need not consider the outcomes of interventions in order to infer causal relevance, and that one can instead use temporal and probabilistic information to infer exactly when X is causally relevant to Y in each of the senses captured by Woodward’s interventionist treatment. Then, I consider the upshot of these findings for causal decision theory, and argue that the commonsense assumption is especially powerful when an agent seeks to determine whether so-called “dominance reasoning” is applicable.


Fig. 1
Fig. 2
Fig. 3


Notes

  1. For what it’s worth, my father is no stranger to thinking about causation. He is a lawyer, and is sometimes tasked with establishing causation.

  2. Philosophers who attempt to reduce causal relevance to a kind of probabilistic relevance (e.g., Eells 1991; Spohn 2012) are notable exceptions.

  3. For example, Price (2001) argues that retro-causation is at work in the quantum mechanical context of the Bell correlations.

  4. This formulation is slightly different from any of Woodward’s, but it’s a good approximation of his view. The reason for any discrepancy is that Woodward (2008) modifies his original (2003) view in response to Strevens’s (2007) criticism that the original definition of ‘contributing cause’ was relativized to a variable set, and this is my best guess at what Woodward would have said had he anticipated Strevens’s criticism. Woodward’s actual reply to Strevens was that “X is a contributing cause simpliciter of Y (in a sense that isn't relativized to any particular variable set V) as long as it is true that there exists a variable set V such that X is correctly represented as a contributing cause of Y with respect to V.” Here, because I believe that Woodward’s original V-relative notion of ‘contributing cause’ plays no important role once Woodward’s (2008) notion is on the table, I have simply modified Woodward’s original definition of ‘contributing cause’ so that X qualifies as a contributing cause of Y when Woodward (2008) says that X qualifies as a “contributing cause simpliciter” of Y.

  5. The Causal Markov Condition is an axiom of the graphical approach to causal modeling and is argued by Hausman and Woodward (1999) to be implicit in any treatment of causation according to which causes can be used to manipulate their effects, including Woodward’s (2003) treatment. We will discuss this condition in more detail in the next section of this paper.

  6. As we will see later, it does not follow from the Causal Markov Condition that the intervention on X must be correlated with any variables that are causally downstream from X.

  7. This is how Pearl (2009) understands interventions.

  8. This qualifies as an intervention relative to the system at hand since the wind has no effect on whether we manually tip over the domino.

  9. This is a common variant of an example introduced by Hesslow (1976).

  10. The only difference is that there is no local determinism in the model. Zhang and Spirtes (2008) likewise discuss a version of McDermott’s case in which there is no local determinism.

  11. Spohn (2012) takes this option, but limits the domain of his theory to dichotomous variables. (His theory therefore does not apply to the “dog bite” because DB has three values.) Spohn convincingly argues that causation is transitive when the variables are dichotomous, but like Woodward, we are interested in developing an account of causation that extends to settings where the variables are not dichotomous.

  12. Though DB is not a total cause of E, we cannot say that X is a cause of Y exactly when X is a total cause of Y, because this gets the Fig. 2 case wrong. Similarly, though there is a causal chain from BC to TH, we cannot say that X is a cause of Y exactly when there is a directed path from X to Y, because this gets the Fig. 3 case wrong.

  13. Elwert and Christakis (2008) use graphical causal models to show that the widowhood effect is real without killing anyone.

  14. Some may quibble with my choice to refer to these sets of necessary and sufficient conditions as analyses because their right-hand sides include causal notions that are too similar to the notions on the left (e.g., causal sufficiency). If you’re one of these people, please feel free to replace every mention of “analysis” with a term that you deem more appropriate. (I do not mean to use the term in a loaded way.)

  15. This is not a trivial restriction. For example, when dealing with type-level variables like income or intelligence, the variables often do not specify what happens at a specific time, and it is therefore difficult to recover a temporal order. When this is impossible, these analyses are silent.

  16. It is worth noting that this does not mean that V must include every common cause of any variables included within V. For example, the agent can leave out distal common causes of X and Y in the event that she has included a more proximate common cause of X and Y that screens off the distal cause. For ease of exposition, I do not specify exactly which common causes are confounding in the sense that matters for causal sufficiency, but there is at least one juncture of the paper (flagged in footnote 40) at which this is relevant.

  17. The Causal Markov Condition entails that when two variables, X and Y, are correlated, then either (i) X and Y share a common ancestor, (ii) there is a directed path from X to Y, or (iii) there is a directed path from Y to X. Thus when the Causal Markov Condition is assumed of a variable set that excludes a common cause of two variables, X and Y, we must posit a directed path between X and Y even when there isn’t one.

  18. Though Hausman and Woodward argue that the CMC is implicit in interventionism, it is important to note that we can often discover causal relationships by intervening on a variable without considering any causally sufficient variable sets. Consider randomized controlled trials. If the experimental manipulation on X is genuinely random (as it is in the ideal case), then we can infer whether X is a total cause of Y by determining whether the probability of Y changes when we experiment on X. This is one important advantage that Woodward’s interventionist analyses have over the temporal analyses that I propose here. See Eberhardt (2014) for extensive discussion of these issues.

  19. Hausman and Woodward (1999) speak in terms of modularity.

  20. Consider the dominoes from Fig. 1. Under normal circumstances, this is about as close as we get to considering a causal system in which there is local determinism, but even here, it is clear that every assignment of values over V should be assigned non-zero probability since, e.g., each of the dominoes can fall when the others do not. After all, we already mentioned that humans can manipulate whether each of the dominoes is upright without interfering with the status of the other dominoes.

  21. As I mentioned before, Hausman and Woodward (1999) convincingly argue that interventionism presupposes the Causal Markov Condition. Zhang and Spirtes (2011) make the same point about the Causal Minimality Condition when the probability distribution over V is positive.

  22. For example, even though the CMC does not entail that D2 and W are probabilistically independent given any assignment of values over D1 and D3, neither does it entail that D2 and W are probabilistically dependent given any assignment of values over D1 and D3.

  23. Throughout this paper, I sometimes speak in terms of variables (rather than paths) being d-separated. When I do so, I follow Pearl (2009) by referring to X and Y as d-separated by Z when every path between X and Y is d-separated by Z.
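For readers who want to check variable-level d-separation claims like Pearl’s mechanically, the standard reduction tests separation in the moralized ancestral graph. The sketch below is illustrative only; the edge-list representation and function names are my own, not the paper’s or Pearl’s:

```python
from itertools import combinations

def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG `edges`."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    result, frontier = set(nodes), list(nodes)
    while frontier:
        n = frontier.pop()
        for p in parents.get(n, ()):
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result

def d_separated(edges, x, y, zs):
    """True iff x and y are d-separated by the set zs in the DAG `edges`.

    Classic reduction: restrict to the ancestral subgraph of {x, y} | zs,
    moralize it (marry parents of a common child), drop arrow directions,
    delete zs, and test whether x and y are disconnected.
    """
    zs = set(zs)
    keep = ancestors(edges, {x, y} | zs)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Undirected moral graph: original arrows plus edges between co-parents.
    undirected = {frozenset(e) for e in sub}
    parents_of = {}
    for a, b in sub:
        parents_of.setdefault(b, set()).add(a)
    for child, pars in parents_of.items():
        for p, q in combinations(pars, 2):
            undirected.add(frozenset((p, q)))
    # Conditioning on zs blocks those nodes: remove incident edges.
    undirected = {e for e in undirected if not (e & zs)}
    reached, frontier = {x}, [x]
    while frontier:
        n = frontier.pop()
        for e in undirected:
            if n in e:
                other = next(iter(e - {n}))
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return y not in reached
```

On the chain X → Z → Y, conditioning on Z d-separates X and Y; on the collider X → Z ← Y, the empty set d-separates X and Y while conditioning on Z d-connects them.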

  24. A DAG, G, qualifies as a proper sub-DAG of G* if and only if (i) G contains fewer arrows than G*, and (ii) every arrow in G is oriented in the same direction as its corresponding arrow in G*.
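If one represents a DAG as a set of directed arrows (tail, head), conditions (i) and (ii) of this definition amount to a proper-subset check on edge sets. A minimal sketch, with the representation being my own choice:

```python
def is_proper_sub_dag(g, g_star):
    """g and g_star are sets of arrows (tail, head).

    g is a proper sub-DAG of g_star iff (i) g contains fewer arrows than
    g_star, and (ii) every arrow in g appears with the same orientation
    in g_star -- jointly, iff g is a proper subset of g_star.
    """
    return len(g) < len(g_star) and g <= g_star
```

Note that a reversed arrow fails condition (ii) even when the same pair of variables is connected in both graphs.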

  25. See Spirtes et al. (2000).

  26. Hitchcock puts this condition differently, but the content is the same for all intents and purposes.

  27. The temporal ordering must be a complete ordering, rather than a partial ordering. Temporal orderings are complete orderings because they are opinionated (in the sense that the value of X is settled either temporally before, temporally after, or at the same time as the value of Y).

  28. Spohn’s (1980) treatment of ‘direct cause’ is very similar to this one.

  29. Proof: Suppose that causes temporally precede their effects and that X temporally precedes Y.

    (1) Every indirect path between X and Y that is d-connected by the empty set must include an ancestor of Y (either a common ancestor of X and Y, or an ancestor of Y that is a descendant of X), and the set of variables temporally prior to Y excepting X must therefore d-separate these paths (since this set includes these ancestors).

    (2) Every indirect path between X and Y that is d-separated by the empty set must include a collider between X and Y that is either a descendant of Y or a descendant of some ancestor of Y (because these are the only ways for there to be a collider between X and Y), and the set of variables temporally prior to Y excepting X cannot d-connect any of the paths on which these colliders reside, since it cannot include any colliders that are descendants of Y (or their descendants) and must include a common ancestor of any other collider and Y.

    (3) If the set of variables temporally prior to Y excepting X does not d-connect any indirect paths between X and Y that are d-separated by the empty set, and d-separates every indirect path between X and Y that is d-connected by the empty set, then this set of variables d-separates every indirect path between X and Y.

  30. When the probability distribution over V is not positive, there are sometimes multiple DAGs that are compatible with the temporal ordering and that satisfy the CMC and MIN. In these aberrant circumstances, 1* says that X is a direct cause of Y when there is an arrow from X to Y in every such DAG.

  31. W is correlated with D2 given some assignments of values over D1 and D3, but 1* disallows D3 from being included in the conditioning set that is used to verify whether W → D2 because D3 is settled after D2.

  32. It is easy to show that 1* likewise vindicates the graphical representations found in Figs. 2 and 3, but this is left as an exercise for the reader.

  33. Here, ‘direct cause’ should be understood in terms of 1*.

  34. Proof: Suppose that causes temporally precede their effects and that X temporally precedes Y.

    (1) Every path between X and Y that (i) is not a directed path from X to Y and (ii) is d-connected by the empty set must include an ancestor of X, and the set of variables temporally prior to X must therefore d-separate these paths (since this set includes these ancestors).

    (2) Every path between X and Y that is d-separated by the empty set must include a collider between X and Y that is either a descendant of X or a descendant of some ancestor of X (because these are the only ways for there to be a collider between X and Y), and the set of variables temporally prior to X cannot d-connect any of the paths on which these colliders reside, since it cannot include any colliders that are descendants of X (or their descendants) and must include a common ancestor of any other collider and X.

    (3) If the set of variables temporally prior to X does not d-connect any paths between X and Y that are d-separated by the empty set, and d-separates every path between X and Y that (i) is not a directed path from X to Y and (ii) is d-connected by the empty set, then this set of variables d-separates any paths between X and Y that are not directed paths from X to Y.

  35. An anonymous referee helpfully points out that if the laws of nature are deterministic, then 2* and 2** may disagree with each other, thereby making it the case that 2* and 2** can’t both agree with Woodward’s analysis. The basic idea is that if the laws of nature are deterministic, then if we condition on any possible past, the value of X will be determined, and therefore will not be a total cause of anything (since X cannot co-vary with anything if X does not vary at all). This is worth considering more—especially since (i) it provides a novel way to vindicate Russellian eliminativism about causation and (ii) it is not yet clear how 2* fares in deterministic contexts like these. But the best we can do for now is to bracket these issues by limiting the domain to contexts in which the probability distribution over the variables at hand is positive.

  36. It is tempting to think that conditioning on everything prior to D1 determines whether the domino falls (especially if the world is deterministic), and that D1 therefore can’t be correlated with anything in this setting. It’s thus important to remember that we’ve restricted the domain of applicability to settings where the probability distribution over the variables at play is positive, and therefore ruled out this kind of determinism.

  37. 2** may seem of limited use to agents like us since we lack the capacity to contemplate entire world histories, but we can do our best. This is why the methodological advice suggested by 2** is to check whether X and Y are correlated whenever you condition on everything that you can think of prior to X. This allows the human agent to determine whether X is a total cause of Y, given her best approximation of how things might have been before X. The same goes for 1*, 2*, and 3*—i.e., it may seem that we’re ill-equipped to determine whether X is a cause of Y according to any of these conditions since it’s hard to entertain causally sufficient variable sets in their entirety, but here, again, we can try our best, and can answer these queries relative to our best approximation of what’s causally sufficient.

  38. Here, again, ‘direct cause’ should be understood in terms of 1*.

  39. I conjecture that 3* is extensionally equivalent to Woodward’s analysis of ‘contributing cause’ when causes temporally precede their effects, but I have no proof. Still, this is not particularly bothersome because Woodward uses the examples under discussion to motivate his own analysis, and 3* yields the same results when applied to these cases.

  40. An anonymous referee helpfully points out that there are counterexamples to 3* given some ways of formulating causal sufficiency. Consider, for example, the influential construal of causal sufficiency according to which V is causally sufficient exactly when there is no variable L not in V and two variables X, Y in V such that L is a direct cause of X and of Y relative to V + {L}. The referee proposes the following counterexample to 3* when “causally sufficient variable set” is understood in this way.

    Imagine that in the birth control example (Fig. 2), another variable, say, “plan to have children” (PC), affects BC directly (but no other variable directly). That is, PC → BC → TH, BC → P → TH, and the two pathways between BC and TH cancel out. Then PC, intuitively, and by the interventionist’s lights, is a contributing cause of TH (though not a total cause). However, the condition laid out in 3* does not seem to be satisfied. A variable set containing PC and TH either includes BC or not. If it does, then PC is not a direct cause of TH relative to the variable set; if it does not, then the variable set is not causally sufficient unless it leaves out P (and all the other intermediate variables on the path BC → P → TH) as well. But if the variable set leaves out P (and all the other intermediate variables on the path BC → P → TH), PC won't be a direct cause of TH relative to the variable set (since PC is not a total cause of TH). Therefore, there appears to be no causally sufficient set relative to which PC is a direct cause of TH.

    The referee is right that this is a counterexample to 3* when “causally sufficient variable set” is understood in these terms. But I suspect that we can deal with counterexamples of this sort by revising the operative notion of causal sufficiency, rather than 3* itself. Remember that V should be regarded as causally sufficient when V does not omit any confounding common causes of variables included within V. The question of how exactly we should characterize confounding common causes is substantive and cannot be settled here. But I conjecture that common causes must be common total causes in order to confound, and that BC can thus appropriately be omitted from variable sets containing P and TH (since BC is not a total cause of TH). If this is right, then 3* gets the correct result that PC is a contributing cause of TH because PC is a direct cause of TH relative to the variable set that includes P but not BC. Why think that common causes must be common total causes in order to confound? Very roughly, when confounding takes place between X and Y, there is some correlation between X and Y that appears to reflect a causal relationship between X and Y, but that is actually fully explained by the fact that both X and Y are causally influenced by some latent common cause(s), L. In order for L’s causal influence over X and Y to account for such a correlation, L must probabilistically influence X and Y along the causal paths from L to X and Y, and L cannot probabilistically influence X and Y along these paths if L is not a total cause of both X and Y. (The last sentence requires proof, but this must wait for some other occasion.)

  41. For example, if you never took stock of P when considering whether BC is causally relevant to TH, then you could mistakenly infer that BC is not causally relevant to TH (because BC is a direct cause of TH only relative to the variable set that includes P).

  42. Every so-called “causal decision theorist” believes this.

  43. This kind of reasoning is called “dominance reasoning” because smoking is said to dominate abstaining.

  44. There are of course cases that fit this description where Bart should settle on passing—e.g., because Bart really hates studying, doesn’t care much about passing, or is very confident that he’ll fail even if he studies. But it’s clear that Bart should not settle on not studying for the reason that he prefers not studying to studying both if he passes, and if he fails. Why? Because there are likewise cases that fit this description where Bart should not settle on not studying—e.g., when he barely minds studying, really hates failing, or is very confident that he’ll pass if he studies, and otherwise won’t.

  45. There may be some way of repartitioning the states (i.e., whether Bart passes or fails) such that dominance reasoning applies, but this kind of redescription is itself costly (insofar as it takes time and effort to find a partition of states relative to which dominance reasoning is applicable).

  46. In order for dominance reasoning to apply relative to a partition of states, it is of course also necessary that the agent prefers one of her options to the other(s) no matter which cell of the state partition obtains—e.g., prefers smoking to not smoking no matter whether she gets lung cancer or doesn’t.

  47. This follows indirectly from Meek and Glymour’s more general point that causal-decision-theoretic verdicts can be recovered by maximizing conditional expected utility and treating the agent’s options as interventions.

  48. It is unclear whether acetaminophen is harmful to unborn children. See Oster (2014).

  49. In the same way that we use the ratio formula to check whether two variables are correlated when supplied with a frequency table (e.g., in introductory statistics classes), we might try to use the ratio formula to check whether the intervention variable is correlated with the state variable given the joint distribution.
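The ratio-formula check described here can be made concrete: given a joint distribution over two variables presented as a table, compare each joint probability with the product of the corresponding marginals. A minimal sketch, where the dictionary representation and the numerical tolerance are my own choices rather than anything from the paper:

```python
def correlated(joint, tol=1e-12):
    """joint: dict mapping (x, y) value pairs to probabilities.

    Returns True iff the two variables are correlated, i.e. iff
    P(x, y) != P(x) * P(y) for some pair of values -- the ratio-formula
    check one would run on a frequency table.
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal of the first variable
        py[y] = py.get(y, 0.0) + p   # marginal of the second variable
    return any(abs(p - px[x] * py[y]) > tol
               for (x, y), p in joint.items())
```

For example, a uniform joint distribution over two binary variables comes out uncorrelated, while a distribution concentrated on the diagonal comes out correlated.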

  50. I am of course assuming that the agent does not know the causal structure (because that’s what she is trying to figure out).

  51. Whether there are probabilities for an agent’s options is another matter of great controversy among decision theorists. Since my argument goes through on the more modest claim that these probabilities are hard to access (if they exist), I have no dog in this fight. But see Levi (1997) and Spohn (1977) for classic arguments that there are no probabilities for options, and Hájek (2016) for a dissenting opinion.

  52. This, again, is related to the issue of whether there are probabilities for options.

  53. The 2** construal of total causal relevance may be especially useful in ordinary decision-making contexts. As you wonder whether dominance reasoning is applicable, you can simply check whether, given your best judgment, there exists some way that the past could have been conditional upon which A is correlated with S.

  54. If, for some reason, the agent finds it easier to use 3* to check whether A is a contributing cause of S, this will not lead her astray (since A is a total cause of S only when A is a contributing cause of S), but she will miss out on opportunities to apply dominance reasoning when A is a contributing cause of S, but not a total cause of S.

  55. See Spirtes et al. (2000), who treat the intervention on A as a cause of A that is neither included in V nor a direct cause of any variable in V other than A, and that is partitioned such that it contains (i) a value on which one can condition to deterministically set A to a for every a in A and (ii) a value that corresponds to not intervening on A (i.e., to allowing the probability distribution over A to be determined by A’s causes in V).

References

  • Bramley, N., Dayan, P., Griffiths, T., & Lagnado, D. (2017). Formalizing Neurath’s ship: Approximate algorithms for online causal learning. Psychological Review, 124, 301–338.

  • Burns, P., & McCormack, T. (2009). Temporal information and children’s and adults’ causal inferences. Thinking and Reasoning, 15, 167–196.

  • Dummett, M. (1954). Can an effect precede its cause? Proceedings of the Aristotelian Society, 28, 27–44.

  • Dummett, M. (1964). Bringing about the past. Philosophical Review, 73, 338–359.

  • Eberhardt, F. (2014). Direct causes and the trouble with soft interventions. Erkenntnis, 79, 755–777.

  • Eells, E. (1991). Probabilistic causality. Cambridge: Cambridge University Press.

  • Elwert, F., & Christakis, N. (2008). Wives and ex-wives: A new test for homogamy bias in the widowhood effect. Demography, 45, 851–873.

  • Hájek, A. (2016). Deliberation welcomes prediction. Episteme, 13, 507–528.

  • Hausman, D., & Woodward, J. (1999). Independence, invariance, and the causal Markov condition. British Journal for the Philosophy of Science, 50, 521–583.

  • Hesslow, G. (1976). Discussion: Two notes on the probabilistic approach to causality. Philosophy of Science, 43, 290–292.

  • Hitchcock, C. (2018). Causal models. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/causal-models/. Accessed 15 Nov 2018.

  • Lagnado, D., & Sloman, S. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 451–460.

  • Levi, I. (1997). The covenant of reason: Rationality and the commitments of thought. Cambridge: Cambridge University Press.

  • McDermott, M. (1995). Redundant causation. The British Journal for the Philosophy of Science, 46, 523–544.

  • Meek, C., & Glymour, C. (1994). Conditioning and intervening. The British Journal for the Philosophy of Science, 45, 1001–1021.

  • Oster, E. (2014). Pregnant women, here’s one less thing to worry about. FiveThirtyEight. https://fivethirtyeight.com/features/pregnant-women-heres-one-less-thing-to-worry-about/. Accessed 15 Nov 2018.

  • Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Francisco: Morgan Kaufmann.

  • Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge: Cambridge University Press.

  • Price, H. (1996). Time’s arrow and Archimedes’ point. Oxford: Oxford University Press.

  • Price, H. (2001). Backward causation, hidden variables and the meaning of completeness. Pramana, 56, 199–209.

  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). New York: Springer.

  • Spohn, W. (1977). Where Luce and Krantz do really generalize Savage’s decision model. Erkenntnis, 11, 113–134.

  • Spohn, W. (1980). Stochastic independence, causal independence, and shieldability. Journal of Philosophical Logic, 9, 73–99.

  • Spohn, W. (2001). Bayesian nets are all there is to causal dependence. In Galavotti, Suppes, & Costantini (Eds.), Stochastic causality (pp. 157–172). Stanford: CSLI Publications.

  • Spohn, W. (2012). The laws of belief: Ranking theory and its philosophical applications. Oxford: Oxford University Press.

  • Strevens, M. (2007). Review of Woodward, Making things happen. Philosophy and Phenomenological Research, 74, 233–249.

  • Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.

  • Woodward, J. (2008). Response to Strevens. Philosophy and Phenomenological Research, 77, 193–212.

  • Zhang, J., & Spirtes, P. (2008). Detection of unfaithfulness and robust causal inference. Minds and Machines, 18, 239–271.

  • Zhang, J., & Spirtes, P. (2011). Interventionism, determinism, and the causal minimality condition. Synthese, 182, 335–347.


Acknowledgements

For helpful discussion and comments, I am grateful to Benjamin Eva, Daniel Hausman, Shanna Slank, Olav Vassend, Jiji Zhang, the anonymous reviewers, and the audience at the inaugural workshop on Decision Theory and the Future of Artificial Intelligence in Cambridge, UK. This research was partially funded by Deutsche Forschungsgemeinschaft (Grant No. 623584).


Correspondence to Reuben Stern.




Cite this article

Stern, R. Causal concepts and temporal ordering. Synthese 198 (Suppl 27), 6505–6527 (2021). https://doi.org/10.1007/s11229-019-02235-4

