## Abstract

Jim Joyce has argued that David Lewis’s formulation of causal decision theory is inadequate because it fails to apply to the “small world” decisions that people face in real life. Meanwhile, several authors have argued that causal decision theory should be developed such that it integrates the interventionist approach to causal modeling because of the expressive power afforded by the language of causal models, but, as of now, there has been little work towards this end. In this paper, I propose a variant of Lewis’s causal decision theory that is intended to meet both of these demands. Specifically, I argue that Lewis’s causal decision theory can be rendered applicable to small world decisions if one analyzes his dependency hypotheses as causal hypotheses that depend on the interventionist causal modeling framework for their semantics. I then argue that this interventionist variant of Lewis’s causal decision theory is preferable to interventionist causal decision theories that purportedly generalize Lewis’s through the use of conditional probabilities. This is because Lewisian interventionist decision theory captures the causal decision theorist’s conviction that any correlation between what the agent does and cannot cause should be irrelevant to the agent’s choice, while purported generalizations do not.


## Notes

Savage expressed his definition of expected utility slightly differently, but the content is the same.

More formally, Savage conceives of actions as functions from states to outcomes.
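Savage’s picture can be sketched in code: an action is a function from states to outcomes, and its expected utility is the probability-weighted sum of the utilities of those outcomes. The states, actions, and numbers below are invented purely for illustration.

```python
# A minimal sketch of Savage-style expected utility (illustrative numbers).
# An action is a map from states to outcomes; expected utility is the sum
# over states of P(state) * utility(action(state)).

P = {"rain": 0.3, "shine": 0.7}          # unconditional probabilities of states
utility = {"wet": 0, "dry": 8, "dry+free_hands": 10}

take_umbrella = {"rain": "dry", "shine": "dry"}
leave_umbrella = {"rain": "wet", "shine": "dry+free_hands"}

def expected_utility(action):
    return sum(P[s] * utility[action[s]] for s in P)

print(expected_utility(take_umbrella))   # expected utility of taking the umbrella
print(expected_utility(leave_umbrella))  # expected utility of leaving it behind
```

Note that the probabilities of states here are unconditional, which is exactly the feature that the partition-sensitivity worries discussed below target.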

Lewis (1981), pp. 12–13.

Skyrms (1980) and Lewis (1981) are examples of decision theorists who take the first route. Joyce (1999) and Jeffrey (1983) are examples of decision theorists who take the latter route. Taking the latter route amounts to providing a *partition-invariant* analysis of expected utility and usually involves trading in Savage’s unconditional probabilities of states for conditional probabilities of states given acts.

I believe that this has gone unnoticed in the decision theory literature. In the abstract, consider a case where *X* causes \(\varUpsilon \), and where *Z* is a common cause of *X* and \(\varUpsilon \). If \(X\rightarrow \varUpsilon \) offsets the probabilistic dependence between *X* and \(\varUpsilon \) that obtains in virtue of \(X\leftarrow Z\rightarrow \varUpsilon \), then *X* and \(\varUpsilon \) are causally dependent (in virtue of \(X\rightarrow \varUpsilon \)) but evidentially independent (since \(X\rightarrow \varUpsilon \) offsets the evidential dependence that results from \(X\leftarrow Z\rightarrow \varUpsilon \)). In the causal modeling literature, this sort of case is referred to as a failure of faithfulness owing to path cancellation. (See Zhang and Spirtes (2008) for more discussion of failures of faithfulness.) Concretely, consider a case in which an oracle tells you, first, that whether one gets lung cancer causally depends on whether one smokes (in the sense that smoking does something to one’s body to increase the risk of lung cancer), and, second, that there exists a genetic condition that causes people to smoke and to be at reduced risk of lung cancer. Among people who possess the genetic condition, smoking increases the probability of getting lung cancer. The same goes for people who do not possess the genetic condition. But, as it happens, the unconditional probability of getting lung cancer is equivalent to the probability of getting lung cancer given that one smokes (because the prevalence of the genetic condition exactly counterbalances the probabilistic effect of smoking on the body). You do not know whether you possess the genetic condition. Should you smoke? In this case, the evidential decision theorist would seemingly think that dominance reasoning is applicable (and that you should therefore smoke), while the causal decision theorist would not think that dominance reasoning is applicable.

In *Smoking: the cancer controversy*, R. A. Fisher (1959) entertains (but does not espouse) the hypothesis that the correlation between whether one is a smoker and whether one suffers from lung cancer is due not to some causal influence that smoking exerts on the health of one’s lungs, but rather to the causal influence that one’s genetic makeup has both on whether one smokes and on whether one suffers from lung cancer.

Moreover, because Lewis’s dependency hypotheses are maximally specific, they are guaranteed to capture all of the agent’s causal influence over the world, and are therefore guaranteed to be causally independent of what the agent does. Lewis (1981, p. 13) provides the following proof by *reductio ad absurdum* that his dependency hypotheses are causally independent of the agent’s actions: “Suppose [the dependency hypotheses are not causally independent of the agent’s actions]. Consider the dependency hypothesis which we get by taking account of the ways the agent can manipulate dependency hypotheses to enhance his control over other things. This hypothesis seems to be right no matter what he does. Then he has no influence over whether this hypothesis or another is right, contrary to the supposition that the dependency hypotheses are within his influence.”
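The smoking/genetic-condition case in the notes above (a failure of faithfulness by path cancellation) can be checked exactly. The sketch below uses invented numbers chosen so that the cancellation is exact: smoking (*S*) raises the probability of lung cancer (*LC*) within each stratum of the gene (*G*), yet *S* and *LC* are marginally independent.

```python
# Exact check (with Fractions) of path cancellation: smoking S raises
# lung-cancer risk LC within each genetic stratum G, yet G's confounding
# exactly cancels the marginal S-LC dependence. All numbers are invented
# for illustration.
from fractions import Fraction as F

P_G = {1: F(1, 2), 0: F(1, 2)}                        # gene prevalence
P_S_given_G = {1: F(4, 5), 0: F(1, 5)}                # gene promotes smoking
P_LC_given_SG = {(1, 1): F(3, 10), (0, 1): F(1, 10),  # gene lowers cancer risk
                 (1, 0): F(9, 10), (0, 0): F(1, 2)}   # smoking raises it

def joint(s, g, lc):
    p_s = P_S_given_G[g] if s == 1 else 1 - P_S_given_G[g]
    p_lc = P_LC_given_SG[(s, g)] if lc == 1 else 1 - P_LC_given_SG[(s, g)]
    return P_G[g] * p_s * p_lc

def P(event):  # probability of a predicate over (s, g, lc)
    return sum(joint(s, g, lc) for s in (0, 1) for g in (0, 1)
               for lc in (0, 1) if event(s, g, lc))

p_lc = P(lambda s, g, lc: lc == 1)
p_lc_given_s = P(lambda s, g, lc: lc == 1 and s == 1) / P(lambda s, g, lc: s == 1)

assert p_lc == p_lc_given_s == F(21, 50)  # marginally, S and LC are independent
# ...even though smoking raises risk within each stratum of G:
assert P_LC_given_SG[(1, 1)] > P_LC_given_SG[(0, 1)]
assert P_LC_given_SG[(1, 0)] > P_LC_given_SG[(0, 0)]
```

Exact rational arithmetic is used so the cancellation is not an artifact of floating-point rounding.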

Lewis believes that this strategy works because, as he sees things, a maximally specific dependency hypothesis can be represented as either a complicated counterfactual whose consequent specifies everything about what would follow (and with what chance) were the agent to take any of the feasible actions, or, alternatively, a conjunction of more ordinary counterfactuals that together contain the same information. So, in the same way that we regard a conjunct as designating every conjunction with which it is consistent, we can regard a single counterfactual as designating every conjunction of counterfactuals (and therefore every maximally specific dependency hypothesis) with which it is consistent.

Eells (1982) covers much of the controversy in chapter 5 of *Rational Decision and Causality*.

The semantics must not backtrack, in the sense that counterfactual dependence must never flow in the opposite direction of causal dependence.

It may seem that the agent can naïvely apply Lewis’s decision theory without noticing that the semantics for the relevant counterfactuals must take a particular non-backtracking form, but this is not right. First, Lewis’s proposal yields the desired results only if the relevant counterfactuals do not backtrack (provided that causes must precede their effects), but as Eells (1982) argues, it can be difficult for the naïve agent to see that the relevant counterfactuals do not backtrack. (Indeed, it is not very difficult to tinker with the details of Newcomb problems in order to make intuitions shift about whether something in the past counterfactually depends on something in the present.) Second, Lewis’s proposal works only if Conditional Excluded Middle is true of the relevant counterfactuals (since every counterfactual supposition must uniquely determine the world that would result were the agent to act in the supposed way), and Conditional Excluded Middle does not seem plausible for the ordinary counterfactuals that enter into human deliberation. Third, I am persuaded enough by Hájek (2016, unpublished) to believe that reasonable agents may believe that most ordinary counterfactuals are very probably false, thereby rendering the probabilities of counterfactuals not particularly useful in the definition of expected utility.

We can regard each of these variables as dichotomous—i.e. as taking one value when the event in question obtains, and another when the event in question does not.

There are many equivalent formulations of d-separation on the market. I adopt Elwert’s (2013, p. 252) formulation because I find it easier to parse than most.

A variable is a *collider* along a path if and only if it is the direct effect of two variables along the path. This is why *H* is a collider along \(F\rightarrow H\leftarrow G\) but not along \(F\rightarrow H\rightarrow G\).

The parenthetical is important because conditioning on the descendant of a collider often induces a spurious correlation between the collider’s ancestors. See Elwert and Winship (2014) for discussion of this phenomenon.
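The collider point can be made concrete with a small exact computation (an invented toy model, not from the paper): two independent causes become dependent once we condition on their common effect.

```python
# Conditioning on a collider induces dependence between its independent
# causes. Toy model: F and G are independent fair coins and H = F or G,
# so H is a collider on F -> H <- G.
from fractions import Fraction as F_

outcomes = [(f, g, f | g) for f in (0, 1) for g in (0, 1)]  # each has prob 1/4

def P(event, given=lambda f, g, h: True):
    num = sum(F_(1, 4) for (f, g, h) in outcomes if event(f, g, h) and given(f, g, h))
    den = sum(F_(1, 4) for (f, g, h) in outcomes if given(f, g, h))
    return num / den

# Marginally, F and G are independent:
p_f = P(lambda f, g, h: f == 1)
p_f_given_g = P(lambda f, g, h: f == 1, given=lambda f, g, h: g == 1)
assert p_f == p_f_given_g == F_(1, 2)

# But conditional on the collider H, learning G changes the probability of F:
p_f_given_h = P(lambda f, g, h: f == 1, given=lambda f, g, h: h == 1)
p_f_given_gh = P(lambda f, g, h: f == 1, given=lambda f, g, h: g == 1 and h == 1)
assert p_f_given_h == F_(2, 3)
assert p_f_given_gh == F_(1, 2)
```

Intuitively: given that the alarm (H) went off, learning that one cause was present makes the other less necessary as an explanation.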

Why does the CMC entail Reichenbach’s principle? In the event that some pair of variables are dependent and neither is a descendant of the other, there must exist some parent(s) of both variables on which one can condition to render the relevant variables independent. So it is provable of every system of variables that satisfies the CMC, first, that if neither *F* nor *G* (directly or indirectly) influences the other and *F* and *G* are probabilistically dependent, then there exists a set *C* of variables not containing *F* or *G* but causing both, and, second, that *F* and *G* must be independent conditional on any value of *C*.

The intervention variable is d-separated from *X*’s causal predecessors because *X* is a collider on every path from the intervention variable to *X*’s causal predecessors. And since the intervention variable cannot be causally downstream from any variable(s) in **V** and likewise cannot be a common cause of any variables in **V**, the fact that *X* is d-separated from any non-descendants that are not causal predecessors of *X* guarantees that the intervention variable is d-separated from all of *X*’s non-descendants.

The informal discussion below should help the reader understand how assuming the CMC fuels the interventionist’s ability to deliver these results, but interested readers should consult pp. 75–81 of Spirtes, Glymour, and Scheines’s *Causation, Prediction, and Search*. Their Manipulation Theorem precisely characterizes the effects of conditioning on an intervention variable in Markovian models.

As a referee points out, construing deliberation in terms of realizing a particular value of *I* gives the agent one more option than she would have were she to deliberate about what value of *S* to realize—namely, the option *not* to intervene (or to allow the distribution over *S* to continue to be determined by its endogenous causes). This may buck tradition, but the change seems welcome. In the context of choice, it often seems as though we not only can choose what to do (among some list of actions) but also can choose whether to do anything at all (or, equivalently, whether to interfere with the status quo). Modeling choice in terms of realizing a particular value of *I* incorporates both of these aspects of choice.

If I want to use IDT to evaluate whether you made the right choice by my lights—i.e. relative to *my* subjective credence function and value function—then I should use a variable set that I take to be causally sufficient. But when we use IDT to assess the *rationality* of an agent—i.e. whether she does well by her own lights—we should use a variable set that the agent takes to be causally sufficient.

This variable set must include a variable for the agent’s action but need not include an intervention variable.

The fact that smoking raises the probability of lung cancer while intervening to make oneself smoke does not can therefore be expressed as follows: \(P( LC=lc|S=s )>P( LC=lc )=P( LC=lc|do( S=s ) )\).
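The contrast between conditioning and intervening can be computed in a toy Fisher-style model (invented numbers): the gene *G* causes both smoking *S* and lung cancer *LC*, and *S* has no causal effect on *LC*. Under \(do(S=s)\), the truncated factorization deletes the factor \(P(S\mid G)\), so intervening leaves the probability of lung cancer at its marginal value even though observing smoking raises it.

```python
# Toy Fisher-style model (invented numbers): G -> S and G -> LC, with no
# S -> LC edge. Observing S = 1 raises P(LC = 1); intervening via do(S = 1)
# (truncated factorization: drop P(S|G) and fix S) leaves it unchanged.
from fractions import Fraction as Fr

P_G = {1: Fr(1, 2), 0: Fr(1, 2)}
P_S_given_G = {1: Fr(4, 5), 0: Fr(1, 5)}
P_LC_given_G = {1: Fr(7, 10), 0: Fr(1, 10)}  # LC depends on G only

def p_s(s, g):
    return P_S_given_G[g] if s == 1 else 1 - P_S_given_G[g]

def p_lc(lc, g):
    return P_LC_given_G[g] if lc == 1 else 1 - P_LC_given_G[g]

# Observational conditional probability P(LC=1 | S=1):
num = sum(P_G[g] * p_s(1, g) * p_lc(1, g) for g in (0, 1))
den = sum(P_G[g] * p_s(1, g) for g in (0, 1))
p_lc_given_s = num / den

# Marginal P(LC=1), and interventional P(LC=1 | do(S=1)): under do(S=1) the
# factor P(S|G) is deleted, so only P(G) * P(LC|G) remains in the sum.
p_lc_marginal = sum(P_G[g] * p_lc(1, g) for g in (0, 1))
p_lc_do_s = sum(P_G[g] * p_lc(1, g) for g in (0, 1))

assert p_lc_given_s > p_lc_marginal == p_lc_do_s  # 29/50 > 2/5 = 2/5
```

This is exactly the inequality-plus-equality pattern \(P( LC=lc|S=s )>P( LC=lc )=P( LC=lc|do( S=s ) )\) stated above.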

The reader may wonder how to determine what DAGs an agent *should* entertain in a given context. This is not my focus here. I am instead preoccupied with what an agent should do *given* her beliefs about DAGs, no matter whether her beliefs about DAGs are reasonable.

Of course it is possible that the agent *will* entertain (or assign non-zero probability to) some DAG according to which lung cancer *is* causally downstream of smoking. But doing so would seem to be irrational, given the agent’s knowledge of the details of FNP. This is why I say that every graph the agent *should* entertain has this property.

See Savage (1954) for a classic argument against such probabilities.

Even given some DAG, an agent can entertain distinct causal hypotheses that are formulated in terms of distinct probability functions. The agent can be *n*% confident that \(P_{1}\) is the probability function and (100 − *n*)% confident that \(P_{2}\) is the probability function, even when she is certain that \(X\rightarrow Z\leftarrow \varUpsilon \) (because multiple probability functions are compatible with a given DAG according to the d-separation criterion).

This concern is similar to Isaac Levi’s famous (1987) concern that “prediction crowds out deliberation.”

Since every member of the set—i.e. every particular distribution—must obey the constraints implied by the CMC, the set of distributions will often not be convex.

This requirement may strike some as too stringent because human agents are often in no position to be sure that they attend to every common cause of any two or more variables in **V**. I am sympathetic to this objection because it is true that we often ignore common causes of variables that we attend to as we deliberate, but I nevertheless believe that the ideally rational agent would be sure to (i) include every variable in **V** that she believes might be a common cause of other variables in **V**, and (ii) spread her credences across the hypotheses according to which said variables are and are not common causes of other variables in **V**. If this proves to be too cognitively demanding, then it may be okay for the agent to sometimes ignore common causes, but this would seem to require a departure from ideal rationality. It is likewise worth noting that it seems possible to weaken the requirement that one must attend to *every* common cause of any two or more variables in **V** by requiring the agent to attend only to some important subset of the common causes of any two or more variables in **V** (because, for example, the agent *can* leave out distal common causes of *X* and \(\varUpsilon \) in the event that she has included some more proximate common cause of *X* and \(\varUpsilon \) that screens off the distal cause). But since I am not currently sure *exactly* how causal sufficiency can be plausibly weakened, I leave this task for later.

Suppose, for example, that Fig. 1 is true. There is nothing in the causal modeling framework (i.e. no assumption about the DAG) that rules out the possibility that there is some intermediary cause between, say, *G* and *LC* on which one could condition in order to screen off the correlation between *G* and *LC*. (Put differently, it is consistent with Fig. 1 that *G* influences *LC* only by influencing, say, one’s dietary preferences, and that *G* is therefore probabilistically independent of *LC* conditional on any particular set of dietary preferences.) Nor is there anything in the causal modeling framework that entails that the variables in Fig. 1 are not additionally caused by means that are causally independent of the variables contained therein (since the omission of causes that are not common causes of variables in **V** does not induce any spurious correlations, given the axioms of the causal modeling framework). It is thus plausible to regard a causal hypothesis over the variables in Fig. 1 as a disjunction of more specific causal hypotheses over richer variable sets.

One might worry that causal hypotheses are not guaranteed to be causally independent of the agent’s options (or acts) when causal hypotheses need not be maximally specific (because non-maximally specific causal hypotheses need not include every causal relationship and therefore need not include every causal relationship between the agent’s choice and any causal hypothesis), and that non-maximally specific causal hypotheses should therefore not be utilized in a Savage-style decision theory. There are two responses to this concern. First, since causal hypotheses include information about the effect of intervening to make oneself act in a particular way, and since decisions are modeled as interventions on actions, causal hypotheses *prima facie* seem to include the agent’s causal influence (and therefore the agent’s influence on which causal hypothesis is true, if there is one). Second, if the first response fails, one can simply require that causal hypotheses include every causal relationship between the agent’s choice and any causal hypothesis without succumbing to the Charybdis of maximal specificity.

Of course causal hypotheses are closely related to counterfactuals in some way (since there is a definite connection between causal relationships and counterfactuals), but they nevertheless allow us to reason directly from causal facts—i.e. without taking up the very difficult project of specifying the exact relationship between causal facts and counterfactual facts.

This idea is also found in Meek and Glymour (1994). Indeed, Hitchcock credits them with it.

One can think of Pearl’s outcomes as the possible act/state combinations that arise from taking the act in question.

For similar reasons, it is not immediately clear how to apply Pearl’s decision theory when the agent is unsure about what causes what. Perhaps the agent can form her credence that *y* obtains given that she intervenes to realize *x* by, first, determining the chance of *y* given that she intervenes in every model that she entertains, and, second, calculating a weighted average of these conditional objective probability estimates in correspondence with her subjective probabilities in the live causal hypotheses. But as we will see below, it is by no means obvious that this will yield the desired results.

See Lewis (1980) for an explanation of how his Principal Principle imposes constraints on rational probability functions in such cases.
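The averaging proposal discussed in the notes above can be sketched in a few lines. The hypothesis names and numbers below are invented for illustration, and the paper goes on to question whether this recipe delivers the desired verdicts.

```python
# Sketch of the averaging proposal (invented numbers): the agent's credence
# that LC obtains under do(S=1) is the credence-weighted average of the
# interventional chances assigned by each causal hypothesis she entertains.
credence = {"H1: S causes LC": 0.5,
            "H2: G confounds S and LC": 0.5}
chance_lc_given_do_smoke = {"H1: S causes LC": 0.58,
                            "H2: G confounds S and LC": 0.40}

p_mix = sum(credence[h] * chance_lc_given_do_smoke[h] for h in credence)
print(p_mix)  # credence-weighted interventional probability
```

Under H2 the intervention leaves the chance of lung cancer at its marginal value, so the mixture sits strictly between the two hypotheses’ interventional chances.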

These numbers are taken from Seidenfeld et al. (2010). But they use this feature of mixtures to illustrate a different point—namely, that imprecise Bayesians should not impose a requirement of convexity on sets of probabilities because doing so leads to irrational choices.

In order to talk this way, we must impose a probability distribution over the agent’s choice, which, as discussed earlier, may be problematic in the context of deliberation. Even if this is so, one can simply describe the case third-personally, such that some bystander entertains the relevant chance distributions and adopts the relevant credence function. By the causal decision theorist’s lights, such a bystander should judge *x* as irrelevant to the agent’s choice, despite what Pearl’s decision theory would recommend.

Jim Joyce helped me to see this possibility in personal communication.

Other causal decision theories (e.g. Lewis 1980) allegedly deliver causal-decision-theoretic verdicts simply by partitioning the states in a particular way. As an anonymous reviewer helpfully points out, some may think that such theories are more parsimonious than IDT (since such decision theories require only one step in order to get causal-decision-theoretic verdicts, while IDT requires two). But as I see things, IDT earns its keep for reasons not pertaining to its parsimony—e.g. its integration of the independently successful interventionist approach to causal modeling and its ability to deliver “small world” causal-decision-theoretic verdicts. I also agree with Meek and Glymour (1994) that the role of interventions in IDT helps us to see that the disagreement between evidential decision theorists and causal decision theorists is *not* a disagreement about the fundamental normative principles that govern rational choice, but is rather a disagreement about the nature of choice. They write (p. 1009) that we can recharacterize the dispute such that “[t]he difference between the two [i.e. causal decision theory and evidential decision theory] does not turn on any difference in normative principles, but on a substantive difference about the causal processes at work in the context of decision making—the causal decision theorist thinks that when someone decides... an *intervention* occurs, and the ‘evidential’ decision theorist thinks otherwise.”

## References

Eells, E. (1982). *Rational decision and causality*. Cambridge: Cambridge University Press.

Elwert, F. (2013). Graphical causal models. In S. Morgan (Ed.), *Handbook of causal analysis for social research*. New York, NY: Springer.

Elwert, F., & Winship, C. (2014). Endogenous selection bias: The problem of conditioning on a collider variable. *Annual Review of Sociology*, *40*, 31–53.

Fisher, R. A. (1959). *Smoking: The cancer controversy*. London: Oliver and Boyd.

Geiger, D., & Pearl, J. (1989). Logical and algorithmic properties of conditional independence and qualitative independence. Report CSD 870056, R-97-IIL. Los Angeles: University of California, Cognitive Systems Laboratory.

Gibbard, A., & Harper, W. (1978). Counterfactuals and two kinds of expected utility. In C. Hooker, J. Leach, & E. McClennen (Eds.), *Foundations and applications of decision theory* (pp. 125–162). Dordrecht: Reidel.

Hájek, A. (2016). *Most counterfactuals are false*. Unpublished manuscript.

Hausman, D., & Woodward, J. (1999). Independence, invariance, and the Causal Markov Condition. *British Journal for the Philosophy of Science*, *50*, 521–583.

Hitchcock, C. (2015). Conditioning, intervening, and decision. *Synthese*, *4*, 1–20.

Jeffrey, R. (1983). *The logic of decision*. Chicago, IL: University of Chicago Press.

Joyce, J. (1999). *Foundations of causal decision theory*. Cambridge: Cambridge University Press.

Levi, I. (1987). Rationality, prediction, and autonomous choice. *Canadian Journal of Philosophy*, *19*, 339–363.

Lewis, D. (1980). A subjectivist’s guide to objective chance. In R. Jeffrey (Ed.), *Studies in inductive logic and probability* (Vol. II, pp. 263–294). Berkeley: University of California Press.

Lewis, D. (1981). Causal decision theory. *Australasian Journal of Philosophy*, *59*, 5–30.

Meek, C., & Glymour, C. (1994). Conditioning and intervening. *The British Journal for the Philosophy of Science*, *45*, 1001–1021.

Pearl, J. (1993). Comment: Graphical models, causality, and intervention. *Statistical Science*, *8*, 266–269.

Pearl, J. (2009). *Causality: Models, reasoning, and inference* (2nd ed.). Cambridge: Cambridge University Press.

Reichenbach, H. (1956). *The direction of time*. Berkeley, CA: University of California Press.

Savage, L. (1954). *The foundations of statistics*. New York: Wiley.

Seidenfeld, T., Schervish, M., & Kadane, J. (2010). Coherent choice functions under uncertainty. *Synthese*, *172*, 157–176.

Skyrms, B. (1980). *Causal necessity*. New Haven: Yale University Press.

Spirtes, P., Glymour, C., & Scheines, R. (2000). *Causation, prediction, and search* (2nd ed.). New York: Springer.

Verma, T. (1987). Causal networks: Semantics and expressiveness. Technical Report R-65-I. Los Angeles: University of California, Cognitive Systems Laboratory.

Zhang, J., & Spirtes, P. (2008). Detection of unfaithfulness and robust causal inference. *Minds and Machines*, *18*, 239–271.

## Acknowledgments

I am grateful to Malcolm Forster, Dmitri Gallow, David O’Brien, Dan Hausman, Jim Joyce, Hanti Lin, Ben Schwan, Elliott Sober, Mike Titelbaum, Naftali Weinberger, Olav Vassend, and the audience at the Self-prediction in Decision Theory and Artificial Intelligence conference in Cambridge, UK for their input and helpful discussion.


## About this article

### Cite this article

Stern, R. Interventionist decision theory.
*Synthese* **194**, 4133–4153 (2017). https://doi.org/10.1007/s11229-016-1133-x


### Keywords

- Causal decision theory
- Causal models
- Causation