1 Introduction

Continuing a theme that he has developed in a number of papers (Hernán 2005, 2010, 2016; Hernán and Taubman 2008; VanderWeele and Hernán 2012), the epidemiologist Miguel Hernán has recently (May 2018) argued against the prohibition of the word “cause” in epidemiology when describing the aims and results of observational studies (Hernán 2018). Epidemiologists should stop using euphemisms for the “C-word”, and be emboldened to come out and say it, with one important proviso: they must say exactly what the C-word means.

Hernán’s proposed definition of “cause”—or, more accurately, “causal effect”—is in the counterfactual/interventionist camp. He argues that “causal effect” can be defined with sufficient clarity that it may be used both in stating the goals of epidemiological research and in describing their results, to the extent that the evidence supports such a description. The definition is inspired by this thought:

…a causal analysis in observational data can be viewed as an attempt to emulate a hypothetical trial.

(Hernán 2018, p. 617)

“Causal effect” is then defined in terms of what would be observed in an idealized randomized controlled trial; the causal inference question is then to what extent the actual results support this claim. Thus clarified, the definition is such as to license use of “cause” even when describing the results of observational studies, provided they have been designed to emulate trials, and to the extent they have succeeded.

The extent to which an observational study warrants causal inference is then proportional to the extent to which it successfully emulates a randomized controlled trial. Maybe the emulation will never be perfect, but it need not be perfect to provide some support for a causal inference. After all, even trials provide only imperfect support for causal conclusions (Hernán 2018, p. 617).

Hernán’s argument is compelling for epidemiologists fed up with what Hernán says is a familiar story:

Dear author: Your observational study cannot prove causation. Please replace all references to causal effects by references to associations.

(Hernán 2018, p. 616)

In the recent issue of the American Journal of Public Health from which this quotation is drawn, 10 epidemiologists respond in 5 commentaries and one editorial. All but one commentary and 8 out of 10 respondents are positive. The remaining paper by 2 respondents is interpreted as positive in Hernán’s reply, and the sharp criticisms it contains are overlooked. This further creates the impression of an agreement party among a wide range of epidemiologists, including social epidemiologists and those focused more explicitly on the application of epidemiology to public health, about the merit of Hernán’s argument and the approach to causality it advocates.

Hernán is a proponent of what is often known as the Potential Outcomes Approach (POA) to causal inference. The central claim of the POA is that meaningful estimates of causal effect must be specified relative to a well-defined intervention, even in an observational study, where the intervention is then a hypothetical construct. In the present paper, after summarizing the two main lines of objection, I argue that the POA is an instance of a positivistic attitude to meaning and to science itself. A realist stance towards causal effect recognises both the POA and more traditional approaches as meaningful and useful.

In Sect. 2 I set out Hernán’s argument. The success of Hernán’s argument depends on the clear definition of “cause”. In Sect. 3 I summarise his view, and identify two problems, one pragmatic and one theoretical. The pragmatic problem is that we may wish to estimate causal effects in many instances where specifying an intervention is impossible, rendering the POA an inadequate solution to the problems it sets out to solve. The theoretical problem is that deciding whether an intervention is well-specified requires ruling out residual confounders, which can only be done through a set of pragmatic decisions and assumptions, of just the same kind that one might make when estimating the causal effect of an exposure for which no intervention is specified. This means that estimates of causal effect satisfying POA requirements are not in principle different from, and not necessarily sharper than, estimates arrived at in traditional ways without specifying an intervention. To put it another way, perfectly well-defined interventions are impossible. Imperfectly-defined interventions (supported by pragmatic assumptions about residual confounders) are possible, but then so are well-defined causal estimates without specified interventions.

In Sect. 4 I highlight the tension between rejecting the positivistic restriction on causal talk, while at the same time keeping the spirit of that restriction by substituting another restriction which is supposed to permit some—but not all—of the causal talk epidemiologists are tempted to engage in. If our attitude to science is broadly realist, then we don’t have to achieve precise definition in order to talk meaningfully about something (and conversely we don’t ensure that we are talking about something merely through precise definition).

In Sect. 5 I confront Hernán’s disavowal of philosophy, in favor of epidemiology itself. I argue that, just as it helps to avoid euphemisms for “cause”, it helps to avoid euphemisms for “philosophy”, an activity that Hernán disowns, along with many epidemiologists. Hernán is not only an epidemiologist; he is also a philosopher of epidemiology.

Hernán is not the only advocate of the POA, and certainly not the only person to have contributed to the large body of methodological literature with which it is associated. Nor is this his only paper on the topic. However, he is one of the best-known proponents, and this is his most recent (at time of writing) contribution to a hot debate that is widely followed among epidemiologists (who are much more time-sensitive than philosophers).

Continuing a tradition of his, Hernán’s paper is published in a venue that makes it significant: the American Journal of Public Health. This is one of the preeminent public health journals, with public health enjoying an intimate, and therefore complex, relationship with epidemiology. Resistance to the POA has been especially strong in public health and social epidemiology, because of the fear that it promotes a methodology unsuited to investigating exposures of concern to public health, and/or because it amounts to “ivory tower epidemiology” (Deleeuw 1993; Ravenholt 2005; Kaufman 2016) that is ill-suited to dealing with the practical problems and imperfect evidence that characterize the social world. For all these reasons, Hernán’s paper, and the reaction to it, are worthy objects of thorough philosophical discussion, as the most recent development in the ongoing philosophical debate about causal inference within the epidemiological literature. For readers unfamiliar with that debate, the present paper is also intended to provide a way in.

2 Against censoring the C-word

The stance that observational studies are capable only of detecting associations, which are then handed over for testing in a randomized trial, is not entirely unreasonable. There have been a plethora of causal inferences that have been supported by multiple studies, only to crumble, decisively and embarrassingly, when high quality trials are done (Broadbent 2013, Ch. 4). And over-enthusiastic inferences continue. Epidemiology has suffered reputational damage because of its persistence in announcing causal effects, despite the theoretical presence of residual confounding, which is what trials are so good at exposing to dramatic effect (Rutter 2007; Vandenbroucke 2008). Nutritional epidemiology is one example. It is a household joke that what was bad for you yesterday (pasta, butter, red wine) may be good for you today, and may be bad for you again tomorrow. Residual confounding is typically responsible for such embarrassments, and it is difficult to eliminate without a randomized trial.

It is this background that gives Hernán’s argument its significance, and explains the adulation and relief with which many epidemiologists have greeted his work, and the work of other leading proponents and developers of the POA (see especially: VanderWeele 2015). Most of the discipline is engaged in nothing other than observational studies. For many epidemiologists, the goal is knowledge of the causes of disease, and often the quantification of their effects. And for many of these (perhaps not all), the ultimate aim is public health intervention, for which causal knowledge is necessary. How frustrating, then, to be constantly told that one’s conclusions are mere suggestions about where trialists should turn their attention next.

I now attempt to extract the substance of Hernán’s argument in the AJPH paper in question.

2.1 Argument for clearly stating causal aims

In extracting the core of the argument, it is helpful to divide it into two parts. The first part of his argument concerns causal aims, and may be stated as follows.

  1. Observational epidemiological studies aim to attain causal knowledge.

  2. It serves the goals of science to be clear about one’s aims.

  3. Therefore it serves the goals of science to be clear that observational studies aim to attain causal knowledge.

The first premise of this argument is supported by a general assertion that “causal inference is a core task of science, regardless of whether the study is randomized or non-randomized” (Hernán 2018, p. 616).

Another source of support for premise 1 is identified in the editorial:

Surely it is the job of the science of population health to understand the drivers of population health to the end of us intervening and being able to improve the health of populations. … When we are clear that we are studying causes, we open up the opportunity to identify and act on them.

(Galea and Vaughan 2018)

The idea is that we need causal information if we are to design effective public health interventions, and furthermore that these are the ultimate goal of epidemiology. Elsewhere Hernán makes remarks that resonate with the first idea (Hernán 2005, p. 620, 2016, p. 678; Hernán and Taubman 2008, p. S12), although I am not sure (I simply don’t know) whether he would accept the second.

The second premise, that it serves the goals of science to be clear about one’s aims, is not directly supported in the text of our focus article. Clarity has obvious merits. However, an insistence on absolute precision may be problematic, especially when we are dealing with things that we do not fully understand. In Sect. 3 I will argue that Hernán’s precise definition of causal claims fails. In Sect. 4 I will argue that insistence on absolute precision is well-motivated within a positivist conceptual framework, but that within a realist framework, insisting on precision before we are ready does not necessarily serve the goals of science.

2.2 Argument for clearly stating causal conclusions

So far, the argument only supports the statement of causal aims, not the statement of causal conclusions. However, Hernán is clear that “cause” and cognates are appropriate well beyond the description of aims of observational studies.

…the term “causal effect” is appropriate in the title and Introduction section of our article when describing our aim, in the Methods section when describing which causal effect we are trying to estimate, through an association measure, and in the Discussion section when providing arguments for and against the causal interpretation of our association measure.

The only part of the article in which the term “causal effect” has no place is the Results section, which should present the findings without trying to interpret them.

(Hernán 2018, p. 617)

Clearly such a position is not supported merely by an argument for the honest statement of causal aims.

It is possible to extract a second argument from Hernán’s brief but rich paper, to validate the use of “cause” in causal conclusions. The argument may be stated as follows.

  4. It is reasonable to describe an observed effect as “causal” if and only if

    a. “causal effect” is well-defined and

    b. the kind of evidence that will support the application of the term is clearly specified, to the extent that this kind of evidence is available.

  5. It is possible to define “causal effect” well and to clearly specify the kind of evidence that will support the application of the term.

  6. It is reasonable to describe an observed effect as “causal” in the well-defined sense to the extent that the specified kind of evidence is available.

Support for premises 4 and 5 depends on the availability of a precise definition of “causal effect”. In the next section I turn to Hernán’s proposed definition, and indicate two serious objections.

3 Defining the C-word

The definition of “cause” is a larger task than POA advocates, including Hernán, wish to take on. They restrict their attention to the term “causal effect”. This may sound tautological to a philosopher: surely, by definition, all effects are causal? But the phrase is not tautological in the epidemiological context. It means an observed, quantitatively expressed association that is explained by a causal connection between the two associated variables, as opposed to being caused or modified by a confounding variable, or arising by chance.

“Causal effect” is a quantitative concept. The associations that are deemed to be causal effects are quantified. Typically, the phrase “estimating causal effect” means arriving at a quantitative measure of an observed association that is due to a given cause. This could be a proportion of the total observed association, for example, the proportion of lung cancer among Korean males that is due to smoking, or the proportion of obesity that is due to genetic factors, or the proportion of morbidity that is due to obesity. It may also mean estimating how much effect a given cause has: for example, the extent to which smokers’ risk of lung cancer is elevated compared to non-smokers, or the extent to which certain dietary features raise the risk of obesity compared to other diets, or the extent to which morbidity is increased by obesity compared to morbidity among the non-obese.
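The two kinds of quantity just distinguished can be made concrete with a small sketch. It is illustrative only: the cohort counts are invented, the function name `effect_measures` is hypothetical, and the measures are the standard textbook ones (risk ratio, risk difference, attributable fractions) rather than anything specific to Hernán’s paper. Reading any of them as a causal effect assumes, of course, that the comparison is free of confounding, which is exactly what is at issue in the rest of this paper.

```python
def effect_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Compute common effect measures from a simple two-group cohort table.

    All inputs are counts. The results are association measures; they can be
    interpreted as causal effects only under a no-confounding assumption.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    risk_overall = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)
    return {
        # "How much effect a given cause has": relative and absolute contrasts.
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        # "Proportion due to the cause": among the exposed, and in the whole population.
        "attributable_fraction_exposed": (risk_exposed - risk_unexposed) / risk_exposed,
        "population_attributable_fraction": (risk_overall - risk_unexposed) / risk_overall,
    }


# Invented counts: 90 cases among 1000 exposed, 10 cases among 1000 unexposed.
measures = effect_measures(90, 1000, 10, 1000)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the exposed have roughly nine times the risk of the unexposed, and about four fifths of all cases in the combined population would be attributed to the exposure. Each figure is a contrast with an unexposed comparison group, which is why the choice of comparison matters so much in what follows.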

The key point for POA advocates, including Hernán, is that such measures only make sense if you clearly specify what you are estimating a difference from, and then specify the cause as a specific change in that scenario, as specific as possible. Much existing observational epidemiology fails to be adequately clear in this regard, according to Hernán, and it is to this lack of clarity, rather than to the lack of randomization, that Hernán attributes many of the failings of observational epidemiology.

In a trial, the investigators must design a suitable intervention. This means that they are automatically forced to consider this intervention with some care and ensure that it is suitably monolithic: for example, that a drug is administered at the same time of day, by the same means (oral, intravenous, etc.). This doesn’t guarantee that there won’t be residual confounders—differences between the different ways the intervention is carried out that themselves produce some sort of effect that modifies the observed association. However, it forces the investigators at least to think about it. (Moreover, in Hernán’s view, the fact that an intervention is “done” makes it much more likely that the technical property of consistency is satisfied (Hernán and Taubman 2008, p. S11), a point to which I will return in Sect. 5).

In an observational study, by definition, there is no actual intervention. The innovation of the POA is to interpret observational data as if an intervention had been made. We seek to emulate a trial as closely as possible in both the study design and the analysis. For example, rather than comparing mortality between obese and non-obese groups, we imagine that the obese group were the result of some intervention on the non-obese group. If the study has not recorded enough information about lifestyle, this may not be possible, in which case no causal effect can be estimated. If, however, we do have information about exercise habits, diet, and so forth, then we treat the obese group as if we had made these various interventions, and estimate the effect of these interventions on mortality. There is no space for directly estimating the effect of obesity on mortality since there is more than one way that obesity may come about, and these different ways may themselves have an effect on mortality.

Thus causal effect can be meaningfully estimated in observational studies, provided that the estimate is expressed as the effect of a hypothetical intervention.
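The worry about obesity arising in more than one way can be illustrated with a toy calculation. Everything in it is an assumption made for the example: the two routes into obesity, the mortality risks attached to them, and the helper name `obesity_risk_difference` are hypothetical and not taken from any study. The sketch only shows that, if routes differ in their effect on mortality, a single “effect of obesity” depends on the mix of routes in the population studied.

```python
# Hypothetical mortality risks (invented for illustration only).
BASELINE_RISK = 0.10  # risk in the non-obese comparison group
RISK_BY_ROUTE = {
    "low_exercise": 0.20,       # obesity arrived at mainly through inactivity
    "high_calorie_diet": 0.14,  # obesity arrived at mainly through diet
}


def obesity_risk_difference(route_mix):
    """Risk difference for 'obesity' given the share of each route among the obese.

    route_mix maps each route name to its share; shares must sum to 1.
    """
    if abs(sum(route_mix.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    obese_risk = sum(share * RISK_BY_ROUTE[route] for route, share in route_mix.items())
    return obese_risk - BASELINE_RISK


# Two populations that are equally obese but became so by different routes.
mix_a = {"low_exercise": 0.8, "high_calorie_diet": 0.2}
mix_b = {"low_exercise": 0.2, "high_calorie_diet": 0.8}

rd_a = obesity_risk_difference(mix_a)
rd_b = obesity_risk_difference(mix_b)
print(f"population A: {rd_a:.3f}, population B: {rd_b:.3f}")
```

Under these invented numbers the two populations yield different risk differences for the same exposure label, which is the sense in which the POA holds that an unqualified “effect of obesity” lacks a determinate value.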

Hernán’s proposals and those of others in the POA movement have attracted considerable attention among epidemiologists, both positive and negative. In my opinion, the critical themes in the growing literature are crystalizing into two main objections, one pragmatic, the other theoretical.

3.1 The pragmatic objection

The pragmatic objection is that there are situations where we suspect or even know that we cannot specify an intervention with any degree of clarity, but where quantitative estimates of causal effect seem useful nonetheless (Krieger and Davey Smith 2016).

This is the deep concern of social epidemiologists, who often attempt to estimate the causal effect of variables such as race status. While they may admit that such estimates are difficult and to be treated circumspectly, they nonetheless don’t want to accept that they are ill-defined and meaningless. For instance, race status seems to be associated with health status, and there are compelling reasons to suspect at least some causality (Krieger 2008). There are also estimates of the degree of the effect that suggest it is substantial, and these estimates sometimes appear to be best explained partly by appeal to socially-mediated effects of race. But the potential cause itself, “race”, is not easy to characterize (Lorusso and Bacchini 2015). Is it to be defined by reference to biological features? If so, which ones? Skin colour? Genetic markers? Or is it a social construct (even a personal choice, if self-identification is the tool of detection)? If it is a social construct, what does that mean?

It is even trickier to specify an intervention against which the effect of race is to be measured, as the POA requires for an estimate of causal effect. When we say “if this person were white, they would have got the job”, what exactly is the scenario we are imagining? One where the same person becomes white? One where they were always white—in which case, one where they are shaped by all other potential effects of being white? Do they speak with the same accent? Do they know the same things? Do they share a culture or a mother tongue?

In an observational study one may try to settle such questions as far as possible. Few would doubt that doing so is desirable; for example, a comparison of the effects of race between groups with very different income will also be a study of the effect of income. But one will typically end up with many other differences. In this situation, the usual response is to consider in the discussion section whether or to what extent the residual confounders matter, employing background knowledge, common sense, and other such crude instruments.

The usual response is not to seek to clearly specify a hypothetical intervention whose effect can then be meaningfully estimated. Strategies of this kind have been tried by POA enthusiasts, for example, by estimating race as the residual effect once effects of other more manipulable variables have been estimated (VanderWeele and Robinson 2014). Such approaches cannot be said to have taken off among social epidemiologists (Krieger and Davey Smith 2016). They appear contrived, and even euphemistic (Glymour and Glymour 2014), not dissimilarly to the past reluctance of epidemiologists to talk about causes. If our data suggest that being African American reduces your life expectancy compared to an equivalent Caucasian counterpart, then it is helpful for everyone if we just come out and say so.

Hence the usual response: to consider in the discussion section whether or to what extent the residual confounders matter, employing background knowledge, common sense, and other such crude instruments. Social epidemiologists thus refuse to accept that their estimates of “causal effect” are meaningless. They hold that they may convey useful information (which implies meaning something) despite being imperfect.

Social epidemiologists are apt to see the POA as “ivory tower epidemiology”: elegant, sophisticated, but impractical and disconnected.

This pragmatic response to the POA is powerful, but it may be resisted by a principled stance that the desirability of estimates of causal effect of variables such as race is no argument for their possibility, and that the social epidemiologists are guilty of wishful thinking. However, if the POA wishes to adopt the stance of the principled theoretician, then it needs to face a principled, theoretical objection, to which I now turn.

3.2 The theoretical objection

Hernán’s claim is that, by clearly specifying an intervention whose effect we are estimating, we achieve a meaningful estimate of causal effect. The objection that I now wish to pursue is that the specification of an intervention relies on the same kind of crude, pragmatic reasoning about unknown or unestimated confounders that one might expect in the discussion section of a traditional paper estimating a causal effect in an observational study.

An example that Hernán and I have both treated elsewhere is obesity (Broadbent 2015; Vandenbroucke et al. 2016; Broadbent et al. 2016a, b). Hernán holds that it is meaningless to estimate the effect of obesity on morbidity unless one has in mind some particular obesity-reducing intervention, against which the effect of obesity is measured; or, equivalently, unless one is measuring the effect of some specific intervention to reduce obesity (Hernán and Taubman 2008).

However, as I have argued elsewhere, our ignorance means that we may not be able to meet these criteria. We may not know whether our intervention is well-specified. One of Hernán’s own examples of a well-specified intervention is 1 h of exercise per day. But this may not be well-specified; in fact it probably is not (Vandenbroucke et al. 2016). There are many different kinds of exercise and it is plausible that they have different effects on obesity, mortality, or both. Strenuous running, swimming, weightlifting, cycling, squash, rowing, karate, and rugby probably have different effects on both obesity and mortality (even controlling for injuries). “What kind of exercise will help me lose the most weight?” is a common question. There is even research on this (not always very good research), with a market in the fitness industry (“High Intensity Interval Training” being the current hot idea).

Of course, given what we now know about the effects of different kinds of exercise, we could fix up this particular example by specifying the kind of exercise we mean. But the point is that we do not necessarily know when specifying an intervention whether it is well-specified. Even if we did try to fix the example up, we might pass over some detail that future research identifies as crucial–as crucial as the difference between LDL and HDL cholesterol, for example.

In practice, one tries to make educated guesses, relying on background knowledge, common sense, and other such crude approaches. This is just the same way in which one deals with the possibility of residual confounders in a discussion of a traditional attributable risk estimate. That’s because these other variables are residual confounders, with respect to the intervention in question. We may insist that an intervention is not well-specified until all confounders are accounted for—until the specification is so tight that there’s nothing left that could possibly interfere with the inference. But that’s simply not possible, for real human investigators.

Hernán, along with other POA advocates, faces a quandary. They can refuse to allow that an intervention is well-specified until all the causally relevant aspects of the intervention have been specified. But then well-specified interventions are impossible. Alternatively, they may allow some degree of practical, informal, dirty assuming and appeal to background knowledge. But then they are opening the door to all the woes of traditional approaches to causal inference. These are not only the epistemic problems of residual confounding. They are also the corresponding semantic problems of lack of clear specification of contrasting counterfactual scenarios. For it was the multiplicity of residual differences between populations that Hernán objected to, when he insisted that estimates of causal effect were unclear without a clearly specified intervention. Specifying an intervention may appear to help, but it is only an appearance. On closer examination, there’s nothing special about specifying an intervention. One still lives with many unspecified residual differences between one’s comparator populations, the hypothetically intervened-upon and the control.

Another way to put the point is that, while more precise specification of one’s populations is often helpful, there is nothing special about specifying an intervention. That’s one way to increase precision, but it doesn’t force precision. It’s no different in principle from other, more familiar ways of doing the same thing, such as controlling for other potential confounders. Just as the epistemological difference between trials and observational studies is one of degree and not kind, so the semantic difference is also not one of kind.

At this stage, two reactions are possible, one despairing, the other hopeful. The despairing reaction is to see the persistence of residual confounding as the nail in the coffin of observational epidemiology, at least insofar as it hopes to estimate causal effect. The other is to resist the POA’s restrictive approach to causal inference, on the basis that it doesn’t solve the problem of residual confounding, and thus doesn’t bring observational studies any closer to the inferential advantages of randomized trials. I will advocate the latter approach in the next section.

4 Completing the realist turn

Hernán sets out to eliminate censorship of the C-word. But he does so by imposing a further restriction on its use. Why do this? Why not simply say that epidemiologists ought to feel free to use terms like “cause” or “causal effect” whenever they think it appropriate, in stating their aims or conclusions?

The answer I want to explore in this final substantive section is that Hernán, and the POA in general, has confused the measure with the thing measured. The insistence on precise definition of interventions arises from the concern that, without that, expressions of causal effect will be vague. Hernán equates vagueness and imprecision with meaninglessness. It is one thing to prize clarity, another to equate clarity with meaningfulness.

Hernán’s approach is akin to that of the logical positivists, who insisted that unless a statement could be cashed out in terms of its verification conditions, it was meaningless (Ayer 1952). Hernán does not appeal to verification conditions, but to counterfactual scenarios. If a causal claim cannot be cashed out in terms of corresponding counterfactuals, it is not clear enough to be meaningful. This is a different condition on meaningfulness than that used by the positivists, but it is the same approach.

By contrast, if one is a realist about causal effect, then one would hold that more or less precise attempts to refer to it, estimate it, and so forth may be meaningful, and may approach the truth to a greater or lesser degree. On a realist approach to semantics, whether one is right about something, and whether one succeeds in talking about it in the first place, are two different things.

The appropriate attitude to estimates of causal effect is realist, not positivist. Estimates of causal effect are not just so many operations on the data. They are attempts to describe an underlying reality. Different effect measures seek to describe different aspects of that single reality or express it in different ways. This means that an estimate of causal effect need not be precise in order to be meaningful. This is a good thing for traditional epidemiology and the POA alike, since, as I have argued, they are in the same boat as regards the impossibility of a fully precise specification of the meaning of an estimate of causal effect.

In the remainder of this section I will unpack these lines of thought: that the POA’s restrictions are positivistic in character, and that realism about causal effect is more attractive than positivism.

4.1 The POA’s restrictions as positivistic

At the heart of logical positivism is the idea that “metaphysical” talk is meaningless and should be eschewed in the sciences, which are not concerned with poetry and other non-literal uses of language. Talk is meaningless when we cannot specify the circumstances in which a hypothetical sense experience could show its truth or falsity:

We say that a sentence is factually significant to any given person, if, and only if, he knows how to verify the proposition it purports to express—that is, if he knows what observations would lead him under certain conditions, to accept the proposition as being true, or reject it as being false.

(Ayer 1952, p. 48)

While there is no mention of possible experiences in Hernán’s writing, and while Hernán is obviously not committing himself to verificationism in his work on causal inference, there is nonetheless a resonance between Ayer’s verification conditions and Hernán’s well-defined counterfactuals. The former are conditions under which a proposition would be accepted as true/rejected as false (notably itself a counterfactual), while the latter are scenarios in which a causal claim would be accepted as true/rejected as false. Moreover, there is a striking parallel of strategy between the verificationist and Hernán. That strategy is to insist that a term is not meaningful unless it can be reduced to some other terminology, whose meaningfulness is beyond doubt.

Logical positivism is founded on a deep epistemological skepticism. A world beyond immediate experience cannot be proved, nor even talked about. All our talk must ultimately be reducible into a language of sense data. The same skepticism applies when positivism is applied in other domains, for example, the legal domain. Legal positivism holds that there is nothing more to law than what is written in the law books or uttered in the courts.

Logical positivism is nonetheless epistemically optimistic about science, but only provided that science remains rigorously precise, and does not stray off into meaningless nonsense. Provided that its theoretical terms are suitably defined in terms of an observation language, science is a powerful tool for systematizing our experiences, and expressing large quantities of empirical truth elegantly and concisely. Much philosophical work in the twentieth century involved working out how the theoretical vocabulary got its meaning from the observational vocabulary (Nagel 1961; Lewis 1970), notwithstanding the fact that no science actually distinguished its theoretical and observational vocabularies—one of the points that eventually led to the undoing of the positivist project.

On a positivistic view, there is nothing more to an estimate of causal effect than what is said when it is expressed, just as in legal positivism there is nothing more to the law than what is written or ruled. All the various measures that one finds in an epidemiological textbook have equal “reality”; they are ways of expressing association. Estimates of causal effect differ from non-causal expressions only in that they are counterfactual assertions about an association. If we do not properly specify the counterfactuals in question then what we are saying is meaningless, or at least does not have a determinate meaning.

I do not know whether epidemiology in general, the POA, or Hernán are logical positivist in history or inclination. I do suspect some strong influences, given that epidemiology is a statistical science, and that twentieth-century statistics was influenced by logical positivism. My point is simply that the POA’s approach to estimates of causal effect is positivistic, in the sense described.

4.2 Realism about causal effect

Logical positivism exerted a huge influence on science in general, and continues to do so. Its insistence on clarity and technical rigor is valuable. However, the accompanying theories of meaning, and the underlying epistemological and metaphysical worldviews, have largely been rejected by philosophers. Among philosophers who are sympathetic to or optimistic about science, they have usually been replaced by realist approaches or limited forms of empiricism. Among philosophers with a more cynical or pessimistic view of science, as well as among many sociologists, “science studies” scholars, anthropologists, and others whose attitudes to science are less sympathetic, they have often been replaced by relativistic views. The latter are of limited interest in the present context because they tend to step back and critique the scientific project as a whole, or bring in non-methodological and even personal considerations (such as Hernán’s career as an epidemiologist, or mine as a philosopher of science). These are not relevant to the present discussion, which is methodological in focus.

Like logical positivism, scientific realism is optimistic about science. But it adopts an entirely different view of the world, and of the source of the meaning of theoretical terms. Science aims to describe the world, on a realist view. Its theoretical terms get their meaning, not from the way they are defined, but from what they refer to. Sometimes, what they refer to is quite at odds with how we defined them. That is because we can succeed in referring to something while being quite wrong about it. It took us a while to realize that the evening star was the morning star (both being the planet Venus). On a realist view, we were nevertheless referring to the same thing all the time. On a positivist view, we were either referring to two different sets of sense experience, or else were not successful in referring to anything at all.

It is here that I perceive a tension in Hernán’s thinking, and in the POA’s approach more generally. Despite adopting a positivistic strategy towards causal terminology, Hernán’s implicit stance towards causal effect is realist. It makes no sense, for him, to allow ourselves to speak only about association. It makes no sense because we are looking to understand causation, and causation is distinct from the evidence we have about it. This is fundamentally a realist stance. Causal connections between exposures and outcomes are “out there”, and we want to know about them. Our studies provide evidence that tells, more or less strongly, in favor of or against any hypothesized “out there” causal connection. Trials tell more strongly; observational studies, less. This stance makes sense against a realist background, because scientific realism holds (among other things) that science is the business of trying to describe and understand entities and processes that are really out there, and succeeding or failing to the extent that we get this right. It makes no sense against a positivist background, where all we can talk about—let alone know about—is what we can, in principle, observe.

Finally we get to the point I wish to make about Hernán’s realism, and about the POA more generally. If causality is “out there”, and if we wish to know about it, then it is mistaken to impose a given definition as a necessary condition on the meaningfulness of causal claims. On a realist semantics, the meaning of causal terms is determined by the nature of causation itself. We can succeed in talking about something, yet still be wrong about it. To this extent, what our words mean is not entirely determined by how we define them (Kripke 1972; Putnam 1978). Our definitions can be wrong: a term can be about something that is not very much at all how the definition describes it.

This is so whether or not the definition is precise. Within a positivist framework, precise definition is necessary to avoid talking nonsense. Within a realist framework, precise definition is desirable because it demonstrates a good understanding of the nature of the thing we are talking about. But it is not necessary for the avoidance of nonsense. Indeed, to insist on a precise definition within a realist framework, as a precondition of meaningful talk, is incoherent. Compare some everyday thing about which we are naively realist in everyday talk. To insist on a precise definition of “dog” before accepting that it means anything to say “the dog bit me” is obviously silly. One does not have to be a zoologist to say that one has been bitten by a dog.

If causation is “out there” then our definitions are attempts to correctly describe what is “out there”. We are not at liberty to just define a term any old way we like; we can do this, but then what we are doing is pointless, and is moreover bound to lead us into error, since chimeras are not governed by the laws of nature. If we get our definitions wrong, then we go wrong when we start generalizing from the tiny portion of the universe that we have observed. Generalizations and extrapolations only stand a chance of working if we frame them in terms that “carve the world at its joints”.

If we insist on a given definition of cause merely for its precision, then we risk being wrong about the thing we are talking about. Even if we restrict this insistence, as POA advocates sometimes do, to the kind of cause we are talking about, we may still be wrong: our definition may attribute the wrong properties to the thing that we are, as a matter of fact, talking about. The POA falls into this trap by insisting that causal claims be expressed by reference to well-defined interventions, when a well-defined intervention is an impossibility by the high standards of the POA. Better for the POA and traditional epidemiology both to accept that there is such a thing as causal effect, which various expressions, techniques and so forth may seek to estimate with more or less precision.

5 Conclusion: against censoring the P-word

The POA provides a useful set of tools that can be applied when an intervention is specified. This, in itself, is a good reason to seek well-defined interventions. However, an intervention is only well-defined at a certain level of resolution. Go deeper, and we will see many features of the intervention left unspecified, features that may correspond to sources of residual confounding. Estimating the causal effect of an exposure on an outcome without specifying an intervention is not categorically different. In both cases we appeal to background knowledge to deal as best we can with the situation, and adopt a pragmatic attitude.

The conclusion of this line of argument is thus a pluralistic one. While the POA provides a useful set of tools, specifying an intervention is not a magic bullet, neither for eliminating confounders nor for ensuring precision in our expression of causal effect estimates. Epidemiologists can continue to estimate attributable risks, adjust for known confounders, and so forth, as they have done before. They would do well to heed Hernán’s insistence on clarity, but they do not have to specify interventions. On the other hand, those who adopt the POA must resist the temptation to assume that they have automatically achieved clarity by specifying an intervention, and must certainly not be tempted to think that they have eliminated confounders. Residual confounding is an ineliminable risk of observational studies.

Hernán acknowledges all of the above: that interventions cannot be fully specified, that the clarity gained by specifying them is an advantage of degree rather than of kind, and so forth. He indicates, however, that these points relate to exchangeability rather than consistency, and that the latter is the key attribute that interventions bring.

But his argument on this point becomes difficult to follow. Consistency is a property of an intervention such that, in the scenario where the intervention is done, if it were done, the outcome would be as it actually is. In Hernán’s view this is trivially satisfied in a trial:

…consistency is a trivial condition in randomized experiments. For example, consider a subject who was assigned to the intervention group A = 1 in your randomized trial. By definition, it is true that, had he been assigned to the intervention, his counterfactual outcome would have been equal to his observed outcome.

(Hernán and Taubman 2008, p. S11)
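
In the potential-outcomes notation common in this literature, the consistency condition at issue can be written compactly. This is a sketch only: the symbols $Y_i$ (subject $i$'s observed outcome) and $Y_i^{a}$ (the outcome subject $i$ would have had under treatment level $a$) are assumed here and do not appear in the quoted passage, which uses only $A$.

```latex
A_i = a \;\Longrightarrow\; Y_i^{a} = Y_i
```

For the trial subject in the quotation, $A_i = 1$, so the condition reads $Y_i^{1} = Y_i$.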

However, in Hernán’s view, it is not trivially satisfied in observational studies:

Suppose that, in [an] observational study …, the data analyst compared the mortality of subjects who happened to have a BMI of 30 (A = 1) and a BMI of 20 (A = 0) at baseline. Now consider a study subject who had a BMI of 20 at baseline. It is not obvious that, had he been assigned to a BMI of 20 some time before baseline, his counterfactual outcome at the end of the study would have been necessarily equal to his observed outcome because there are many possible methods to assign someone to a BMI of 20.

(Hernán and Taubman 2008, p. S11)

But consistency does not concern the value of a variable at some earlier time. It concerns the actual value of a variable, at a time in question. Hernán’s point seems to be simply that “BMI 20” could come about in more than one way. There is really no difference between saying this and saying that an intervention could come about in more than one way, which, as I argued in Sect. 3.2, is an unavoidable reality.

Regardless, I strongly doubt that anyone who has been taken with the POA or Hernán’s work has found the crucial passage on consistency, upon which the whole of the rest hangs, decisive. I suspect that for many it is a forgotten or misunderstood detail. It is one of the jobs of philosophers to chase such details up, and continue digging for answers and seeking clarity when others are satisfied. Hernán himself adopts this attitude towards traditional estimates of causal effect, which indeed often are too vague. In this regard, Hernán is a philosopher of epidemiology.

Epidemiologists who engage in thinking about the nature of causality are doing philosophy, whether they call it that or not. Hernán writes:

In a perfect example of cognitive dissonance, scientific journals often publish articles that avoid ever mentioning their obviously causal goal. It is time to call things by their name. If your thinking clearly separates association from causation, make sure your writing does too.

(Hernán 2018, p. 618)

If it is time to call things by their name, then we should call Hernán’s recent paper, as well as several of his others, philosophical.

The big advantage of calling philosophical work by its name is that of inheriting the fruits of previous labor. Perhaps epidemiologists are afraid of dabbling in a discipline where they are not experts. However, cooperative philosophers (of whom there are some) will want to help them, because in doing so they will discover things that are of interest to themselves.

The main disadvantage of calling philosophy by its name is the lack of a guarantee that the discipline of philosophy is a healthy and rigorous one. This is something that concerns many scientists with regard to many humanities disciplines, and not without reason: there is a lot of nonsense out there. There is also the risk that individuals in the discipline are going to sidetrack or completely derail you, for the sake of their own careers, or simply to look clever.

I have not addressed these worries directly, but I hope that I have done so indirectly. I hope that this paper may offer some assurance that the relevant areas of analytic philosophy are rigorous and healthy, and that the risk of derailing is small enough to be worth the potential benefit.