Causality in complex interventions


In this paper I look at causality in the context of intervention research, and discuss some problems faced in the evaluation of causal hypotheses via interventions. I draw attention to a simple problem for evaluations that employ randomized controlled trials. The common alternative to randomized trials, the observational study, is shown to face problems of a similar nature. I then argue that these problems become especially acute in cases where the intervention is complex (i.e. one that involves intervening in a complex system). Finally, I consider and reject a possible resolution of the problem involving the simulation of complex interventions. The conclusion I draw from this is that we need to radically reframe the way we think about causal inference in complex intervention research.



  1.

    Testing this involves splitting a sample into (at least) two groups: ‘treatment’ and ‘control’. The treatment group receives the intervention and the control does not, though it may receive a placebo or an alternative (‘standard’) treatment. The two groups have to be ‘well-matched’ (at least with respect to relevant variables) at baseline in order to allow inferences to be made regarding the causal effects of the intervention. The variables that are viewed as relevant to the outcome, and so measured before treatment commences, are called ‘covariates’. Matching is carried out with respect to these.
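    As a toy illustration of the design just described, the following sketch randomly allocates hypothetical units to treatment and control arms and then checks baseline balance on a single covariate (‘age’); the names and numbers are invented for illustration, not drawn from any actual trial.

```python
import random
import statistics

def randomize(units, seed=0):
    """Randomly split a sample of units into treatment and control arms."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def baseline_difference(treatment, control, covariate):
    """Difference in mean covariate value between the arms at baseline."""
    t_mean = statistics.mean(u[covariate] for u in treatment)
    c_mean = statistics.mean(u[covariate] for u in control)
    return t_mean - c_mean

# Hypothetical units carrying a single measured covariate, 'age'.
units = [{"age": a} for a in range(20, 60)]
treated, control = randomize(units)
# If the arms are 'well-matched' this difference should be small.
print(abs(baseline_difference(treated, control, "age")))
```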

  2.

    In what follows I borrow, in parts, from the excellent presentation of Holland (1986).

  3.

    The experimental units, however, do not have to be individual people; they might be groups of people or even entire populations of people. If we can find variables to measure on these systems too, then we will often find that they vary over \({\mathcal{U}}\); cf. Hertzman et al. (1994, p. 67).

  4.

    As Evans et al. (1994) put it in the title of their book: Why Are Some People Healthy and Others Not?

  5.

    The example of health H(u) and wealth W(u) provides a nice intuitive example here, since one finds that units u with high W(u)-values often have high H(u)-values too. The problem is, of course, that we can’t tell from this correlation (that is, from the joint distribution of H(u) and W(u)) whether high H(u)-values cause high W(u)-values or vice versa, or indeed whether some other ‘hidden variable’ (i.e. a ‘confounder’) is the cause of both high H(u)-values and high W(u)-values.
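    The point can be made vivid with a simulation in which, by construction, neither H(u) nor W(u) causes the other: a hidden confounder drives both, yet the joint distribution exhibits a strong correlation. The noise levels and sample size below are arbitrary illustrative choices.

```python
import random

rng = random.Random(42)

# Hidden confounder C (some unmeasured background variable) drives both
# health H(u) and wealth W(u); neither causes the other by construction.
n = 10_000
C = [rng.gauss(0, 1) for _ in range(n)]
H = [c + rng.gauss(0, 0.5) for c in C]
W = [c + rng.gauss(0, 0.5) for c in C]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# H and W are strongly correlated despite no causal arrow between them.
print(round(pearson(H, W), 2))
```

The theoretical correlation here is 1/1.25 = 0.8; the joint distribution alone cannot reveal that C, rather than either variable, is doing the causal work.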

  6.

    Note that Pearl’s investigations aim to provide algorithms for prediction from interventions, specified as alterations of the joint distribution of some variables in a system whose causal structure remains invariant—hence complex systems, in which this invariance is violated, will immediately pose difficulties.

  7.

    N = 1 (longitudinal) trials (such as Interrupted Time Series methods) might appear to give one an escape from this problem. Here exactly one person takes the treatment at t = 1 and an observation is made, whereupon the person ceases the treatment at t = 2 with an observation made again, and then the person takes the treatment again at t = 3, and so the cycle continues up to t = n. This misses the point of the fundamental problem: after the first complete cycle, at t = 3, the conditions are necessarily different from the first instance at t = 1. At the start of each new period in the cycle the conditions will differ (at least) by the addition of a further treatment.
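    The carryover worry can be made concrete with a toy calculation: if each treatment period leaves a residual effect, the unit entering the second treated period is measurably not in its original state. The effect and carryover parameters below are invented for illustration.

```python
# Toy ABAB (interrupted time series) design in which each treatment period
# leaves a residual ('carryover') effect, so the unit entering the second
# treated period is no longer in the state it was in at the first.
def run_cycle(periods, effect=1.0, carryover=0.5):
    state = 0.0
    history = []
    for t in range(periods):
        on_treatment = (t % 2 == 0)  # treat on alternating periods
        if on_treatment:
            state += effect
        history.append((t, on_treatment, state))
        state *= carryover  # only part of the effect washes out
    return history

history = run_cycle(4)
# The second treated period starts from a different baseline than the first.
first, second = history[0][2], history[2][2]
print(first, second)  # 1.0 1.25
```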

  8.

    Causal graph advocates refer to this nice feature of interventions as “arrow-breaking”. See Pearl (2000), for example.

  9.

    As a prime case of overt bias requiring adjustment, consider the following case described by Cochran (1968). The case involves data from a study of mortality rates among three categories of men: non-smokers, cigarette smokers, and pipe and cigar smokers. The rates for these groups, per 1000 men per year, respectively, are: 20.2, 20.5, and 35.5. Prima facie this suggests that pipe and cigar smoking is extremely harmful, but that it doesn’t make much difference whether one smokes cigarettes or not. However, inspecting the mean ages of the groups soon reveals significant differences: 54.9, 50.5, and 65.9 respectively. Hence, the data needs to be reinterpreted: pipe and cigar smokers are older and so we should expect (independently of the smoking issue) a higher rate of death amongst this group relative to the others. Moreover, the non-smokers are older than the cigarette smokers and yet the smokers have a higher mortality rate nonetheless. To control for this bias we must further stratify the groups so that only men within the same age-range are compared. How do we know we have got it right once we have adjusted for age? We don’t. There may be other variables that have a similar biasing effect that aren’t controlled for, perhaps because they aren’t known or measurable.
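    Cochran’s age adjustment can be made concrete with a toy calculation. The sketch below uses direct standardization: stratum-specific death rates are averaged under a common reference age distribution, so groups with different age mixes are compared on an equal footing. All stratum rates and weights are hypothetical, for illustration only (the paper reports only the crude rates and mean ages quoted above).

```python
# Direct age standardization: compare groups' death rates after weighting
# age-stratum rates by a common (reference) age distribution.
def standardized_rate(stratum_rates, reference_weights):
    """Weighted average of stratum-specific rates (per 1000 per year)."""
    assert abs(sum(reference_weights) - 1.0) < 1e-9
    return sum(r * w for r, w in zip(stratum_rates, reference_weights))

# Hypothetical death rates per 1000 in three age strata (young, middle, old).
pipe_cigar = [5.0, 15.0, 50.0]
non_smokers = [4.0, 14.0, 48.0]

# A crude comparison uses each group's own age mix (pipe/cigar smokers skew
# old); a shared reference distribution puts both on the same footing.
reference = [0.4, 0.4, 0.2]

print(round(standardized_rate(pipe_cigar, reference), 1))   # 18.0
print(round(standardized_rate(non_smokers, reference), 1))  # 16.8
```

After standardization the gap between the groups shrinks dramatically relative to the crude rates, which is exactly the effect of removing the age bias; and, as the footnote notes, the same manoeuvre is hostage to whatever biasing variables remain unmeasured.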

  10.

    There are, of course, ways that this can be subverted, and for this reason one often adds ‘blinding’ conditions to the experimental setup. Also, randomized experiments often are not ‘really’ random; for example, assignment is sometimes carried out on the basis of birth dates and the like. Since such methods are predictable, they are not truly random.

  11.

    Even if randomization does eliminate selection bias, there are also biasing effects that can emerge during the intervention (i.e. after randomization has been done)—hence differences between the wings of the experiment may not reflect the operation of the intervention alone. In other words, even if we allow for perfect distribution of inhomogeneities at the start, once subjects are allocated to some particular arm there is the potential for biasing effects to interfere (cf. Cox 1992, p. 299). Hence, the initially strong experimental control over the ‘baseline’ properties of the groups quickly deteriorates. This is one of the main reasons underlying Peter Urbach’s rejection of randomization (1985, pp. 262–264).

  12.

    This value is known as a ‘p-value’: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. In other words, it helps us to rule out certain apparent correlations; correlations that are really due to chance. Hence, probabilities play a guiding role in the determination of causal links. This is connected to the notion of the level of significance of some result. Following Fisher (ibid.), this level is usually set at the value 0.05. What this means is that, were the null hypothesis true, a result at least as extreme as the one you got would be expected by pure chance in about 5 out of 100 runs of the same experiment. In other words, it functions as a demarcation criterion for separating fluky from non-fluky correlations.
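    Fisher’s 0.05 demarcation can be illustrated by simulation. In the sketch below the null hypothesis is true by construction—both arms draw from the same distribution—and roughly 5% of runs nonetheless produce a p-value below 0.05. The test statistic (a normal-approximation two-sample comparison) and the sample sizes are illustrative assumptions.

```python
import random
import math

rng = random.Random(1)

def two_sample_p(xs, ys):
    """Rough two-sided p-value for a difference in means (normal approx.)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    z = (mx - my) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p from standard normal

# The null hypothesis is true by construction: both arms draw from the same
# distribution, so any apparent 'effect' is pure chance.
runs = 2000
false_positives = sum(
    two_sample_p([rng.gauss(0, 1) for _ in range(50)],
                 [rng.gauss(0, 1) for _ in range(50)]) < 0.05
    for _ in range(runs)
)
print(false_positives / runs)  # close to 0.05
```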

  13.

    Note, however, that Lipsey and Cordray are merely surveying current views here; they recognize that difficulties in causal inference can arise even given perfect randomization prior to treatment.

  14.

    Papineau draws attention to these on the grounds that there are often ethical problems facing RCTs. Thus, he writes that “medical enthusiasm for randomization is dangerous and needs to be dampened ... not because it is worthless ... but rather because it is often unethical, and because the conclusions it helps us reach can usually be reached by alternative routes of greater economic cost and less epistemological security” (Papineau 1994, p. 438). In this he is in concurrence with Worrall (2002). However, the discussion appears to indicate that Papineau thinks this is pretty much the only problem with randomization.

  15.

    As philosophers of science will notice, there is more than a hint of the ‘no miracles argument’ present here.

  16.

    A more formal (and fundamental) model of this characteristic of a complex system is the Ising model, a two-dimensional lattice of interacting ‘spin’ systems (or just ‘spins’), with spin components s =  +1 or s =  −1 and interaction Hamiltonian \(H = -J \sum_{\langle i, j \rangle} s_{i} s_{j}\) (with coupling constant J). This model provides an idealized model of an iron magnet: the Hamiltonian describes the interactions between neighbouring spins (coupling to an external magnetic field can be included via an additional term). There is a phase transition in this system when the temperature is tuned to a certain ‘critical point’, separating order (spins pointing in the same direction) from disorder (spins pointing in different directions). The point of this technical detour, and the Schelling example, is to exhibit the sensitivity of complex systems to small disturbances.
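    The order–disorder transition described here can be exhibited in a few lines of code. Below is a minimal Metropolis-style Monte Carlo sketch of the 2D Ising model; the lattice size, temperatures, and sweep count are illustrative choices, not canonical values.

```python
import math
import random

def metropolis(L=16, T=2.0, J=1.0, sweeps=200, seed=0):
    """Minimal Metropolis sketch of the 2D Ising model (illustrative values)."""
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]  # start in the fully ordered state
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # Sum over the four nearest neighbours (periodic boundaries).
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2 * J * spins[i][j] * nb  # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] *= -1
    # Magnetization per spin: near +/-1 in the ordered (low-T) phase,
    # near 0 in the disordered (high-T) phase.
    return sum(map(sum, spins)) / (L * L)

# Tuning T across the critical point (T_c is roughly 2.27 in these units)
# takes the system from order to disorder.
print(abs(metropolis(T=1.5)), abs(metropolis(T=5.0)))
```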

  17.

    Think of ‘cooperative phenomena’ such as magnetization, in which one tunes the control parameter to the Curie temperature. Interventions in this system spread over the whole of it because the correlation length (between spins) becomes infinite.

  18.

    This has already been noticed by Rosenbaum (2005, p. 147)—see also Holland (1986).

  19.

    This complexity is a result of the large numbers of economic agents in markets, the interactions between economic agents, and the feedback loops between the agents and the global patterns their interactions determine.
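    A caricature of such feedback: trend-following agents respond to the last price move, and their aggregate demand produces the next one, so small chance fluctuations get amplified into sustained swings. Every number in the sketch is an arbitrary illustrative choice; this is a toy, not a calibrated market model.

```python
import random

# Toy market with a feedback loop: each agent's demand responds to the last
# price move, and prices respond to aggregate demand.
def simulate(agents=100, steps=50, noise=0.05, seed=3):
    rng = random.Random(seed)
    price, last_change = 100.0, 0.0
    prices = [price]
    for _ in range(steps):
        # Trend-followers: an agent buys if the price just rose (plus noise).
        demand = sum(
            1 if last_change + rng.gauss(0, noise) > 0 else -1
            for _ in range(agents)
        )
        # Aggregate demand feeds back into the next price move.
        change = 0.1 * demand / agents
        price += change
        last_change = change
        prices.append(price)
    return prices

prices = simulate()
# Feedback tends to amplify an initial chance fluctuation into a trend.
print(prices[0], round(prices[-1], 2))
```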


  1. Altman, D.G. 1985. Comparability of randomised groups. The Statistician 34 (1): 125–136.


  2. Campbell, D.T. 1969. Artifact and control. In Artifact in behavioural research, ed. R. Rosenthal and R. Rosnow, 351–382. NY: Academic Press.


  3. Campbell, D.T. and J. Stanley. 1963. Experimental and quasi-experimental designs for research. Chicago: Rand McNally.


  4. Cartwright, N. 2007. Hunting causes and using them. Cambridge University Press.

  5. Cartwright, N. 1989. Nature’s capacities and their measurements. Cambridge University Press.

  6. Cartwright, N. 2002. Against modularity, the causal Markov condition, and any link between the two: comments on Hausman and Woodward. British Journal for the Philosophy of Science 53: 411–453.


  7. Cochran, W.G. 1965. The planning of observational studies of human populations. Journal of the Royal Statistical Society, Series A (Statistics in Society) 128 (2): 234–266.


  8. Cook, T.D. and D.T. Campbell. 1979. Quasi experimentation: design and analysis issues for field settings. Chicago: Rand McNally.


  9. Cox, D.R. 1992. Causation: some statistical aspects. Journal of the Royal Statistical Society, Series A (Statistics in Society) 155 (2): 291–301.


  10. Dodge, Y. ed. 2003. The Oxford dictionary of statistical terms. Oxford University Press.

  11. Eaton, D and K. Murphy. 2000. Statistics and causal inference: comment: which ifs have causal answers. Journal of Machine Learning Research 1: 1–48.


  12. Epstein, J.M. 2007. Generative social science: studies in agent-based computational modeling. Princeton University Press.

  13. Evans, R.G., M.L. Barer, and T.R. Marmor. 1994. Why are some people healthy and others not? Aldine de Gruyter.

  14. Giere, R. 1979. Understanding scientific reasoning. New York: Holt, Rinehart, and Winston.


  15. Hartmann, S. 1996. The world as process: simulations in the natural and social sciences. In Modelling and simulation in the social sciences from the philosophy of science point of view, ed. R. Hegselmann, U. Mueller, and K.G. Troitzsch, 77–100 . Dordrecht: Kluwer Academic Publishers.


  16. Hausman, D.M. and J. Woodward. 2004. Manipulation and the causal Markov condition. Philosophy of Science 71: 846–856.


  17. Hertzman, C., J. Frank, and R.G. Evans. 1994. Heterogeneities in health status and determinants of public health. In Why are some people healthy and others not?, ed. R.G. Evans, et al., 62–92. Aldine de Gruyter.

  18. Hill, A.B. 1965. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58: 295–300.


  19. Holland, P. 1986. Statistics and causal inference. Journal of the American Statistical Association 81: 945–960.


  20. Hsieh, J.-L., C.-T. Sun, G. Y.-M. Kao, and C.-Y. Huang. 2006. Teaching through simulation: epidemic dynamics and public health policies. Simulation 82 (11): 731–759.


  21. Kleinbaum, D.G., L.L. Kupper, and H. Morgenstern. 1982. Epidemiologic research. Belmont, CA: Lifetime Learning.


  22. Le Baron, B. 2000. Agent-based computational finance: suggested readings and early research. Journal of Economic Dynamics and Control 24 (5): 679–702.


  23. Levy, H., M. Levy, and S. Solomon. 2000. Microscopic simulation of financial markets: From investor behavior to market phenomena. Academic Press.

  24. Lewis, D. 1986. Philosophical papers, vol II. Oxford University Press.

  25. Lipsey, M.W. and D.S. Cordray. 2000. Evaluation methods for social intervention. Annual Review of Psychology 51: 345–375.


  26. MacMahon, B. and T.F. Pugh. 1970. Epidemiology: principles and methods. Boston: Little, Brown.


  27. Mill, J.S. 1864. System of logic, vol I. London: Longmans, Green, Reader, and Dyer.


  28. Medical Research Council [MRC]. 2000. A Framework for development and evaluation of RCTs for complex interventions to improve health.

  29. Olweus, D. 1997. Bully/victim problems in school: facts and intervention. European Journal of Psychology of Education 12 (4): 495–510.


  30. Pearl, J. 2000. Causality: models, reasoning and inference. Cambridge: Cambridge University Press.


  31. Pearl, J. 2002. Causal inference in the health sciences: a conceptual introduction. Health Services and Outcomes Research Methodology 2 (3/4): 189–220.


  32. Papineau, D. 1994. The Virtues of Randomization. British Journal for the Philosophy of Science 45 (2): 437–450.


  33. Petticrew, M., S. Cummins, C. Ferell, A. Findlay, C. Higgins, C. Hoy, A. Kearns, and L. Sparks 2005. Natural experiments: an underused tool for public health? Public Health 119: 751–757.


  34. Pocock, S.R. and D.R. Elbourne. 2000. Randomized trials or observational tribulations? The New England Journal of Medicine 342 (25): 1887–1892.


  35. Rosenbaum, P.R. 2005. Heterogeneity and causality: unit heterogeneity and design sensitivity in observational studies. The American Statistician 59 (2): 147–152.


  36. Rubin, D. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66: 688–701.


  37. Rubin, D.B. 1986. Statistics and causal inference: comment: which ifs have causal answers. Journal of the American Statistical Association 81 (396): 961–962.


  38. Schaffner, K.F. 1991. Causing harm: epidemiological and physiological concepts of causation. In Acceptable evidence: science and values in risk management, ed. D.G. Mayo and R.D. Hollander, 204–217. New York: Oxford University Press.


  39. Schelling, T.C. 1978. Micromotives and macrobehavior. W.W. Norton and Co.

  40. Suppes, P. 1982. Arguments for randomizing. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, vol. 2 (Symposia and Invited Papers), 464–475.

  41. Tian, J. and J. Pearl. 2001. Causal discovery from changes: a Bayesian approach. Proceedings of UAI 17: 512–521.

  42. Urbach, P. 1985. Randomization and the design of experiments. Philosophy of Science 52 (2): 256–273.


  43. Woodward, J. 2003. Making things happen: a theory of causal explanation. Oxford University Press.

  44. Worrall, J. 2002. What evidence in evidence-based medicine? Philosophy of Science 69: S316–S330.



I wish to thank Alan Shiell, for his perceptive comments on an earlier version of this paper, and the two anonymous referees for this journal for their helpful comments and suggestions. This work was completed while a postdoctoral fellow at the University of Calgary, as part of The International Collaboration for Complex Interventions [ICCI]: the ideas are not necessarily representative of that group, and, of course, any errors are my own responsibility.

Author information

Correspondence to Dean Rickles.



Rickles, D. Causality in complex interventions. Med Health Care and Philos 12, 77–90 (2009).



  • Causality
  • Intervention research
  • Complexity
  • Randomized controlled trials
  • Observational studies