## Abstract

Causal pluralism can be defended not only with respect to causal concepts and methodological guidelines, but also at the finer-grained level of causal inference from a particular source of evidence for causation. An argument for this last variety of pluralism is made based on an analysis of causal inference from randomized experiments (RCTs). Here, the causal interpretation of a statistically significant association can be established via multiple paths of reasoning, each relying on different assumptions and providing distinct elements of information in favour of a causal interpretation.

## Notes

The proposed reconstruction concerns explanatory, non-adaptive RCTs conducted under the assumption that no prior knowledge relevant to the causal claim under investigation is available. This type of experiment is common in clinical and preclinical research, psychology, education, agriculture and ecology. Since the experimental design behind traditional RCTs is meant to accommodate statistical inference under a frequentist approach, the frequentist statistics standardly employed in the analysis of experimental results is assumed in this paper.

The above inference, known as the randomization inference, does not rely on theoretical approximations based on assumptions about the shape of the sampling distribution. In practice, researchers are often interested in demonstrating that there is an average between-group exposure association, as well as in estimating the size of that association. Thus, “the first step in assessing the intervention’s effect involves testing for a statistical association between intervention group membership (intervention vs. control) and the identified outcome […]. This is accomplished by using an appropriate inferential statistical procedure (e.g., an independent-samples *t* test) coupled with an effect size estimate (e.g., Cohen’s *d*), to provide pertinent information regarding both the statistical significance and strength (i.e., the amount of benefit) of the intervention–outcome association” (DeLucia and Pitts 2010, p. 631). Unlike the randomization inference, *t* tests presuppose that the outcomes are normally distributed or that the population from which individuals are drawn is large enough that the test statistics follow a posited sampling distribution (Gerber and Green 2012, p. 65).

“When a variable systematically varies with the independent variable, the confounding variable provides an explanation other than the independent variable for changes in the dependent variable” (Kovera 2010b, p. 220). For instance, in a linear regression model, the “[c]orrelation of the random effects *ε*_{j} [the error term in *µ*_{j} = *α* + *xβ* + *ε*_{j}] with the treatments *x*_{j} [allocation] leads to bias in estimating *β* [the magnitude of the contribution of *x* to *µ*]. This bias may be attributed to or interpreted as confounding for *β* in the regression analysis. Confounders are now covariates that ‘explain’ the correlation between *ε*_{j} and *x*_{j}. In particular, confounders reduce the correlation of *x*_{j} and *ε*_{j} when entered in the model and so reduce the bias in estimating *β*” (Greenland et al. 1999, p. 34). Note that confounding is defined here in strictly statistical terms, with no explicit reference to causation. In practice, researchers assess the potential for confounding by checking whether allocation alone (before exposure to the tested treatment) covaries with potential confounders; in other words, researchers check whether the usual suspects, such as gender and age, are unbalanced between test and control trial participants. If unbalances exist, then it is possible that the allocation intervention sets both the value of the independent variable *X* to *x* and that of a covariate *W* to *w*, such that the observed differences in outcome can be explained not only by differences in *X*, but also by differences in *W*.

Randomization cannot and is not meant to guarantee that variables will not happen to be associated with groups by chance. This is most vividly illustrated by a trial in which only two individuals are randomly assigned to test and control conditions: in this case, individual characteristics and allocation condition are indistinguishable. It can certainly be argued that random assignment “equates groups on expectation at pretest,” that is, “in the long run over many randomized experiments” (Shadish et al. 2002, p. 250). However, this “does not mean that random assignment equates units on observed pretest scores” (Shadish et al. 2002, p. 250). For any given round of randomization, gains in precision are driven by group size. If “the original sample is large enough then the two groups should be more or less identical in the important characteristics […]. The two groups will differ only by chance” (Bowers 2014, p. 120). Since random error decreases as sample size increases, it follows that as “the sample size grows, observed and unobserved confounders are balanced across treatment and control groups with arbitrarily high probability” (Sekhon 2008, p. 273). Nevertheless, even in the case of large groups, a single round of randomization cannot guarantee perfect balancing, but only a high probability that known and unknown confounders are balanced across the treatment and control groups.
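The relationship between group size and balance can be illustrated with a short simulation (a sketch, not drawn from the paper: the baseline covariate, its distribution and the group sizes are invented for illustration). Randomly splitting individuals into two groups leaves a chance imbalance in any baseline covariate; the expected imbalance shrinks as group size grows, without ever reaching zero for finite groups:

```python
import random
import statistics

def covariate_imbalance(n, trials=1000, seed=0):
    """Average absolute difference in a baseline covariate (here, a
    hypothetical 'age' variable) between randomly assigned test and
    control groups of size n, over many rounds of randomization."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        ages = [rng.gauss(50, 10) for _ in range(2 * n)]  # baseline covariate
        rng.shuffle(ages)  # random allocation
        test, control = ages[:n], ages[n:]
        diffs.append(abs(statistics.mean(test) - statistics.mean(control)))
    return statistics.mean(diffs)

# Expected imbalance decreases with group size but never vanishes:
for n in (2, 20, 200):
    print(n, round(covariate_imbalance(n), 2))
```

With two individuals per group, the average imbalance is on the order of the covariate's standard deviation; with two hundred, it is an order of magnitude smaller, mirroring the point that a single round of randomization delivers only probable, not guaranteed, balance.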

Greenland et al. (1999, p. 29) take this to be the oldest usage of confounding, namely as “a type of bias in estimating causal effects. This bias is sometimes informally described as a mixing of effects of extraneous factors (called confounders) with the effect of interest.” According to this usage, a “variable cannot be a confounder (in this sense) unless (1) it can causally affect the outcome parameter *μ* within treatment groups, and (2) it is distributed differently among the compared populations […]. The two necessary conditions (1) and (2) are sometimes offered together as a definition of a confounder” (Greenland et al. 1999, pp. 33–34). Greenland et al. (1999, p. 33) further point out that confounding occurs only if different distributions (unbalances) result in net effect differences in the outcome of interest: “a covariate difference […] is a necessary but not sufficient condition for confounding, because the effects of the various covariate differences may balance out in such a way that no confounding is present.”

On the other hand, if comparability is understood as a homogenization removing all unbalances, including those due to random error, then something is amiss. The fact that a chance explanation is considered and needs to be ruled out by a statistical test indicates that the causal inference deployed in the context of an RCT works explicitly on the expectation that there is a non-negligible risk of generating incomparable groups and that, therefore, comparability cannot be guaranteed. Senn points out not only that “[i]t is not necessary for the groups to be balanced,” but that “the probability calculation applied to a clinical trial automatically makes an allowance for the fact that groups will almost certainly be unbalanced, and if one knew that they were balanced, then the calculation that is usually performed would not be correct” (Senn 2013, p. 1442). Conversely, prior assurance of comparability removes the need to consider the possibility of a chance explanation and, therefore, the need to perform a statistical test: “If treatment groups could be equated before treatment, and if they were different after treatment, then pretest selection differences could not be a cause of observed posttest differences” (Shadish et al. 2002, p. 249). Likewise, if we knew that all patients and their circumstances were identical, the recovery of every single patient in the test group could only be attributed to the efficacy of the treatment (Hill 1955, Ch. VIII). Sekhon (2008) further argues that, under these circumstances, causal inference collapses into a version of Mill’s method of difference. Given a zero probability of generating incomparable groups, if differences in outcomes are observed, these differences must have been caused by something other than the process by which the groups were generated (random allocation), irrespective of the magnitude of the differences in outcomes and the size of the groups. Presumably, this is the strategy favoured in most controlled experiments in basic science, where it is common practice to engineer homogeneous populations of comparable individuals, for instance, by generating immortalized cell lines consisting of clones or by systematically inbreeding mice in order to produce isogenic/homozygous strains (Baetu 2020; Müller-Wille 2007).
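Senn's observation that the probability calculation “automatically makes an allowance” for chance imbalance is visible in the mechanics of the randomization inference itself: the null distribution is constructed by reallocating the observed outcomes over every possible assignment, unbalanced allocations included. A minimal sketch, using invented outcome data for a hypothetical two-arm trial:

```python
import itertools
import statistics

def randomization_test(test, control):
    """Exact randomization (permutation) test for a difference in means.
    The null distribution enumerates every possible reallocation of the
    observed outcomes to the two arms, so allocations that happen to be
    unbalanced contribute to the p-value like any other."""
    observed = statistics.mean(test) - statistics.mean(control)
    pooled = test + control
    n = len(test)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), n):
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = statistics.mean(t) - statistics.mean(c)
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return extreme / total  # two-sided p-value

# Hypothetical outcomes, invented for illustration:
p = randomization_test([31, 29, 34, 30, 33], [24, 27, 22, 26, 25])
print(p)  # ≈ 0.008: only the observed split and its mirror are as extreme
```

No assumption about the shape of a sampling distribution is needed; the test statistic's distribution under the null hypothesis follows directly from the random allocation itself.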

Some authors prefer the term ‘internal validity’ in order to emphasize that the results of the test concern solely the particular experimental setup employed in the study (e.g., trial participants, the specific version of the intervention employed, the circumstances in which the intervention was conducted, the specific outcomes or endpoints assessed in the experiment, etc.) (Manski 2007; Shadish et al. 2002).

In principle, the prospective design of experiments may also help rule out non-causal explanations of systematic association, such as supervenience, coreference and nomic dependence scenarios, in which the independent and dependent variables covary simultaneously.

A common example is gene duplication. If, upon knocking out a gene, no differences in phenotype are observed, background knowledge dictates that this result is more likely to be due to the compensating effect of homologous sequences than to the fact that the gene plays no causal role vis-à-vis phenotype.

Observational data can discriminate between two classes of Markov-equivalent structures, with chains, reverse chains and forks in one class, and colliders (*A* → *B* ← *C*) in the other, thus allowing for causal inference from observational data assuming minimality, Markov and positivity (Pearl 2000; Spirtes et al. 1993). Despite its importance for causal discovery algorithms, this result is not of significant utility for causal inference from experiments which probe solely the input and output of a system, thus generating data about the conditional probability of the output variable given the probability of the input variable.
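The asymmetry between colliders and the other Markov-equivalent structures can be illustrated with a small simulation (a sketch; the linear data-generating model and the conditioning threshold are invented for illustration). In the collider *A* → *B* ← *C*, *A* and *C* are marginally independent, but conditioning on *B* induces a dependence between them (“explaining away”), whereas in a chain or fork the pattern is reversed:

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 20_000
A = [rng.gauss(0, 1) for _ in range(n)]
C = [rng.gauss(0, 1) for _ in range(n)]
B = [a + c + rng.gauss(0, 0.5) for a, c in zip(A, C)]  # collider A -> B <- C

# Marginally, A and C are independent...
print(round(correlation(A, C), 2))
# ...but conditioning on B (here: selecting high-B cases) induces dependence.
high_B = [i for i in range(n) if B[i] > 1.5]
print(round(correlation([A[i] for i in high_B], [C[i] for i in high_B]), 2))
```

The first correlation is close to zero while the second is clearly negative: among cases where *B* is high, knowing that *A* is low makes a high *C* more likely. This conditional-independence signature is what separates colliders from chains and forks in observational data.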

## References

Altman DG, Bland JM (1999) Treatment allocation in controlled trials: why randomise. BMJ 318:1209

Baetu TM (2020) Causal inference in biomedical research. Biol Philos 35:43

Baumgartner M (2009) Interventionist causal exclusion and non-reductive physicalism. Int Stud Philos Sci 23(2):161–178

Ben-Menahem Y (2018) Causation in science. Princeton University Press, Princeton

Bowers D (2014) Medical statistics from scratch: an introduction for health professionals. Wiley, Hoboken, NJ

Broadbent A (2013) Philosophy of epidemiology. Palgrave Macmillan, Houndmills

Cartwright N (1989) Nature’s capacities and their measurement. Oxford University Press, Oxford

Cartwright N (2010) What are randomised controlled trials good for? Philos Stud 147:59–70

Cox DR, Wermuth N (2004) Causality: a statistical view. Int Stat Rev 72(3):285–305

DeLucia C, Pitts SC (2010) Intervention. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks

Donner A, Klar N (2004) Pitfalls of and controversies in cluster randomization trials. Am J Public Health 94(3):416–422

Eberhardt F, Glymour C et al (2012) On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. In: Proceedings of the twenty-first conference on uncertainty in artificial intelligence (UAI'05). AUAI Press, Arlington, pp 178–184

Evidence Based Medicine Working Group (1992) Evidence-based medicine. A new approach to teaching the practice of medicine. J Am Med Assoc 268:2420–2425

Fisher RA (1947) The design of experiments. Oliver and Boyd, Edinburgh

Franklin A (2007) The role of experiments in the natural sciences: examples from physics and biology. In: Kuipers T (ed) General philosophy of science: focal issues. Elsevier, Amsterdam

Fuller J (2019) The confounding question of confounding causes in randomized trials. Br J Philos Sci 70:901–926

Gerber AS, Green DP (2012) Field experiments: design, analysis, and interpretation. Norton, New York, NY

Godfrey-Smith P (2009) Causal pluralism. In: Beebee H, Hitchcock C, Menzies P (eds) Oxford handbook of causation. Oxford University Press, New York

Gori GB (1989) Epidemiology and the concept of causation in multifactorial diseases. Regul Toxicol Pharmacol 9(3):263–272

Greenland S, Robins JM et al (1999) Confounding and collapsibility in causal inference. Stat Sci 14(1):29–46

Guyatt G, Djulbegovic B (2019) Evidence-based medicine and the theory of knowledge. In: Guyatt G, Rennie D, Meade MO, Cook DJ (eds) Users’ Guides to the medical literature: a manual for evidence-based clinical practice. JAMA/McGraw-Hill Education, New York

Hall N (2004) Two concepts of causation. In: Collins J, Hall N, Paul L (eds) Causation and counterfactuals. MIT Press, Cambridge, pp 225–276

Hernán MA, Robins JM (2020) Causal inference: what if. Chapman & Hall/CRC, Boca Raton

Hill AB (1952) The clinical trial. N Engl J Med 247:113–119

Hill AB (1955) Principles of medical statistics, 6th edn. Oxford University Press, New York

Howick J (2011) The philosophy of evidence-based medicine. BMJ Books, Oxford

ISIS-2 (Second International Study of Infarct Survival) Collaborative Group (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 2:349–360

Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge

Koepke D, Flay R (1989) Levels of analysis. In: Braverman MT (ed) New directions for program evaluation: evaluating health promotion programs. Jossey-Bass, San Francisco

Kovera MB (2010a) Bias. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks

Kovera MB (2010b) Confounding. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks

Leighton JP (2010) Internal validity. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks, pp 619–22

Lewis DK (1979) Counterfactual dependence and time’s arrow. Noûs 13:455–476

Manski C (2007) Identification for prediction and decision. Harvard University Press, Cambridge

Merlo J, Lynch K (2010) Association, measures of. In: Salkind NJ (ed) Encyclopedia of research design. SAGE, Thousand Oaks

Miettinen O (1974) Confounding and effect-modification. Am J Epidemiol 100(5):350–353

Mill JS (1843) A system of logic, ratiocinative and inductive. John W. Parker, London

Müller-Wille S (2007) Hybrids, pure cultures, and pure lines: from nineteenth-century biology to twentieth-century genetics. Stud Hist Philos Biol Biomed Sci 38(4):796–806

Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference part I. Biometrika 20A(1–2):175–240

Papineau D (1994) The virtues of randomization. Br J Philos Sci 45:437–450

Parkkinen V-P, Wallmann C et al (2018) Evaluating evidence of mechanisms in medicine: principles and procedures. Springer, Cham

Pearl J (2000) Causality: models, reasoning, and inference. Cambridge University Press, Cambridge

Pearl J, Glymour M et al (2016) Causal inference in statistics: a primer. Wiley & Sons, Chichester

Psillos S (2004) A glimpse of the secret connexion: harmonizing mechanisms with counterfactuals. Perspect Sci 12:288–391

Rosenbaum PR (1995) Observational studies. Springer, New York

Rothman KJ (1974) Synergy and antagonism in cause-effect relationships. Am J Epidemiol 99(6):385–388

Rubin D (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66(5):688–701

Russo F, Williamson J (2007) Interpreting causality in the health sciences. Int Stud Philos Sci 21(2):157–170

Sekhon JS (2008) The Neyman-Rubin model of causal inference and estimation via matching methods. In: Box-Steffensmeier JM, Brady HE, Collier D (eds) The Oxford Handbook of Political Methodology. Oxford University Press, New York, pp 271–299

Senn S (2013) Seven myths of randomisation in clinical trials. Stat Med 32:1439–1450

Shadish WR, Cook TD et al (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston

Sharabiani M, Aylin P et al (2012) Systematic review of comorbidity indices for administrative data. Med Care 50(12):1109–1118

Skyrms B (1984) EPR: lessons for metaphysics. Midwest Stud Philos 9:245–255

Spirtes P, Glymour C et al (1993) Causation, prediction and search. Springer-Verlag, New York

Stigler SM (1986) The history of statistics. Harvard University Press, Cambridge, MA

Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Proceedings of the sixth conference on uncertainty in artificial intelligence. Cambridge, MA, pp 220–227

Weber M (2009) The crux of crucial experiments: Duhem’s problems and inference to the best explanation. Br J Philos Sci 60:19–49

Williamson J (2019) Establishing causal claims in medicine. Int Stud Philos Sci 32(1):33–61

Winch RF, Campbell DT (1969) Proof? No. Evidence? Yes. The significance of tests of significance. Am Sociol 4(2):140–143

Woodward J (2003) Making things happen: a theory of causal explanation. Oxford University Press, Oxford

Worrall J (2002) What evidence in evidence-based medicine? Philos Sci 69:S316–S330

Worrall J (2007a) Evidence in medicine and evidence-based medicine. Philos Compass 2(6):981–1022

Worrall J (2007b) Why there’s no cause to randomize. Br J Philos Sci 58:451–488

## Acknowledgements

This research was supported by SSHRC Grant # 430-2020-0654.

## About this article

### Cite this article

Baetu, T.M. Inferential Pluralism in Causal Reasoning from Randomized Experiments.
*Acta Biotheor* **70**, 22 (2022). https://doi.org/10.1007/s10441-022-09446-2
