Introduction

Causal assessment is fundamental to epidemiology as it may inform policy and practice to improve population health. A leading figure in epidemiology, Sir Austin Bradford Hill, suggested the goal of causal assessment is to understand if there is “any other way of explaining the set of facts before us … any other answer equally, or more, likely than cause and effect” [1]. Causal assessment may be applied to a body of evidence or a single study to interrogate the “set of facts” underlying a relationship. Bradford Hill notably laid out a set of such facts. Although these are commonly described as the Bradford Hill criteria, he described them as ‘viewpoints’ and emphasised they should not be used as a checklist, but as considerations for assessing causality. As a result, we refer to them as ‘BH viewpoints’ [2].

Since Bradford Hill first introduced his viewpoints, causal thinking in epidemiology has increasingly incorporated the potential outcomes framework [3,4,5,6,7,8]. Informally, the potential outcomes framework posits that a true causal effect is the difference between the observed outcome when the individual was exposed and the unobserved potential outcome had the individual not been exposed, all other things being equal [6]. Because the unobserved potential outcome of an individual cannot be known, investigators often compare the outcomes of exposed and unexposed groups [6]. Application of the potential outcomes framework asks investigators to consider exchangeability between these groups, i.e., whether the unexposed group would have the same risk of the outcome as the exposed group had they also been exposed [6]. In practice, this means considering whether the groups are comparable. Investigators may be more confident that the observed effect equals the true causal effect if the groups are exchangeable [9].
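The idea of exchangeability can be made concrete with a small worked example. The sketch below is illustrative only: the cohort, the function names and the two allocation patterns are hypothetical and are not drawn from any study. It assumes, unrealistically, that both potential outcomes are known for every individual, so that the true causal effect can be compared with the risk difference an investigator would actually observe.

```python
# Hypothetical cohort: each person has two potential outcomes,
# (outcome if exposed, outcome if unexposed), coded 1 = outcome, 0 = no outcome.
# In real data only one of the two is ever observed.
people = [(1, 0), (1, 0), (1, 1), (0, 0)] * 2


def mean(values):
    return sum(values) / len(values)


def true_causal_effect(people):
    """Average difference between the two potential outcomes."""
    return mean([y_exposed - y_unexposed for y_exposed, y_unexposed in people])


def observed_risk_difference(people, exposed):
    """Risk difference computed from the observed outcomes only."""
    risk_exposed = mean([y1 for (y1, _), e in zip(people, exposed) if e])
    risk_unexposed = mean([y0 for (_, y0), e in zip(people, exposed) if not e])
    return risk_exposed - risk_unexposed


# Exchangeable groups: both halves contain the same mix of individuals.
exchangeable = [True] * 4 + [False] * 4
# Non-exchangeable groups: the unexposed are all people who would never
# develop the outcome, whatever their exposure.
non_exchangeable = [True, True, True, False] * 2

print(true_causal_effect(people))                          # 0.5
print(observed_risk_difference(people, exchangeable))      # 0.5
print(observed_risk_difference(people, non_exchangeable))  # 1.0
```

When the groups are exchangeable, the observed risk difference equals the true causal effect; when they are not, the same cohort yields a different estimate.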

We focus on three approaches that implicitly or explicitly incorporate the potential outcomes framework but operationalise it differently [4, 10,11,12]. Firstly, directed acyclic graphs (DAGs) help articulate assumptions about the interrelationships between variables of interest and therefore threats to valid causal inference. Sufficient-component cause (SCC) models highlight the multi-factorial nature of causality, drawing attention to how different exposures interact to produce the outcome. Finally, the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) methodology provides a systematic approach to assessing the certainty of a causal relationship based on a body of evidence (i.e., the existing studies available used to assess whether a causal relationship between an exposure and outcome exists). Epidemiologists have proposed that causal assessment may be improved by combining approaches such as these [7, 13,14,15].

To draw on the strengths of each of these potential outcomes framework approaches, we compared the extent to which they overlap or complement each other. There is limited literature comparing the potential outcomes framework in SCC models and DAGs [4, 5, 11] and one study comparing BH viewpoints to GRADE [10]. While BH viewpoints have been revisited to critically reflect on the theory and application of each viewpoint [2, 16,17,18,19,20], we have not identified any attempts to compare them to DAGs and SCC models, with the former particularly important given the growing influence of DAGs in epidemiology [21].

Our main aims are to examine: 1) if and how each BH viewpoint is considered by each of the three potential outcomes framework approaches (referred to simply as ‘approaches’ hereafter); and 2) the extent to which they elucidate the underpinning theory of BH viewpoints. BH viewpoints serve as the foundation for this comparison because of their influential status within epidemiology [19, 20, 22]. Additionally, there is agreement in the literature that the BH viewpoints account for the most relevant considerations in causal assessment [17]. To facilitate comparisons, we drew DAGs and SCC models for each BH viewpoint and mapped each BH viewpoint against each GRADE domain. We use the example of alcohol consumption and active-tuberculosis where relevant to illustrate the elements of each approach. Mycobacterium tuberculosis (MTB) is the bacterium responsible for tuberculosis (TB). MTB causes latent-TB, which can turn into active-TB in individuals with low immunity [23]. Alcohol consumption is hypothesised to cause a weaker immune system, resulting in active-TB [24]. The example is purposefully simplified and may not reflect real-world scenarios.

In the next section, we summarise the BH viewpoints and key characteristics of the three approaches they are being compared against. Our aim is to introduce the commonalities and distinctions between these approaches to causal inference, rather than to provide a detailed explanation or critical assessment of each approach. Following this, we compare each of the nine BH viewpoints against the three approaches and critically reflect on the theoretical implications for assessing causal relationships. We finish by summarising our key findings, making tentative suggestions about how causal assessment could be conducted in the future and noting some areas for future research.

Causal assessment approaches

Bradford Hill viewpoints

Bradford Hill’s explanation of the nine viewpoints is summarised in Table 1. These were not intended to be “hard and fast rules of evidence that must be obeyed before we accept cause and effect,” but characteristics to keep in mind while considering if an observed association is due to something other than causality [1]. In current practice, BH viewpoints are applied together or separately to a body of evidence or a single empirical study.

Table 1 Bradford Hill viewpoints and explanatory quotations

Directed acyclic graphs

DAGs are diagrams that illustrate the putative causal relationship between an exposure and outcome [6]. DAGs include the variables that might bias the relationship in question and their development is based on background knowledge of the topic [25]. Detailed explanations of DAGs can be found elsewhere [5, 6, 25,26,27]. DAGs are commonly applied to a single study, but it has been proposed that they can be applied to a body of evidence [62].

The simplified DAG below (Fig. 1) shows the pathway between the exposure and outcome, alcohol consumption and active-TB, respectively. Alcohol consumption may result in active-TB, for example, by lowering an individual’s immune system (mediator not shown) [23]. Overcrowding is a confounding variable, causing both alcohol consumption and active-TB. If there was no causal effect of alcohol consumption on active-TB (i.e. no edge between those two variables in the DAG), an association would still be observed between them in the data due to the common cause overcrowding [4, 25, 28, 29]. Thus, overcrowding must be conditioned upon, indicated by a square around the variable, to obtain an unbiased estimate of alcohol consumption on active-TB. If investigators condition on the appropriate variables using a DAG that accurately represents a causal relationship, they may be more confident of exchangeability and thus estimating the true causal effect [9, 30].

Fig. 1

Directed acyclic graph representing the relationship between alcohol consumption and active-TB. The confounding variable, overcrowding, affects both the exposure and outcome and should be conditioned on, as indicated by the bold square around overcrowding
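A short calculation illustrates why the common cause in Fig. 1 produces an association even when alcohol has no causal effect, and why conditioning on overcrowding removes it. This is a sketch under stated assumptions: all probabilities and names below are hypothetical, chosen only for illustration, and the risk of active-TB is deliberately made to depend on overcrowding alone.

```python
# Hypothetical probabilities, chosen only for illustration.
P_OVERCROWDING = 0.5
P_ALCOHOL_GIVEN_OVERCROWDING = {True: 0.8, False: 0.2}
# Active-TB risk depends on overcrowding only: alcohol has NO causal effect here.
P_TB_GIVEN_OVERCROWDING = {True: 0.3, False: 0.1}


def tb_risk(alcohol, strata=(True, False)):
    """P(active-TB | alcohol), averaged over the given overcrowding strata."""
    numerator = denominator = 0.0
    for overcrowded in strata:
        p_stratum = P_OVERCROWDING if overcrowded else 1 - P_OVERCROWDING
        p_alcohol = P_ALCOHOL_GIVEN_OVERCROWDING[overcrowded]
        weight = p_stratum * (p_alcohol if alcohol else 1 - p_alcohol)
        numerator += weight * P_TB_GIVEN_OVERCROWDING[overcrowded]
        denominator += weight
    return numerator / denominator


# Crude comparison, ignoring overcrowding.
crude_rr = tb_risk(True) / tb_risk(False)
# Comparisons within each level of overcrowding (conditioning on the confounder).
rr_overcrowded = tb_risk(True, strata=(True,)) / tb_risk(False, strata=(True,))
rr_not_overcrowded = tb_risk(True, strata=(False,)) / tb_risk(False, strata=(False,))

print(round(crude_rr, 2), round(rr_overcrowded, 2), round(rr_not_overcrowded, 2))
```

Under these assumed values the crude risk ratio is roughly 1.86, whereas the risk ratio within each stratum of overcrowding is 1, matching the absence of a causal effect.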

Sufficient-component cause (SCC) models

SCC models (also known as causal pies) illustrate the multi-factorial nature of causality through pie charts [31]. SCC models view each of the variables that contribute to the outcome occurring as causal components [32], with many different combinations of components potentially bringing about the outcome of interest. Taken together, the components for each ‘complete pie’ are sufficient to produce the outcome. Necessary components are those without which the outcome could not occur [33]. For example, MTB is a necessary (but insufficient) component of tuberculosis and will therefore be a component for all of the causal pies for tuberculosis (but never features as a sole component of a causal pie). The origins of SCC models can be traced to Mackie’s definition of causality. This introduced the idea of INUS causation, that is a cause can be “an insufficient but necessary part of a condition which is itself unnecessary but sufficient for the result” [34] p. 45.

Causal pies are useful for understanding causal mechanisms and interactions of causal components [33]. Table 2 illustrates four pies (S1, S2, S3, S4) for two different populations (population 1 and population 2) which represent the possible combination of selected causal components (alcohol, overcrowding and unknown factors) for the development of active TB.

Table 2 Sufficient component cause models and corresponding prevalence rates and risk ratios (RRs) for each sufficient-cause between two populations

GRADE methodology

GRADE is the most widely adopted approach for assessing certainty of evidence in systematic reviews, guideline development and evidence-informed recommendations [35]. Certainty has been defined by the GRADE Working Group as the “extent of our confidence that the estimates of the effect are correct” [10, 36,37,38]. Certainty is based both on assessing the risk of bias of individual studies and an evaluation across studies [35]. GRADE typically considers evidence from randomised controlled trials (RCTs) as providing a higher level of certainty than evidence from nonrandomised studies (NRSs), although the appropriateness of this has been critiqued [39]. Certainty may be modified according to different GRADE domains (summarised in Table 3). Large associations, dose-response relationships and adjusting for plausible confounding upgrade certainty.

Table 3 The initial level of certainty, according to GRADE, differs between randomised controlled trials (RCTs) and nonrandomised studies (NRSs)

Comparisons against Bradford Hill’s viewpoints

Table 4 summarises the overlapping elements between BH viewpoints and the potential outcomes framework approaches, with subsequent text providing additional detail.

Table 4 Summary of utilisation of each Bradford Hill (BH) viewpoint by each causal assessment approach: BH viewpoints, directed acyclic graphs (DAGs), sufficient-component cause models and GRADE methodology. Based on comparative analysis of causal assessment approaches

Strength of association

Bradford Hill argued that a large association suggests the observed effect is less likely to be due to bias [1, 40], but he acknowledged that weak (or small) associations may still reflect causal relationships. As noted by Greenland and Robins, large associations can still arise from confounding and a weak association does not mean there is an absence of causality [33]. In practice, investigators may rely on existing tools and guidelines, or their own interpretation, to determine what constitutes a strong association.

Although DAGs cannot represent the size of an association, they facilitate “bias analysis” (see Fig. 1) [14]. Investigators may use DAGs to highlight important variables that they were unable to condition on and consider their implications for the effect estimate, including residual confounding (from inaccurately or poorly measured variables, including confounders) [41].

SCC models draw attention to the impact of disease prevalence and the prevalence of competing causes on the strength of association or effect estimate. For example, the RR of S3 is attenuated as the prevalence of a competing sufficient cause (S4) or the prevalence of the outcome in the reference group (S1) increases (see Table 2).
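The attenuation described above can be sketched numerically. The example below does not reproduce Table 2. It uses a deliberately simple, hypothetical model in which the exposure completes one sufficient cause, a competing sufficient cause can produce the outcome without the exposure, and the two are assumed to be independent. The function name and the input values are illustrative assumptions.

```python
def risk_ratio(baseline_risk, complement_prevalence):
    """RR for an exposure under a simple sufficient-component cause model.

    baseline_risk: probability that a sufficient cause not involving the
        exposure is completed (the risk in the unexposed).
    complement_prevalence: probability that the remaining components of the
        exposure's own sufficient cause are present.
    Assumes the two are independent of each other and of the exposure.
    """
    risk_unexposed = baseline_risk
    risk_exposed = 1 - (1 - baseline_risk) * (1 - complement_prevalence)
    return risk_exposed / risk_unexposed


# Same exposure mechanism, increasingly common competing causes.
for baseline in (0.01, 0.1, 0.5):
    print(baseline, round(risk_ratio(baseline, complement_prevalence=0.1), 2))
```

With these assumed inputs the RR falls from about 10.9 to 1.9 to 1.1 as the baseline risk rises, although the exposure's own mechanism is unchanged.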

According to the GRADE Working Group, a strong association is indicated by a risk ratio (RR) of 2–5 or 0.2–0.5 [17]. Evidence from NRSs that estimate a large effect will be upgraded on the basis that confounding is less likely to entirely remove the observed association [43].

Consistency

Bradford Hill argued that consistent estimates observed in different circumstances reduce the likelihood that the effect is due to chance or bias [1]. Comparison with the three approaches demonstrates that differences in effect size across studies may be due to variations in causal structures, variable interactions, or biases of the relevant studies.

Transportability refers to the extent to which a causal effect in one context can be used to infer a causal effect in different circumstances, such as different populations or study designs [44]. Investigators can use DAGs to understand how differences in causal structures may explain different observed effect sizes. For example, investigators may want to understand if the causal effect of alcohol consumption on active-TB can be extrapolated to a target population with a high baseline risk of HIV (represented in Fig. 2). In other words, to understand if the different effect size in the target population is due to HIV modifying the effect of alcohol consumption on active-TB by reducing immunity [45, 46]. To represent the target population’s exposure to a stratum of HIV (i.e., a higher risk of HIV), there is a square around HIV [44, 46]. If the likelihood of active-TB for a given level of alcohol consumption is equivalent between the populations, the estimated effect of alcohol on active-TB is transportable and any statistical heterogeneity observed is likely due to HIV risk modifying the effect of alcohol on active-TB [46].

Fig. 2

Directed acyclic graph (DAG) of target population with high baseline risk of HIV. The high baseline risk of HIV means that HIV has been conditioned upon, indicated by square around HIV. The estimated effect of alcohol consumption on active-TB in this population will be modified by the higher risk of HIV. This needs to be considered when comparing the effect estimates between this target population and the one described in Fig. 1 with low risk of HIV

Investigators can use SCC models to understand differences in variable interactions and whether these can explain different effect sizes observed between populations [44, 47,48,49]. For example, investigators may want to understand if the RR of individuals in population 1 in Table 2 can be transported to population 2. According to Table 2, the RR of active-TB when individuals are exposed only to overcrowding (S3) is lower in population 2 than population 1, i.e., the effect of overcrowding on active-TB differs between populations when alcohol is not consumed. It may be that the unknown factors of S3 differ between populations. However, because the RRs are the same for other causal pies, investigators may assume that the reason for different prevalence and RRs for S3 is that unknown factors and overcrowding are interacting differently between the populations, in which case the effect sizes cannot be transported from population 1 to population 2.

In GRADE, unexplained inconsistency (typically, statistical heterogeneity) suggests lower confidence about the likely effect of the exposure under different circumstances. GRADE considers unexplained inconsistency rather than consistent effect estimates, as Bradford Hill suggested, to highlight that consistent estimates in different circumstances may be subject to the same bias and do not necessarily increase confidence in causality [50].

Specificity

According to Bradford Hill, a relationship is specific if the exposure is associated with the outcome in question and no others, and if the outcome is associated with the exposure in question and no others. He emphasised that a non-specific relationship does not undermine causality. Specificity originated in Robert Koch’s postulates to evaluate causality in infectious diseases, but is rare in epidemiology and usually arises when the outcome is defined based on the exposure status (e.g., tuberculosis being defined by the presence of the tubercle bacillus) [17, 51, 52]. Comparisons highlighted how multiple causation (where one exposure may affect many outcomes and one outcome may be affected by many exposures) limits the utility of directly applying specificity in epidemiological practice, but extending the concept to the related idea of ‘falsification’ may improve its usefulness.

The DAG in Fig. 1 illustrates a non-specific relationship as active-TB is caused by at least two exposures: alcohol-consumption and overcrowding [53]. The relationship is also non-specific because alcohol consumption may cause many other outcomes such as cancer, cardiovascular disease and injuries [54]. This is not shown in the DAG in Fig. 1 because DAGs typically include the main variables related to the relationship of interest (i.e., an exposure, outcome and any potential confounders) [55]. This is also the reason why DAGs are not used to demonstrate specific relationships; a variable may be left out of a DAG because it is not of interest, not because the relationship illustrated in the DAG is specific.

One important reason for considering specificity is that multiple causation suggests a higher likelihood that the observed association is due to confounding. Rather than seeking evidence of specificity, DAGs can be used to help identify and assess falsification (or negative control) outcomes and exposures. A falsification outcome is expected to be both independent of the outcome and associated with the exposure only through the confounding variable [56]. If investigators accurately condition on the confounding variable, they would not observe an effect of the exposure on the falsification outcome.

A hypothetical falsification outcome is head lice (Fig. 3). Alcohol consumption does not have a causal effect on head lice. If investigators observe an effect of alcohol consumption on head lice despite conditioning upon overcrowding, this is likely due to residual confounding due to overcrowding being inaccurately measured. Therefore, it is possible that the relationship between alcohol and active-TB is also subject to residual confounding of overcrowding and investigators should adjust their conclusions accordingly. An absence of association between alcohol consumption and head lice does not suggest specificity, but investigators may be more confident that in this study, the association between alcohol consumption and active-TB is not confounded by overcrowding.

Fig. 3

The directed acyclic graph (DAG) shows the relationship between the exposure (alcohol consumption), the outcome (active-TB), the confounding variable (overcrowding) and the falsification outcome (head lice). The bold square around overcrowding indicates that it has been conditioned on. If there is no effect of alcohol consumption on head lice, there is a greater likelihood that overcrowding has been accurately conditioned upon
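The logic of the head lice example in Fig. 3 can be sketched as a calculation. Everything below is a hypothetical illustration: the probabilities, the function name and the single `accuracy` parameter (the chance that recorded overcrowding matches true overcrowding) are assumptions, not values from the literature. Alcohol is given no effect on head lice, so any alcohol-head lice association that remains after conditioning on recorded overcrowding reflects residual confounding.

```python
# Hypothetical probabilities, chosen only for illustration.
P_OVERCROWDING = 0.5
P_ALCOHOL = {True: 0.8, False: 0.2}    # P(alcohol | true overcrowding)
P_HEAD_LICE = {True: 0.3, False: 0.1}  # P(head lice | true overcrowding); no alcohol effect


def lice_rr_within_measured_stratum(measured, accuracy):
    """Alcohol-head lice RR among people recorded as `measured` for overcrowding.

    `accuracy` is the probability that recorded overcrowding equals true
    overcrowding (assumed identical for overcrowded and non-overcrowded people).
    """
    risk = {}
    for alcohol in (True, False):
        numerator = denominator = 0.0
        for overcrowded in (True, False):
            weight = (
                (P_OVERCROWDING if overcrowded else 1 - P_OVERCROWDING)
                * (P_ALCOHOL[overcrowded] if alcohol else 1 - P_ALCOHOL[overcrowded])
                * (accuracy if measured == overcrowded else 1 - accuracy)
            )
            numerator += weight * P_HEAD_LICE[overcrowded]
            denominator += weight
        risk[alcohol] = numerator / denominator
    return risk[True] / risk[False]


# Perfect measurement: conditioning removes the association (RR of 1).
print(round(lice_rr_within_measured_stratum(True, accuracy=1.0), 2))
# Imperfect measurement: an association with head lice remains (RR above 1).
print(round(lice_rr_within_measured_stratum(True, accuracy=0.8), 2))
```

An RR above 1 for the falsification outcome signals that overcrowding has not been adequately conditioned upon, which is the warning the investigators would carry over to the alcohol and active-TB estimate.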

Fig. 4

Temporality using directed acyclic graphs (DAGs). Investigators may be more confident that the effect of alcohol consumption on active-TB is not due to reverse causality if (1) they condition upon active-TB before diagnosis and continue to observe an effect of alcohol consumption on active-TB after diagnosis or (2) if they do not observe an effect of active-TB before diagnosis on alcohol consumption

Finding falsification variables can be challenging. Take the example of identifying a falsification exposure (which is independent of the exposure and associated with the outcome only through the confounding variable). Many possible exposures associated with the confounder (overcrowding), such as smoking, air pollution, experiences of homelessness and malnutrition are also associated with the outcome (active-TB) and therefore would fail as a falsification exposure [57, 58]. Put another way, the lack of specificity in most causal relationships in epidemiology limits our ability to carry out falsification tests. However, where they do exist they can offer a powerful tool for assessing bias.

Causal pies illustrate the multi-factorial nature of causal relationships, which limits the likelihood of specificity because a range of causal pies (and causal components) may produce the same outcome (see Table 2). One causal pie may also be used to represent a possible sufficient-cause for various exposures [59]. The causal pie would represent a specific relationship only if a component is both necessary and sufficient to produce the outcome and the outcome could only be produced by this necessary and sufficient cause [31, 33]. These limitations are among the reasons why some, including the originators of GRADE methodology, argue that specificity should be excluded from causal assessment [7, 10, 31, 60].

Temporality

Temporality is considered fundamental to causality; an exposure must precede an outcome. Bradford Hill alluded to how reverse causality skews temporality: “does a particular occupation or occupational environment promote infection by the tubercle bacillus … or, indeed, have they already contracted it?” [1]. Two of the three approaches explicitly incorporate temporality, with the order of cause and effect being fundamental to DAGs.

DAGs can highlight reverse causality [20, 61]. For example, in a cross-sectional study, the observed effect of alcohol consumption is based on measurements after individuals were diagnosed with active-TB. However, active-TB may have actually occurred prior to diagnosis of active-TB and been a cause of alcohol consumption, via social marginalisation [62]. Given a longitudinal study that has information on previous diagnoses, investigators could test for reverse causation by considering if active-TB was present before the diagnosis that was observed after alcohol consumption (see Fig. 4). If investigators conditioned upon active-TB before diagnosis and continued to observe an effect of consuming alcohol on active-TB after diagnosis, or if they found no effect of active-TB before diagnosis on alcohol consumption, then the estimated effect of alcohol consumption on active-TB after diagnosis is less likely due to reverse causation.

Time may be one component of a causal pie but temporality is not considered in the synergy, antagonism and interaction of the components [2]. Temporality is not directly considered by GRADE. RCTs, which guarantee that the exposure precedes the outcome through study design, are initially rated higher than NRSs. However, the favouring of RCTs is not only about temporality but also about the achievement of exchangeability through randomisation. Additionally, temporality is not explicitly considered for NRSs (which include longitudinal studies and so may also be able to ensure that the exposure precedes the outcome) [10].

Dose-response

A dose-response gradient exists when incremental increases (or decreases) of the exposure produce incremental increases (or decreases) of the outcome. Dose-response is fundamental to causal assessment in pharmacology and toxicology [63]. Bradford Hill argued that a dose-response gradient provides a “simpler explanation” of the causal relationship than if it were not observed (see Table 1) [1]. However, there are many reasons investigators may not observe a dose-response gradient including exposure threshold effects, as in the case of allergens [17]. Furthermore, a dose-response relationship may be induced by a confounding variable [64, 65]. For example, an incremental increase in alcohol consumption that corresponds to an incremental increase in active-TB may be due to incremental increases in overcrowding (see Fig. 1) [66]. While DAGs are non-parametric (and so cannot show the structure of the relationship between any two variables), they can be used to consider the plausibility of one or more confounding variables undermining a dose-response relationship.

Unknown components limit the utility of SCC models to assess dose-response gradients. Evidence from NRSs is upgraded in GRADE if a dose-response relationship has been observed on the basis that confounding is less likely [35]. However, as noted above, a dose-response relationship may easily arise from confounding.

Plausibility

Investigators develop assumptions about a causal relationship based on background knowledge. Thus, the plausibility of the causal relationship is both dependent on and limited by knowledge available at the time [1]. It may be further limited by assumptions based on investigators’ beliefs rather than empirical evidence [67].

The process of developing DAGs and SCC models forces investigators to explicitly articulate assumptions about the causal relationships relevant to the research question of interest, making them transparent to other investigators [44, 68, 69]. DAGs may include mediators, which lie on the causal path between the exposure and outcome; a weakened immunity is the mediator by which alcohol consumption causes active-TB. Mediation analysis considers the direct and indirect effect of mediators [70]. Interrogating background knowledge to develop a DAG encourages a more systematic exploration of the plausibility of the causal chain.

For SCC models, investigators make explicit the nature of variable interaction [71]. GRADE upgrades for appropriate adjustment for all plausible confounding variables, but does not consider the broader variables relevant to the plausibility of a causal relationship across a body of evidence [35].

Coherence

Coherence is an assessment of how the putative relationship fits into existing theory and empirical evidence [1, 60]. Our comparisons suggest that coherence is not considered by the other approaches and may have limited utility, partly because it is poorly delineated from plausibility [72]. Investigators evaluating the coherence of a DAG or SCC model may consider how the assumptions illustrated by either approach fit existing theory; however, neither approach considers or illustrates coherence. Schünemann and colleagues argue that GRADE considers coherence by assessing indirectness [10]. However, in considering indirectness, investigators determine how applicable the population and interventions of identified studies are to the putative causal relationship under study. Coherence, on the other hand, asks investigators to consider how applicable the putative causal relationship is to broader evidence, including studies that do not investigate that specific relationship.

Experiment

Bradford Hill argued that “strong support for the causation hypothesis might be revealed” from “experimental, or semi-experimental data” [1]. He alluded to natural experiment studies, where the exposure is determined by nature or other factors outside of the control of investigators and where exchangeability between comparison groups is more likely [29].

Investigators have used DAGs to elucidate why randomisation results in exchangeability. Randomisation is an example of an instrumental variable; it causes (and is not caused by) the exposure and only impacts the outcome through the exposure [73]. If consuming alcohol was completely random and randomisation was independent of active-TB (see Fig. 5), the risk of overcrowding would be the same for individuals allocated to consume alcohol and those allocated to not [74]. Thus, the effect estimated would be based on exchangeable groups, but bounded by the proportion of individuals exposed due to randomisation, potentially limiting the transportability of the effect estimate [44, 75].

Fig. 5

Directed acyclic graph (DAG) with randomisation as the instrumental variable. According to this DAG, randomisation causes alcohol consumption. If this were true, there is a greater likelihood that the effect estimated would be similar or equivalent to the true causal effect
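The way randomisation breaks the link between overcrowding and alcohol consumption in Fig. 5 can also be shown with a small calculation. The sketch below is hypothetical throughout: the baseline risks, the assumed true risk ratio of 2 and the two allocation patterns are illustrative assumptions rather than empirical values.

```python
# Hypothetical values, chosen only for illustration.
P_OVERCROWDING = 0.5
BASELINE_TB_RISK = {True: 0.2, False: 0.05}  # risk without alcohol, by overcrowding
CAUSAL_RR = 2.0  # assumed true effect of alcohol within every stratum


def crude_rr(p_alcohol_given_overcrowding):
    """Crude alcohol-TB risk ratio for a given allocation of alcohol consumption."""
    risk = {}
    for alcohol in (True, False):
        numerator = denominator = 0.0
        for overcrowded in (True, False):
            p_stratum = P_OVERCROWDING if overcrowded else 1 - P_OVERCROWDING
            p_alcohol = p_alcohol_given_overcrowding[overcrowded]
            weight = p_stratum * (p_alcohol if alcohol else 1 - p_alcohol)
            tb_risk = BASELINE_TB_RISK[overcrowded] * (CAUSAL_RR if alcohol else 1.0)
            numerator += weight * tb_risk
            denominator += weight
        risk[alcohol] = numerator / denominator
    return risk[True] / risk[False]


# Randomised allocation: alcohol is independent of overcrowding.
randomised = crude_rr({True: 0.5, False: 0.5})
# Self-selected allocation: overcrowding makes alcohol consumption more likely.
self_selected = crude_rr({True: 0.8, False: 0.2})

print(round(randomised, 2), round(self_selected, 2))
```

Under randomised allocation the crude risk ratio recovers the assumed causal value of 2, whereas under self-selected allocation it is inflated (to 4.25 with these inputs) because overcrowding is unevenly distributed between the groups.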

Due to limitations on randomisation, epidemiologists rely largely on observational data. Investigators can use DAGs to interrogate the plausibility of “naturally occurring” instrumental variables, and how likely it is that individuals were truly randomly exposed [29, 73]. Clarity about study design, particularly procedures for assigning exposure, has been assisted by DAGs through the development of the ‘target trial’ (or ‘emulated trial’) where observational data analysis emulates randomised trial data analysis [76]. While it has several advantages, this does not seem to be directly comparable with the original BH viewpoint.

The causal pies that result in a given disease include both known and unknown components, as shown in Table 2. As investigators are unable to measure unknown variables for each causal pie, they cannot be certain that the groups exposed to each causal pie are exchangeable because they may differ in other characteristics that affect the outcome [4, 11]. GRADE privileges effect estimates from randomised (experimental) studies which are more likely to be “causally attributable to the intervention” by initially grading RCTs higher than NRSs [43]. At present, no distinction is made between natural experiment studies and other NRSs on the basis of study design.

Analogy

Bradford Hill argued that the likelihood of a causal relationship may be strengthened if a comparable association is observed between the same outcome and an analogous exposure or the same exposure and an analogous outcome. DAGs and SCC models do not account for analogous relationships in their assessment, but analogous relationships may be part of developing the assumptions and theories encoded in the diagrams. In GRADE, downgrading would be prevented if there was certainty in a causal relationship between the same exposure and similar outcomes in the same body of evidence [10]. While this has been conflated with analogy, this is more to do with the directness of the evidence to the research question rather than the transportability of the assumptions of an analogous, confirmed causal relationship to the one under study [77].

Discussion and conclusions

Epidemiologists evaluate evidence to understand how likely it is the observed effect is equal to the causal effect. We mapped DAGs, SCC models and GRADE against each BH viewpoint by comparing each tool to identify the overlap between different perspectives on causal assessment. The summary of these comparisons and the potential implications for causal assessment can be found in Table 5.

Table 5 Summary of conclusions. Interpretation of each BH viewpoint based on mapping of DAGs, SCC models and GRADE

The comparisons highlight the overlap between BH viewpoints and other approaches. This underscores the ongoing influence of BH viewpoints in causal assessment alongside developments in causal thinking. It also highlights the importance of other approaches in understanding BH viewpoints. DAGs help explain the theoretical underpinning of strength of association, consistency, temporality, specificity, dose-response, plausibility, and experiment. GRADE provides guidance on how causal assessment can be applied in practice, particularly for considering strength of association, consistency, temporality, dose-response and experiment. While the inclusion of SCC models can be debated as they can be considered a framework to describe causal reality and are least used of the approaches we studied, their inclusion has been useful for understanding strength of association and plausibility in our analysis. Despite their seemingly limited utility for understanding BH viewpoints, SCC models, along with GRADE, also help explain why specificity may have limited usefulness in causal inference.

Our analysis is the first to compare insights from advancements in causal assessment with BH viewpoints [7]. This is an area that requires further research and we hope our study will encourage debate and discussion on overlapping approaches to causal inference. Further research and discussion is necessary to develop a new and comprehensive set of causal criteria that incorporates both traditional and recently developed approaches in causal inference. Such work would likely benefit from applying these different approaches to specific research questions, with a view to identifying their relative capacity to facilitate causal assessment. However, we did not critique the individual approaches as this has been done in previous works [4, 5, 10, 11]. We did not investigate all potential approaches to assessing causality (e.g. International Agency for Research on Cancer and criteria for teratogenicity) due to limited time and resources. Instead, we focused on GRADE, DAGs and SCC models which are perhaps the best-known causal assessment approaches outside of BH viewpoints.

This study underscores the need for greater clarity on causal assessment in epidemiology. This is an initial attempt to demonstrate how recent approaches can be used to elucidate BH viewpoints, which remain fundamental to causal assessment and to tentatively suggest how their application could be improved. Our findings are preliminary and we welcome debate about our comparisons and the suggested implications for causal assessment.