Background

Treatment contamination is defined as the receipt of active intervention amongst participants in the control arm of a randomised controlled trial (RCT) [1]. It is thought to be particularly prevalent in RCTs of complex interventions in mental health. Psychological therapies are complex interventions that comprise several interacting constituent parts [2]. Such intervention components are often transportable and difficult to confine, meaning that their receipt by participants within the control arm is possible. The effect of contamination is to make the control arm more similar to the active intervention arm, i.e. to dilute the treatment contrast. This is a concern to researchers because the contrast between the randomised groups (intention-to-treat estimator) will be biased for the effect of treatment receipt (efficacy).
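
The dilution can be illustrated with a simplified sketch. The symbols and assumptions below are introduced here for exposition only and are not drawn from the cited literature. Suppose that receipt of the intervention shifts the outcome by a constant amount \delta in everyone who receives it, that a proportion p_T of the active arm receives it, and that a proportion p_C of the control arm receives it through contamination. Under these assumptions the expected intention-to-treat contrast is

$$ \Delta_{\mathrm{ITT}} = p_T\,\delta - p_C\,\delta = \left(p_T - p_C\right)\delta $$

For example, with full receipt in the active arm (p_T = 1) and 20% contamination (p_C = 0.2), the intention-to-treat contrast would recover only 80% of \delta. Real trials are unlikely to satisfy the constant-effect assumption exactly, so this should be read as a rough guide to the direction and approximate size of the bias.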

The processes leading to contamination in trials of complex interventions in mental health have never been reviewed comprehensively, and the literature is unclear about their relative frequencies. Such a review is needed in order to plan what steps should be taken to address the problem. In mental health, the typical quantity of contamination, the methods that researchers use to minimise or prevent it, and the extent to which it affects study findings are either little known or poorly formalised within the literature. Here we undertake a comprehensive scoping review that addresses these points. This type of review is defined as a map of the literature within a research area that identifies key concepts, gaps in research, and types and sources of evidence, in order to inform practice, policymaking and research [3].

Previous reviews have assessed the extent of contamination in certain areas of medicine. For example, a literature review of 235 RCTs of guideline dissemination and implementation strategies for healthcare professionals identified eight trials that quantified contamination [1]. The review assessed the proportion of participants in the control arm who were considered to have received treatment and found a median of 24% of participants to be contaminated (range 0–65%). In oncology, a large breast cancer screening trial (n = 9780) found that 22% of those in the control arm received a mammogram outside the trial, compared with 5% of the intervention group [4]. A review of cancer trials using Zelen’s design, where patients consent to their randomly allocated treatment before being asked for consent to participate in the study, found 11 trials that reported the number of patients who switched treatments [5, 6]. The average was 18% (range 10–36%). However, this figure did not represent contamination alone, as many of the studies in the review described switches from active to comparator treatments or provided an overall summary of switches in either direction. The scale of the problem in mental health trials remains unclear.

In terms of steps that might be undertaken to minimise the occurrence of contamination, we distinguish between statistical design methods, trial conduct solutions, and analytical approaches. The main statistical design method is the use of cluster randomisation, which can prevent contamination provided that clusters are constructed at the level at which it takes place [7]. By ensuring that all participants within a cluster receive the same treatment, contamination of the control condition due to participants being affected by each other’s treatment receipt can be avoided. However, the cost of such a design is that correlation within clusters must be factored into the power calculation and will inflate the sample size requirement. The sample size is inflated by the design factor, D:

$$ D:= 1+I\left(k-1\right) $$

where I is the intraclass correlation coefficient and k is the cluster size. In addition, cluster randomised controlled trials (cRCTs) often suffer from selection biases, mainly due to treatment being known before participant entry into trial (recruitment bias) and differential loss to follow-up between trial arms (attrition bias) [8]. The second type of strategy (trial conduct solutions) relates to methods that can be used in the running of the trial to reduce exposure of the control arm to active intervention. Education of clinicians and participants against contamination and provision of clear information about the purposes of the trial have been suggested [1]. However, it remains an open question as to what methods researchers use in practice. In terms of analytical methods that adjust for contamination, the use of modern causal estimation techniques has been advocated for this purpose [9,10,11], but it is not known how widely these have been used.
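
The design factor defined above can be illustrated with a short calculation. The following is a minimal sketch: the function names and the example values (an intraclass correlation of 0.05, clusters of 21 participants, and an individually randomised sample size of 200) are hypothetical and chosen only for illustration.

```python
import math


def design_effect(icc: float, cluster_size: float) -> float:
    """Design factor D = 1 + I(k - 1) for intraclass correlation I and cluster size k."""
    return 1.0 + icc * (cluster_size - 1.0)


def inflated_sample_size(n_individual: int, icc: float, cluster_size: float) -> int:
    """Total sample size for a cluster trial, given the size needed under
    individual randomisation (rounded up to a whole participant)."""
    d = design_effect(icc, cluster_size)
    # Subtract a tiny tolerance so floating-point noise does not add a participant.
    return math.ceil(n_individual * d - 1e-9)


# Hypothetical inputs, for illustration only.
d = design_effect(icc=0.05, cluster_size=21)
n_cluster = inflated_sample_size(n_individual=200, icc=0.05, cluster_size=21)
print(f"design factor D = {d:.2f}, inflated sample size = {n_cluster}")
```

With these illustrative inputs the design factor is 2, so the cluster randomised design would need twice as many participants as the individually randomised one. Even a small intraclass correlation can therefore carry a substantial cost when clusters are large.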

The review of educational interventions in RCTs also assessed whether there was any link between the prevention of contamination (e.g. through use of cluster randomisation) and an increase in the size of treatment effect estimates [1]. When assessing all studies in the set there was no evidence for such a link; however, a more homogeneous sample showed some evidence of a relationship. Other reviews have found similarly mixed results. For example, a review of 14 hip protector trials showed large positive effects in cluster RCTs and a mixture of positive and negative effects in RCTs with individual-level randomisation with suspected contamination [12]. On the other hand, a meta-analysis of 34 RCTs of enhanced care for depression found very similar treatment effect sizes when comparing cRCTs with individual-randomised RCTs [13]. One particular statistical design approach may provide extra information about the link between contamination and estimated treatment effect sizes. Specifically, in trials that use treatment allocation at more than one level, a comparison of treatment effect estimates between cluster- and participant-randomised sub-trials may provide some information as to the impact of contamination or the ability of cluster randomisation to prevent it.

The aims of this article were fourfold: to identify the processes that are considered to lead to contamination in trials of complex interventions in mental health, to quantify typical levels of contamination, to summarise what researchers do in order to prevent or mitigate it, and to compare treatment effect estimates within trials of complex interventions that used both cluster- and individual-level treatment allocation to quantify the contamination bias.

Methods

Type of review

We carried out a scoping review of trial design and conduct methods in RCTs of complex interventions in mental health. This type of review was chosen on the basis that our objectives were to summarise researchers’ perceptions of and solutions to a trial design problem where there is limited literature and potentially highly heterogeneous evidence. This was a methodology review that did not focus on a particular patient outcome and was therefore not eligible for registration with PROSPERO.

Eligibility criteria

All articles were screened using full texts and were assessed using five inclusion criteria. First, the text described a trial purporting to have used random allocation. Second, the intervention was complex, which in this review meant it comprised multiple components. It was not possible to assess whether these elements acted together to provide some added benefit (as per MRC guidance definition) so we used a general and therefore wide definition for this. Third, the publication gave some information about the process leading to, amount of, or solution used to counter treatment contamination. Fourth, the abstract and main body of the article were written in English. And finally, the trial was related to mental health, psychology, or psychiatry – this meant that a minimum of one of the target population, intervention, or primary outcome was directly related to one of these fields. Many trials in these fields test unblinded treatments where the suspicion is that they may be subject to contamination. The scoping review was limited to these areas of medicine for this reason and because of the apparent gap in the literature surrounding contamination in these fields.

Information sources

The search for contamination in RCTs of complex interventions in mental health was done using the Ovid platform and included the databases Medline, Embase, and PsycInfo. Articles that were published between January 2000 and April 2015 were searched. Results were restricted to articles published from 2000 onwards because this was the year in which the first MRC framework paper on complex interventions was published [14]. The publication of this framework marked the point at which the design and evaluation of complex interventions were formalised.

Search

Randomised controlled trials were searched for using the sensitivity-maximising 11-step process recommended by Cochrane [15]. The search terms “contamination” and “spillover” were included in the procedure. Synonymous terms for complex interventions that were used included all combinations of “multicomponent”, “psychosocial”, and “behavioural”, with “interventions”, “treatments”, and “training”. The search was restricted to articles that mentioned “mental health”, “psychology”, or “psychiatry”. All terms were searched for in the main body of the text. The full search procedure can be found in the supplementary materials (see Additional file 1).

Study selection

Duplicates were removed from the set and the remaining articles were assessed against each of the eligibility criteria. Any potentially relevant article that was referred to by a paper in the results of the search and was not already in the set was followed up by a single author (NM). If the article was judged to have met the inclusion criteria, it was included in the set and the full text was reviewed, also by a single author (NM). In order to assess the reliability of study selection, a second reviewer (RK) re-screened 70 articles (11%).

Data collection process

Any studies that were included in the review that featured sub-studies that used both cluster- and individual-level treatment allocation were reviewed as two separate sub-trials because of the different contamination processes and methods used to address these. Treatment effect sizes were extracted for trials that reported effects separately depending on the level of treatment allocation. Data from any such studies that did not report results at the different levels of treatment allocation were obtained from the authors in order to allow the comparison.

Data items

Abstracted data included an assessment of bias, summaries of trial design (e.g. study population, intervention, primary outcome, unit of treatment allocation), details about contamination (e.g. how it was thought to take place, its quantity, steps taken to avoid it), and records of trial summaries (e.g. extent of clustering, power, sample size, treatment effect). In order to assess the reliability of data abstraction, a second reviewer (RK) re-extracted data from 20 articles (8%) using the same procedure described above.

Risk of bias in individual studies

The review of trial bias included recording the “Jadad score” (a single item measure of methodological quality of RCTs [16]) and most of the domains of Cochrane’s classification scheme for bias [17]. In addition to these, some other domains that were pertinent to cluster randomised trials were used. These included whether randomisation occurred after participant consent was obtained, baseline measures were completed before randomisation, baseline outcome measurements were similar across trial arms, other clinical and demographic characteristics were similar across arms, and whether attrition was similar in the arms. These additional assessments of bias were based on outcomes used in a review of cRCTs [8].

Results

Reliability

At the screening stage agreement was 71%; all discrepancies were discussed and subsequently resolved. Agreement was 81% for all assessments of bias, and 82% for details of contamination processes.

Summary of trials

Two hundred and thirty-four studies were identified as meeting the eligibility criteria. This included seven trials that were referred to by an article in the main search and were found to meet the eligibility criteria. The results of the search strategy and numbers of exclusions are summarised in Fig. 1. Four hundred and fourteen articles were excluded. A list of these articles together with reasons for exclusions can be found in the supplementary materials (see Additional file 2).

Fig. 1 Flow diagram for searching for relevant articles (articles could be excluded for more than one reason)

Details of the 234 trials in the review are given in Table 1. The table shows that the overwhelming majority of articles described the primary analysis of an RCT, were based in either North America or Western Europe, and were late phase (i.e. not pilot or feasibility trials). Most target populations were adult patients and the most commonly targeted conditions were depression, substance abuse, and psychosis. The two most common interventions were cognitive behavioural therapy and care management; there were many small categories. The sample size of participant randomised trials ranged from 16 to 14,910; that of cluster randomised trials ranged from 13 to 6076. A full list of references can be found in the supplementary materials (see Additional file 3).

Table 1 Summary of characteristics of articles

Summary of assessment of bias

Summaries of assessments of bias using the Jadad scale, items recommended by Cochrane, and items aimed at identifying possible bias in cluster randomised trials are reported in Table 2. The table demonstrates the potential for bias split by the level of treatment allocation. In general, it shows that the greatest potential for bias arose due to lack of knowledge of allocation, incomplete outcome data, differences in attrition between trial arms, and randomisation occurring before consent was obtained and before baseline measures were completed. More evidence for bias was found in cluster randomised trials when assessing whether randomisation took place after consent and after baseline measures were completed, whether outcome assessment was blind, and whether baseline demographic characteristics were similar between trial arms. Assessments of bias for each trial together with the support for judgement associated with each assessment can be found in the supplementary materials (see Additional file 3).

Table 2 Summary of assessment of bias

Processes driving contamination

Five processes were perceived to lead to contamination, of which two were by far the most common: staff delivering the active intervention in the control arm (n = 82, 35%) and communication between trial arms (n = 79, 34%). Staff delivering the active intervention in the control arm happened either because a given clinician delivered both the active and control treatments (n = 76, e.g. [18]) or because clinicians who were not directly involved in providing the treatment treated participants in both arms, thereby potentially learning about the active intervention and passing this on to participants in the control arm (n = 6, e.g. [19]). The other main contamination process was communication between individuals in different trial arms. This could be at the level of the clinician (n = 20, e.g. [20]), the participant (n = 57, e.g. [21]), or both (n = 2). Communication between providers of interventions was often a worry in environments in which the people giving the treatment worked closely together, for example GP surgeries, hospital units, and schools. Communication between participants was thought to be most likely in environments in which participants came into close contact. Examples of this included interaction between participants who were family members, patients in a waiting room, school children, employees working on the same worksite, and university students. Particular healthcare settings that were thought to be highly likely to foster communication were antenatal clinics/childbirth classes, specialist clinics (e.g. substance misuse, dialysis), and wards for those admitted to hospital.

There were perceived to be three other, more minor processes that drove contamination. First, participants switching clinicians (n = 4, 2%, e.g. [22]), where control participants were treated by multiple clinicians of whom one was trained in the active intervention. Second, participants seeking treatment outside the trial (n = 6, 3%, e.g. [23]). And finally, background noise, where the treatment already existed to some extent within the healthcare system (n = 5, 2%, e.g. [24]). Fifty-nine articles did not provide information about the contamination process.

Quantity of contamination

Twenty-seven studies (12%) attempted to quantify contamination. Twenty-three trials measured individual-level contamination on a binary scale and summaries of these quantities are given in Table 3. The median level of contamination was 13% (IQR 5–33%).

Table 3 Quantifying treatment contamination where treatment receipt was defined as binary

Four trials measured contamination using a continuous scale; three were trials of cognitive behavioural therapy and one of cognitive analytic therapy. One created a treatment fidelity scale and asked participants in each trial arm (behavioural weight control instructions, cognitive behavioural therapy, standard counselling) about their knowledge of all three treatments at the beginning and end of treatment [25]. The sub-scales showed high knowledge of behavioural weight control in the group allocated to receive behavioural weight control instructions (mean change of 1.1 compared to 0.5 and 0.5 in the cognitive behavioural therapy and standard arms), high knowledge of cognitive behavioural therapy in those allocated to receive this (mean change of 1.6 compared to 0.0 and 0.8 in the behavioural weight control and standard groups), and high knowledge of the standard intervention in the control group (mean change of 0.5 compared to 0.1 and 0.1 in the behavioural weight control and cognitive behavioural therapy arms). This seemed to indicate receipt of treatment in the control arm. The other three RCTs showed negligible evidence of treatment contamination. Of these RCTs, one used a cognitive behavioural therapy adherence scale (adapted CTACS) to record adherence and contamination in the active intervention and control arms [26]. The CTACS means were 98.0 and 98.8 in the cognitive behavioural therapy and education intervention (control) arms respectively, indicating that contamination did not occur. Another trial found that the family-focused cognitive behavioural therapy (FCBT; active intervention) group scored higher than the traditional child-focused cognitive behavioural therapy (control) group on two scales, Family Focus (mean = 4.90 and 1.55) and Parenting Style Focus (mean = 4.75 and 1.00) [27]. This suggested that only the FCBT group incorporated family and parenting interventions and therefore that there was little evidence of contamination. The fourth trial used a scale to measure the fidelity of the control intervention, which was good clinical care [28]. This scale included a sub-scale for cognitive analytic therapy and the mean for this was very low: 0.52 (SD 0.11). This represented negligible contamination.

Solutions used to counter contamination

Methods that were used to counter contamination are summarised in three categories: statistical design, trial conduct, and analysis methods. Statistical design includes the use of cluster randomisation, where clusters are chosen based on groups of participants who are thought potentially to become contaminated by direct or indirect links (e.g. via a shared therapist). One trial inflated the sample size in order to account for reduced statistical power caused in part by contamination bias [29]. The great majority of other methods for preventing contamination were aspects of trial conduct, such as recruitment of more clinicians to ensure that each clinician only delivered one of the interventions. In terms of analysis methods, one trial used per protocol analysis, meaning that participants whose treatment was contaminated were dropped from the analysis [30]. This review found no trials that addressed the problem of contamination by using methods from the causal inference field.

Categorisations of trial conduct solutions that were used to avoid treatment contamination can be found in Table 4. The sections of the table match the processes of contamination described in the earlier section on this. The majority of solutions used to prevent contamination related either to preventing staff delivering the active intervention in the control arm or preventing communication between clinicians or participants.

Table 4 Trial conduct solutions to treatment contamination

Four trials were concerned about contamination during data collection and aimed to prevent this by minimising interaction between researchers and participants [31,32,33,34]. Another temporally separated the control and active treatments with data collection following each. This meant that treatment could only influence data from active intervention participants [35].

Trials using both cluster- and participant-level treatment allocation

The results of the review included four trials that used both participant- and cluster-level treatment allocation [36,37,38,39]. Treatment effect estimates and confidence intervals for these trials are shown in Fig. 2. The figure shows treatment effects arranged such that greater benefit (or less harm) of treatment is represented by a greater number on the horizontal axis. The figure enables the comparison of the absolute size of treatment effect between participant- and cluster-level allocation to assess the impact of contamination on effect size estimation. Of the 21 outcomes investigated, fewer than half showed a difference in the anticipated direction, i.e. smaller estimated absolute effect sizes under participant-level random allocation. In particular, an attenuated treatment effect (lesser distance from the null line in Fig. 2) was found under participant-level allocation in eight of the 21 outcomes (38%), with a tie in one outcome.
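
The comparison described above amounts to tallying, outcome by outcome, whether the absolute effect estimate is smaller under participant-level than under cluster-level allocation. The following is a minimal sketch of such a tally. The function name is hypothetical and the effect estimates are made-up numbers for illustration; they are not the values plotted in Fig. 2.

```python
from typing import List, Tuple


def tally_attenuation(pairs: List[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Count outcomes by how the participant-level estimate compares with the
    cluster-level estimate in absolute size.

    Each pair is (participant_level_estimate, cluster_level_estimate).
    Returns (attenuated, ties, not_attenuated).
    """
    attenuated = ties = 0
    for participant_est, cluster_est in pairs:
        if abs(participant_est) < abs(cluster_est):
            attenuated += 1  # closer to the null under participant-level allocation
        elif abs(participant_est) == abs(cluster_est):
            ties += 1
    return attenuated, ties, len(pairs) - attenuated - ties


# Made-up standardised effect estimates, for illustration only.
example_pairs = [(0.10, 0.30), (0.25, 0.20), (-0.05, 0.15), (0.20, 0.20)]
attenuated, ties, not_attenuated = tally_attenuation(example_pairs)
print(f"attenuated: {attenuated}, ties: {ties}, not attenuated: {not_attenuated}")
```

A tally of this kind ignores the uncertainty around each estimate, so it indicates the direction of differences only and is not a formal test.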

Fig. 2 Forest plots for four trials that used both individual- and cluster-level randomisation; (P) = primary outcome. a) Clarkson et al. (2009) [36]. Choice of primary outcomes is based on sample size calculation; estimates are adjusted for baseline measures. Larger (more positive) treatment effects indicate benefit. b) Lee & Gay (2011) [37]. Estimates were standardised and calculated from summaries of means and SDs (mothers’ scores only). Larger (more positive) treatment effects indicate benefit. c) Marshall et al. (2004) [38]. Estimates used same adjustments as in the trial publication. Larger (more positive) treatment effects indicate benefit. d) Richards et al. (2008) [39]. Estimates were standardised and calculated from summaries of means and SDs. Larger (more positive) treatment effects indicate benefit

Discussion

The review identified 234 articles that described either the processes driving treatment contamination, its quantity, or steps that researchers took to prevent or alleviate the problem in trials of complex interventions in mental health. The principal processes leading to contamination were found to be clinicians being required to treat participants in both treatment and control conditions and communication among clinicians or participants in different trial arms. Typically, around one in eight participants in the control arm of a trial were assessed as having received the active intervention. The most common steps that researchers took to prevent or mitigate contamination were the use of cluster randomisation, organising for each clinician to provide only one type of treatment, monitoring treatment receipt, spatially or temporally separating trial arms, and informing participants about only the treatment that they were allocated to receive. There was little evidence of a difference in the magnitude of treatment effects within trials that used both cluster- and participant-level treatment allocation.

The classification of two main processes and three more minor types of contamination was based on the processes that researchers and clinicians described in such trials. The main trial conduct steps that researchers took to minimise contamination were in line with the processes that were found to be driving it. Many researchers attempted to design against contamination by carefully controlling the treatment’s delivery. These processes were often anticipated and then prevented or attenuated by the designers of the trials. There were no examples of researchers first having evaluated in detail treatment receipt within the control arm. The processes described here therefore partly represent researchers’ expectations and not entirely clinician or participant behaviour.

The small number of trials that measured and reported treatment receipt in the control arm found it to be affecting a minority of the control participants. The distribution of this was similar to the quantity found previously in other areas of medicine such as educational interventions [1], breast cancer screening [4], and cancer trials using Zelen’s design [5]. Thus while there is a lot of concern about contamination it is not clear that this problem is indeed widespread. The extent of the problem may be related to the complexity, intensity, and nature of the intervention.

Researchers often used cluster randomisation to prevent treatment contamination, amongst other reasons. While cRCTs can avoid contamination bias they are at risk of other biases. Our set of articles included 93 cluster randomised trials. Assessments of bias suggested that cRCTs were more likely to be affected by bias when considering whether randomisation took place after consent was obtained and after baseline measures were completed, whether outcome assessment was blind, and whether demographic characteristics were similar between trial arms. This was consistent with an earlier review of cRCTs that were published in three prominent medical journals which found evidence of recruitment bias [8].

The small number of trials that allocated treatment at both cluster and participant levels did not find any evidence for differences in effect size estimates. The lack of evidence for a link between the level of randomisation and treatment effect size suggested that the employment of cluster randomisation did not prevent contamination, that the anticipated contamination was overstated, or that the use of cluster randomisation led to a similar degree of bias as that caused by contamination in the participant-randomised trials. Overall, the finding was consistent with those of a review of trials of enhanced care in depression [13], and of educational interventions [1]. Similarly to previous reviews, there was considerable heterogeneity between trials identified in this study that used both cluster- and participant-level randomisation. However, the variability here is between trials and not within them because randomisation implies that the sub-trials were balanced for every variable except the level at which treatment allocation took place. It is possible that the impacts of contamination and cluster randomisation on bias are dependent on the disease or type of intervention. In order to draw substantive conclusions about the effect of treatment allocation level on contamination, a systematic review of this particular trial design is needed.

Conclusions

This is the most comprehensive review of contamination in mental health trials to date. It is the first to identify the processes leading to contamination and the measures that researchers take in order to minimise the problem. The main limitation is that the trials were heterogeneous in that they represented a large range of illnesses and interventions. With regard to the causes of contamination, it is an assumption that the processes described by authors were the drivers of contamination. Another limitation is the time interval between the literature search and publication of this review. It is possible that the problems, solutions and reporting of contamination have evolved since the search took place.

The results of this review suggest that treatment contamination is perceived to be a significant problem in trials of complex interventions in mental health. However, the trials that measured and reported it suggest that the phenomenon is often modest (with a large range). The reporting of it is infrequent and almost certainly not as commonplace as that of treatment non-compliance. This implies a need for greater measurement and reporting of treatment receipt in the control arm of trials in this field. The findings also show that there are many steps that researchers can take to minimise contamination without resorting to the use of cluster randomisation. In addition, we found that modern causal analysis methods, including the techniques developed particularly for contamination adjustment [9, 10], are yet to be utilised to deal with contamination bias at the analysis stage. This is likely to be in part a reflection of the infrequency of measurement of treatment receipt for all participants in the control trial arm.