Background

Patient outcomes depend crucially on the treatment provider delivering the intervention. Where there is more than one treatment provider, outcomes observed in patients treated by the same treatment provider may be more similar than those in patients treated by other treatment providers, a phenomenon known as clustering. Whilst treatment providers are often thought of as health professionals, such as general practitioners, nurses, surgeons or therapists, the potential for clustering is also present at the level of the treating centre within a clinical trial [1, 2]. In addition to clustering, skill in treatment delivery may change over time; specifically, providers in one or all arms of the study may experience learning during the course of the trial, meaning that trial outcomes may also change in association with changes in skill [3]. When comparing interventions within a clinical trial, it is imperative that the trial is designed under a common protocol with regard to treatment delivery, and that the trial is conducted in accordance with this protocol. At trial outset, a researcher may consider the homogeneity of any intervention under examination and the degree to which it is appropriate to standardise these procedures [4]. In extreme cases, where the research community questions the trial results, the trial team should be prepared to address any concerns about heterogeneity of treatment effects [5].

Differences in treatment delivery are often considered more of a concern in trials investigating a complex intervention, such as surgery. Trials involving a complex intervention are often criticised because of variability between intervention providers (clustering) and because of variability over time, often as a result of increased experience (learning) [4]. Recognition, and management as appropriate, of clustering and learning is recommended, and it may have increased relevance within the surgical field, dependent upon the interventions being investigated and their routine use [2, 4, 6, 7, 8, 9]. Considering these aspects at trial outset will help to ensure that any necessary adjustment to the design or analysis of the study is applied in a manner appropriate for the intervention under investigation and will support clinical decision-making [5].

Whilst the notion of clustering and learning is familiar to many statisticians, the extent to which these considerations are made, and how, is unknown. A survey to establish current practice for the statistical management of clustering and learning effects in the design and analysis of randomised multicentre trials was undertaken within UK Clinical Research Collaboration (UKCRC) Registered Clinical Trials Units (CTUs) [10]. This survey aimed to ascertain the UK-wide experience of running multicentre studies, in particular those investigating a complex or surgical intervention, in addition to establishing awareness of design issues associated with these studies and levels of concerns around these issues.

Methods

The survey was delivered at the bi-annual Statisticians Operational Group Meeting in April 2018. Attendees were statistical representatives from each of the UKCRC Registered CTUs [10]. Units that did not have a representative present at this meeting, or did not respond, were contacted via email following the event and invited to participate. Registered CTUs were identified from the network website [10] on 4 January 2018 (n = 51, of which 50 were registered at the time of survey; see Supplementary Box 1). As the survey involved professionals and discussions of current practice, no formal ethical approvals were deemed necessary.

Survey

EJC and CG developed the survey, and GB, JMB and JAC reviewed and provided feedback. The survey was subsequently piloted and revised prior to roll out (Supplementary Box 2).

This survey was developed to establish experience in multicentre trials, in particular those investigating a complex intervention. In order to contextualise the survey content, questions drew upon quotes from existing guidelines, references to relevant publications and example scenarios developed by the study team ([4, 9, 11], Table 1). Questions included concepts such as CTUs’ experience in adjusting for clustering (therapist/surgeon or centre) or time-varying effects (learning curves) and, when a Unit had experience, when and how adjustments are applied. This survey also aimed to establish awareness about design issues in surgery and levels of concern around these issues.

Table 1 Example trial scenarios

Questions were analysed and reported by Unit. To represent Unit practice and experience as a whole, Units with multiple responders were combined. In the case that multiple responders from a single Unit provided contradictory answers, for example one responder stated they had experience and another stated they did not, it was assumed that the Unit had the experience. However, due to the nature of the network meeting invites (one per registered CTU), multiple responders from a single CTU were minimised.
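The rule for combining responders can be written down in a few lines. The following is a minimal sketch only, not the code used for the survey: the function name, the Unit labels and the example data are hypothetical, and it assumes each answer has already been reduced to a yes/no value.

```python
from collections import defaultdict


def combine_by_unit(responses):
    """Collapse responder-level yes/no answers to one answer per Unit.

    responses: iterable of (unit_id, has_experience) pairs.
    A Unit is treated as having the experience if ANY of its
    responders reported it, mirroring the rule described above.
    """
    combined = defaultdict(bool)  # every Unit starts as "no experience"
    for unit_id, has_experience in responses:
        combined[unit_id] = combined[unit_id] or has_experience
    return dict(combined)


# Hypothetical data: Unit "U1" has two responders who disagree.
example = [("U1", True), ("U1", False), ("U2", False)]
unit_level = combine_by_unit(example)
```

Under this rule the contradictory answers from "U1" resolve to experience being present, so a single positive report is enough to classify the Unit.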

Analysis

Quantitative data from closed questions were analysed using descriptive statistics with standard statistical software (Statistical Analysis Software [SAS®] 9.1.4; SAS Institute Inc., Cary, NC, USA); no formal statistical testing was undertaken.

Free text answers were used to contextualise and illuminate quantitative responses.

To ensure anonymity, each Unit was assigned a project identification number.

Results

Unit participation and demographics

Forty-seven of the 50 UKCRC Registered CTUs were represented at the network meeting on 28 April 2018. At the meeting, 34 representatives from 31 CTUs participated (n = 31/50, 62% of registered Units). Following the meeting, the 19 Units without a completed survey were contacted, of which 13 responded (n = 13/19). Supplementary Table 1 provides further details. The overall participation rate of registered Units was 88% (n = 44/50). One representative from a newly registered Unit reported lack of experience as a reason for non-participation; the remaining five Units did not provide reasons.
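The participation figures can be checked with simple arithmetic. The sketch below uses only the counts reported in this paragraph; the variable and function names are illustrative.

```python
def rate(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


registered_units = 50
units_at_meeting = 31        # Units completing the survey at the meeting
units_after_meeting = 13     # Units responding to the follow-up email

# Units still without a completed survey after the meeting.
units_followed_up = registered_units - units_at_meeting
total_units = units_at_meeting + units_after_meeting

meeting_rate = rate(units_at_meeting, registered_units)
followup_rate = rate(units_after_meeting, units_followed_up)
overall_rate = rate(total_units, registered_units)
```

Here `meeting_rate` and `overall_rate` reproduce the 62% and 88% quoted in the text, with all rates except the follow-up rate taken over the 50 registered Units.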

All responders had a statistical background, with the majority holding a senior or lead position at their Unit (senior statistician: n = 15/44, 34%; statistical lead: n = 13/44, 30%). Supplementary Table 2 provides further details.

CTUs listed on the UKCRC Resource Finder [10] as conducting cluster or surgical trials had participation rates of 94% (n = 16/17) and 92% (n = 33/36) respectively (see Supplementary Table 3). CTUs with a methodological research area in complex interventions participated with a rate of 90% (n = 35/39).

Three-quarters of CTUs indicated experience in running trials with a complex intervention (n = 32/44, 73%) and two-thirds in running trials with a surgical intervention (n = 29/44, 66%), with 25 (57%) indicating experience in both. Seven CTUs stated that their Unit did not have experience in running trials with either type of intervention (n = 7/44, 16%). One did not respond to this question (Question 1, Supplementary Table 4).

Managing effects through design

Clustering

Twenty-five Units had undertaken multicentre trials that did not stratify by centre (n = 25/44, 57%; see Table 2, Question 2, and Table 3). Common reasons for not stratifying by centre were many centres with few participants (n = 19/25, 76%) and expected homogeneity of treatment effect (n = 11/25, 44%). Additional reasons for not stratifying by centre included allocation concealment in an open trial; logistical reasons; and grouping centres by region. One responder clearly indicated that this decision was influenced by the nature of the intervention, stating:

“… drug trials less effect due to centre compared to say complex or surgical interventions.” [ID23]

Table 2 Methods for managing clustering and learning by design
Table 3 Reasons for having multicentre studies that do/do not stratify by centre (Question 2)

One responder who did stratify all the Unit’s trials by centre alluded to concerns regarding potential for unequal distribution of costs across centres:

“This subject gets a lot of academic debate in some academic circles. But: our randomisation defaults to stratifying by centre; need to balance resources – don’t want to give one too many overheads; balancing avoids confounding; other opinions, such as Torgerson, exist.” [ID8]

Question 3 asked responders to consider five scenarios (Tables 1, 2 and Supplementary Table 5), in particular their approach to stratifying the randomisation in trials of each type run by their Unit. Responses to Scenario A, of which 39 Units had experience, indicated that most Units when running a trial with a large sample size, with multiple treatment providers per centre each recruiting a minimum of 10 participants, would stratify by centre alone (n = 34/39, 87%).

Three would stratify by treatment provider alone (n = 3/39, 8%). Seventy percent had experience in running trials like Scenario B, which was the same as Scenario A, only with a small sample size (n = 31/44, 70%). As with Scenario A, most Units ran such trials by stratifying by centre alone (n = 24/31, 77%) and few by treatment provider alone (n = 2/31, 6%).

Responders had less experience running Scenario C trials, i.e. trials recruiting in several centres where treatment providers treated patients across centres (n = 16/44, 36%). Again, stratification by centre only was most common (n = 14/16, 88%), with a greater proportion of Units than in Scenarios A and B indicating that they had stratified such trials by treatment provider only (n = 3/16, 19%).

CTUs with experience running trials in Scenario D, i.e. trials recruiting from multiple centres, each with multiple treatment providers and investigating a surgical intervention (n = 25/44, 57%), also primarily stratified by centre only (n = 21/25, 84%). One-fifth indicated stratifying by both centre and treatment provider in such trials (n = 5/25, 20%).

Whilst Units had less experience running trials like Scenario E, which was similar to Scenario D but investigating substantially different interventions, stratification approaches were similar to those of Scenario D (centre only: 13/16, 81%; both centre and treatment provider: 2/16, 13%).
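To make concrete what stratifying the randomisation "by centre alone" can involve, the following is a minimal sketch using permuted blocks within each centre. The survey did not ask which randomisation algorithm Units used, so the choice of permuted blocks, the block size, the arm labels and the function name are all assumptions made for illustration.

```python
import random


def make_allocator(block_size=4, arms=("A", "B"), seed=0):
    """Return a function that allocates participants within centre.

    Each centre draws from its own randomly permuted block, so the
    arms stay balanced within every centre after each full block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    pending = {}  # centre -> allocations left in that centre's current block

    def allocate(centre):
        block = pending.get(centre)
        if not block:  # no block yet, or the current block is used up
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[centre] = block
        return block.pop()

    return allocate


# Hypothetical usage: four participants recruited at each of two centres.
allocate = make_allocator(block_size=4, seed=1)
centre_1 = [allocate("Centre 1") for _ in range(4)]
centre_2 = [allocate("Centre 2") for _ in range(4)]
```

Stratifying by treatment provider, or by both centre and provider, would amount to changing the key passed to `allocate`; the free-text concern about too few participants per stratum corresponds to many keys each with an incomplete block.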

Twelve responders provided free text explaining their approaches for stratification in each of the scenarios (Question 3, Supplementary Table 5). Two-thirds (n = 8/12, 67%) commented on the feasibility of stratifying by treatment provider. Reasons were as follows: concerns that there would be too few per stratum [ID8, ID15, ID39]; treatment provider not known in advance [ID8, ID32]; treatment delivered by a subset of treatment deliverers [ID1, ID39]; data not collected on treatment provider [ID13]; treatment differences assumed to be differences in facilities and protocols [ID17]; usually comparing the intervention policy and not the different aspects of the intervention [ID32]; and treatment provider can change during the trial [ID30].

Other responses provided examples of stratification levels, e.g. centre as hospital and treatment provider as operating surgeon [ID10], and two stated that the approach was trial-specific [ID14, ID29]. One raised concerns with stratifying by centre:

“Recent conversations between senior statisticians advocate not stratifying by centre in any situation. They cited concerns regarding prediction of allocation.” [ID18]

When comparing stratification approaches across scenarios within Units (Question 3, Table 2), 19 Units used the same approach across all scenarios in which they had experience, and 20 changed their approach depending on the trial scenario (same: n = 19/44, 43%; different: n = 20/44, 45%). Five had no experience in any of the suggested scenarios or did not respond to the question.

Learning

The majority of responders (n = 39/44, 89%) indicated they had accounted for learning by defining a minimum level of expertise for treatment providers (Question 4, Table 2). Common definitions were set in terms of delivering the trial intervention (n = 31/44, 70%); treating the condition within the patient population (n = 24/44, 55%); and setting a minimum professional level for treatment providers (n = 22/44, 50%). Three delegated this responsibility to the clinical investigators on the study. Examples of alternative approaches to specifying minimum levels of expertise included use of a surgical manual with senior surgeons signing off treatment deliverers [ID15] and treatment deliverers being required to pass both surgical and radiotherapy quality assurance [ID18].

Thirty percent of CTUs had used an expertise-based trial design, in which participating treatment providers provide only the intervention in which they have expertise (n = 13/44, 30%, Question 5, Table 2).

Managing effects through analysis

Clustering

In trials stratified by centre, 55% of Units had subsequently adjusted for this stratification factor in the analysis (n = 24/44, 55%, Question 6, Tables 4 and 5). This had been done either by pre-specified grouping rules at the design stage (n = 19/24, 79%); by an ad hoc approach (n = 14/24, 58%); or by other approaches: grouped centres where numbers are small [ID7, ID15]; site as a fixed effect [ID8]; or:

“Depends. Either include as a stratifying factor (small number of centres, large patient numbers) or by including centre or treatment provider as a cluster.” [ID32]

Table 4 Methods for managing clustering and learning by analysis
Table 5 Other grouping rules when randomisation is stratified by (a) centres or (b) treatment providers (Question 6)

Regardless of the stratification approach used, very few Units had never adjusted for centre in the statistical model when comparing treatment (n = 3/44, 7%, Question 7, Table 4 and Supplementary Box 3). Responders from Units that did (39/44, 89%), did so using fixed effects (n = 11); random effects (n = 12); or, depending on the circumstance, used either (n = 14). Two did not respond. Reasons in favour of using fixed effects were ease of interpretation and fewer assumptions associated with it [ID27]; and random effects as:

“Usually an underlying assumption that centre may be a surrogate for socioeconomic factors that may affect outcome and/or treatment effect and so often not happy to assume that there is an equal fixed treatment effect across all sites.” [ID15]
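For readers less familiar with the distinction, one common way of writing the two choices for a continuous outcome is sketched below. The notation is illustrative and is not taken from the survey or its responders: y_{ij} is the outcome for participant i in centre j, T_{ij} is the treatment indicator, and normally distributed errors are an assumption of the sketch.

```latex
% Fixed centre effects: one parameter \gamma_j per centre
% (with a constraint such as \gamma_1 = 0 for identifiability).
\[
  y_{ij} = \beta_0 + \beta_1 T_{ij} + \gamma_j + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\]

% Random centre intercepts: centres are treated as a sample from a
% population of centres.
\[
  y_{ij} = \beta_0 + \beta_1 T_{ij} + u_j + \varepsilon_{ij},
  \qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
\]

% Random treatment-by-centre effect: the treatment effect itself is
% allowed to vary between centres.
\[
  y_{ij} = \beta_0 + (\beta_1 + v_j)\, T_{ij} + u_j + \varepsilon_{ij},
  \qquad v_j \sim N(0, \sigma_v^2)
\]
```

The third form is one way of relaxing the assumption, questioned in the quotation above, that there is an equal treatment effect across all sites.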

In trials stratified by treatment provider, 36% of Units had also subsequently adjusted for this factor in the analysis (n = 16/44, 36%, Question 6, Tables 4 and 5). Three-quarters had done so in accordance with pre-specified grouping rules (n = 12/16, 75%), and under half had used a more ad hoc approach (n = 7/16, 44%).

Regardless of the stratification approach used, 59% of Units adjusted for treatment provider in the statistical model when comparing treatment (n = 26/44, 59%, Question 8, Table 4 and Supplementary Box 4). The majority of these used a random effect (n = 18/26, 69%), with one providing the following reason:

“If treatment provider was included as stratification factor it will be because we are concerned that the provider will have an impact on outcome but also because we would expect different population for different treatment providers.” [ID15]

When responders were asked to revisit the scenarios in Table 1, this time to consider investigating treatment by centre or treatment provider (Question 9, Table 4), exploring treatment by centre was the most common approach in every scenario. Exploring treatment by provider was rare. Twelve responders provided free text to explain their approaches for adjustment (Question 9, Supplementary Table 6). General themes in the additional information provided were as follows: that the decision is trial-dependent [ID6, ID14]; concerns around sample size [ID6, ID7, ID39]; and, when explored, that this was informal [ID5, ID8, ID14, ID32, ID38].

When comparing treatment interaction approaches across scenarios within Units (Question 9, Table 4), 24 Units used the same approach across all scenarios, and 12 utilised a scenario-specific approach (same: n = 24/44, 55%; different: n = 12/44, 27%). Eight had no experience in any of the suggested scenarios or did not respond to the question.

Seventy-three percent of Units explored heterogeneity by centre when a positive treatment effect is found (n = 32/44, 73%, Question 10a, Table 4), whereas fewer explored heterogeneity by treatment provider (n = 12/44, 27%, Question 10b, Table 4). Of those that do explore heterogeneity for either effect, the majority did so by graphical display (centre: n = 31/32; treatment provider: n = 11/12). Many also explored by analytical methods, for example significance testing (centre: n = 22/32; treatment provider: n = 9/12). Supplementary Tables 6 and 7 provide further details.

Learning

Fifty-nine percent of CTUs included the treatment provider in the statistical model when comparing treatment (n = 26/44, 59%), two of which had treated this as a time-varying covariate (Question 8, Table 4), with one specifying:

“Fairly crude by letting the number of procedures in the trial increase the relevant surgeon’s experience (ignoring procedures done outside of the trial of course!)” [ID38]
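The approach described in this quotation, counting each surgeon's earlier procedures within the trial, can be sketched as follows. This is an illustration under stated assumptions rather than the responder's actual method: the function name, the record layout and the use of ISO-format date strings for ordering are hypothetical.

```python
from collections import defaultdict


def add_experience(procedures):
    """Attach a within-trial experience count to each procedure.

    procedures: iterable of (iso_date, surgeon_id) pairs.
    Returns (iso_date, surgeon_id, prior_count) tuples in date order,
    where prior_count is the number of earlier trial procedures by
    that surgeon. Procedures outside the trial are ignored, as in
    the quotation above.
    """
    prior = defaultdict(int)
    rows = []
    for iso_date, surgeon_id in sorted(procedures):
        rows.append((iso_date, surgeon_id, prior[surgeon_id]))
        prior[surgeon_id] += 1
    return rows


# Hypothetical data: surgeon "S1" operates twice, "S2" once.
example = [("2018-02-01", "S1"), ("2018-01-01", "S1"), ("2018-01-15", "S2")]
rows = add_experience(example)
```

The resulting `prior_count` could then enter the statistical model as a time-varying covariate alongside the treatment term.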

Those that had not used a time-varying effect had experience in exploring learning through a sensitivity analysis [ID35] or secondary analyses [ID8, ID39], with one specifying:

“Had we found evidence of learning, we would have had awkward additional data summaries and presentations” [ID8]

Two responders had not considered such analyses [ID7, ID23], and one provided time restrictions as a reason for not doing so [ID30].

Discussion

This survey shows that, despite multicentre trials being prominent across all CTUs, there is UK-wide variation in how these trials are designed and analysed with respect to clustering and learning effects. Approximately half of Units changed their approach to design and analysis when presented with five example trial scenarios, each with varying levels of complexity, such as small sample size per centre and complex interventions, such as surgery. This finding suggests that variation can exist both across and within Units, and that the decision can depend on the type of trial being conducted. Units indicate awareness of the potential methodological challenges associated with the design and analysis of multicentre trials, although the approaches used and opinions on these vary. The high response rate achieved provides insight into the general and current practice of managing clustering and learning effects in multicentre trials investigating varying types of interventions. Whilst acknowledging that different approaches may be more suitable to different trial types, these findings indicate the need for a more unified approach to the design and analysis of trials where outcomes are associated with the delivery of the intervention and/or more research in this field.

When adjusting for clustering within the design, a higher proportion of Units than expected had run trials that did not stratify by centre (57%). Most commonly, this was due to too many centres and not enough participants within each centre. Stratifying by centre was most common in all scenarios, whilst stratifying by treatment provider was consistently rare but more common in trials with a surgical intervention. Stratifying by treatment provider raised pragmatic concerns, e.g. concerns over relevance to the research question, or provider not known pre-randomisation. Whilst in some settings, such as emergency treatment, advance knowledge of the treatment provider will be unobtainable, advance planning may be possible in other settings, such as group therapy, with guidance for practical issues like these available [13]. Just over half of the responders had adjusted by centre following stratification by the same. Most commonly this was done by pre-specified grouping rules established at the design stage or using an ad hoc approach determined after design due to small numbers per group. Regardless of stratification approach, almost nine-tenths of responders had adjusted for centre in the statistical model. There were mixed opinions on how this adjustment was made, i.e. by fixed or random effects, with reasons provided for and against both approaches. When a positive treatment effect is found, three-quarters and just over one-quarter stated that they then explore heterogeneity by centre and treatment provider respectively. Almost all did so using graphical displays.

Managing learning by design through defining a minimum level of expertise for health professionals participating in the trial [4] was most common, with almost all responders (89%) applying this approach to studies within their Unit. Fewer than a third indicated experience in conducting expertise-based designs, which can be particularly useful when comparing substantially different interventions. This finding suggests these designs are more commonly implemented than suggested by the literature [7, 14]. One responder raised the concern that identifying evidence of learning may lead to ‘awkward additional data summaries’.

Guidance on trial design and analysis does exist, with the most relevant of these recommendations being explicitly incorporated into the survey questions [4, 6] (Supplementary Box 2). Additional documents within the International Council for Harmonisation (ICH) Series provide further guidance beyond ICH E9 [15, 16]. The Consolidated Standards of Reporting Trials (CONSORT) statement and relevant extensions, although developed to support reporting, also provide direction that is valuable at study design [17, 18]. The decision to explore effects may, in part, be related to the intention of the research in terms of how the results will be used, and the PRECIS-2 tool has been developed to help with this [19]. However, the ability to identify and explore heterogeneity at the analysis stage is an important consideration for generalisability for all trials.

Strengths of this study were that, although the survey was limited to registered Units, responders represent wide geographic coverage within the UK, spanning a diverse range of medical conditions and associated methodologies. In addition, participating Units are known to comply with required regulatory standards and meet acceptable standards of quality required by the UKCRC CTU registration process [20]. All responders were experienced triallists who either were statistical lead at their Unit or a nominated statistical representative. Publicly funded trials cover a diversity of interventions [15] and are generally not seeking a marketing authorisation from the competent authorities, and this may impact the approaches taken in line with heterogeneity of effects by cluster or time. This work has three main limitations. First, it represents statistical practice within the UK in leading trial centres, with global practice unknown. However, the survey drew upon internationally accepted guidelines [4] for best practice, and therefore the opinions and experiences may be applicable beyond the UK. Second, some of the observed responses may have related to the different types of trials that the CTUs conduct. Not all trials include interventions where there is learning. Indeed, one would anticipate that many pragmatic large-scale trials do not have ‘learning’ effects because they include interventions that are stabilised and in widespread use. Whilst the survey allowed for free text responses, a more focussed survey, achieved using qualitative research methods, would be needed to examine these issues. Third, the volume of studies designed by each Unit will vary widely, and one responder per Unit may result in experiences reported for larger Units not being indicative of all studies run. However, responders were able to complete the survey with additional support within their Unit if deemed appropriate.

Conclusions

This survey is the first to report on the experience and management approaches with regard to clustering effects and the learning curve in multicentre randomised trials. Importantly, responders, who were highly experienced in the design and analysis of such studies, appear to have awareness of when to make such considerations. Whilst approaches to management are varied, and this variation may be trial-dependent within Unit, responders provided reasons for, and justified, the approaches they reported. Historically, guidance on the design, analysis and reporting of randomised controlled trials was developed more generally to support consistency in approaches across a more conventional randomised controlled trial [4, 15, 16], with more intervention-specific guidelines developed subsequently to address the additional complexities across different types of trials [11, 17, 18, 21]. Intervention-specific guidelines may have led to the variation and justifications identified in this survey. Results highlight the need for better consistency between triallists. Agreeing principles to guide trial design and analysis across a range of realistic clinical scenarios should be considered and/or further researched to establish optimal methods.