Background

Well-conducted randomised controlled trials (RCTs) are widely viewed as providing the optimal evidence on the relative performance of competing healthcare interventions [1,2]. However, simply detecting any statistical difference in the effectiveness of interventions may not be sufficient or useful; if the interventions differ to a degree or in a manner that is of little consequence in patient, clinical or economic (or other meaningful) terms, then the interventions might be considered not to be different. If RCTs are to produce useful information that can help patients, clinicians and planners make decisions about health care, it is essential that they are designed to achieve this. This is typically achieved by specifying a target difference for a primary outcome as part of a sample size calculation, which provides reassurance that the trial will have the specified statistical power to identify whether a difference of a particular magnitude exists. Beyond purely statistical or scientific concerns, the sample size calculation has financial and ethical implications. Failing to recruit sufficient participants to be able to confidently detect a relevant difference between interventions may be viewed as an inefficient use of finite research resources, while recruiting substantially more than are needed risks exposing participants to unnecessary experimentation [3].

Given these considerations, determining an appropriate sample size is of critical importance. Surprisingly, little practical advice is available on specifying the target difference of the chosen primary outcome, which as noted above is a key component of the sample size calculation. A comprehensive systematic review of the literature identified the available methods for determining the target difference, and surveys have shown that these methods are in use [4,5]. Nevertheless, uncertainty regarding the magnitude of the target difference when designing the trial will lead to uncertainty regarding the interpretation of the results, even when the trial is otherwise successfully conducted [6,7].

This article aims to provide practical guidance primarily for researchers involved in determining the sample size for an RCT and, in particular, the specification of the target difference in the primary outcome. It is also relevant to those who are involved in commissioning and publishing such studies. We provide guidance on the choice of the primary outcome, specification of the target difference and a brief summary of available methods that can be used to inform its specification and reporting. Two sets of reporting items, one for a trial protocol and the other for a report of the trial findings in a peer-reviewed biomedical journal, are also proposed, with examples provided. A comprehensive systematic review and discussion of the individual methods for specifying a target difference has been reported elsewhere [4,5]. The focus of this guidance is upon what might be termed the conventional, or standard, approach to an RCT sample size calculation: a standalone trial utilising the conventional statistical framework for sample size calculation and primarily for superiority trials (those where the difference to be detected is specified). The key issues considered are relevant to other RCT designs and analysis approaches, though implementation may differ. We note that the conventional approach to sample size calculation is not without its limitations and alternatives have been proposed [8]; nevertheless, it continues to be the most widely adopted approach [1,9].

The conventional approach to the sample size calculation for a two parallel group RCT is as follows:

  1. The RCT is conceived as a standalone definitive study (a study that is designed to provide a meaningful answer on its own);

  2. It addresses a superiority question evaluating evidence of a difference (in either direction);

  3. Adoption of a two parallel-group RCT design (typically 1:1 allocation);

  4. Application of the Neyman-Pearson framework to calculate the sample size [2,10-12]. This requires specification of: the primary outcome for which the required sample size is to be calculated; the target difference (specification varies according to outcome type); statistical parameters (significance level and power) and other component(s) of the sample size calculation (such as standard deviation (SD)).
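As an illustration of how these components combine, the following is a minimal sketch for a continuous primary outcome. It assumes the common normal-approximation formula for comparing two means with 1:1 allocation, n per group = 2(z_{1-alpha/2} + z_{power})^2 SD^2 / (target difference)^2; this formula, the function name and the example inputs are illustrative assumptions and are not taken from the guidance itself. Dedicated sample size software typically applies a small-sample (t-distribution) correction and may give slightly larger values.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_continuous(target_diff, sd, alpha=0.05, power=0.80):
    """Participants per group for a two-sided comparison of two means.

    Normal-approximation formula, 1:1 allocation, common SD assumed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided significance level
    z_power = z.inv_cdf(power)          # quantile for the required statistical power
    n = 2 * ((z_alpha + z_power) * sd / target_diff) ** 2
    return ceil(n)                      # round up to whole participants


# Hypothetical inputs: target difference of 5 points, assumed SD of 10 points
print(n_per_group_continuous(target_diff=5, sd=10))
print(n_per_group_continuous(target_diff=5, sd=10, power=0.90))
```

Every input to the function (target difference, SD, significance level and power) has to be stated for the calculation to be reproduced; omitting any one of them leaves the sample size undetermined.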

Methods

Development of the guidance

This work was part of the DELTA (Difference ELicitation in TriAls) project, a study on target differences commissioned by the Medical Research Council/National Institute for Health Research Methodology Research Panel (MRC/NIHR) in the United Kingdom. It comprised three interlinking components: a comprehensive systematic review of methods for specifying the target difference, two surveys of current practice amongst clinical trialists and generation of structured guidance. This article is an abridged version of this guidance and other components of the project which have been reported in full elsewhere [4]. DELTA was undertaken by a collaborative group in which the majority of members have extensive experience of the design and conduct of RCTs (both as investigators and as independent committee members) and have conducted methodological research related to RCTs (such as quality-of-life measurement, statistical methodology, reporting, surgical trials and economic evaluation). The draft guidance was developed by the project steering and advisory groups utilising the results of the systematic review and surveys. Findings were circulated and presented to members of the combined group at a face-to-face meeting, along with a proposed outline of the guidance document structure and a list of recommendations and reporting items for a trial protocol and report. Both the structure and main recommendations were agreed at this meeting. The guidance was subsequently drafted and circulated for further comment before finalisation. No ethical approval was needed for this research.

Scope of the guidance

This guidance is based upon the conventional approach to a sample size calculation, though it should be applicable to most RCTs [1,9]. However, other approaches, for example trials with an explicitly Bayesian analysis framework, will require adaptation of the reporting items. It focuses upon guidance for a trial with a ‘superiority’ question: one that seeks evidence of a difference between intervention groups. Although this guidance is primarily aimed at researchers, it is also relevant for publishers, funders and commissioners of research.

Results

Abridged guidance is given below.

Choosing the primary outcome

In the conventional approach to the sample size calculation for an RCT, a single outcome is usually chosen to be the primary measure upon which the sample size calculation is based (in some cases more than one primary outcome may be appropriate) [2,10,13]. The specification of a primary outcome performs a number of functions in terms of trial design, but it is clearly a pragmatic simplification to aid the design, interpretation and use of RCT findings. Through the corresponding sample size calculation and specification of the target difference, it clarifies what the study aims to identify, and the statistical power and precision with which this can be achieved. Stating the primary outcome in the study protocol also helps prevent undue over-interpretation arising from testing multiple outcomes and selective outcome reporting bias, whereby authors report only statistically significant (but possibly clinically irrelevant) outcomes or change the primary focus of the study to match a statistically significant finding. Additionally, it helps clarify the initial basis upon which to judge the study findings. This is particularly important in the presence of a ‘negative’ result, where the result does not meet the criteria for statistical significance (typically the 5% level). In all cases, focus should be upon the confidence interval as well as the point estimate, where a justifiable target difference can guide the interpretation. However, such justification of the target difference is often lacking in trial reports [1,6]. Calculating (or reverse engineering) the magnitude of a difference that can be detected at conventional levels of statistical significance and power (typically two-sided 5% and 80%, respectively), given a sample size which is believed to be feasible, is often performed in practice for a selection of key outcomes before determining the primary outcome. Nevertheless, it is important to report the final sample size calculation, including the chosen primary outcome, the target difference and any justification of the value chosen, in as robust and transparent a fashion as possible to allow others to judge the basis of the calculation.
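The reverse-engineering calculation described above can be sketched as follows for a continuous outcome. This is a minimal illustration that assumes the normal-approximation formula for comparing two means with equal group sizes; the function name, the feasible sample size and the SD are hypothetical values chosen for the example and do not come from the guidance.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with the stated power.

    Inverts the normal-approximation sample size formula for two
    equally sized groups with a common SD.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt(2 / n_per_group)


# Hypothetical scenario: 100 participants per group is believed to be
# feasible and the outcome SD is assumed to be 10 points.
diff = detectable_difference(n_per_group=100, sd=10)
print(f"Detectable difference: {diff:.2f} points ({diff / 10:.2f} SD units)")
```

A value obtained this way states only what the feasible sample size can detect; whether a difference of that magnitude is important or realistic still requires separate justification.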

Specifying the target difference

The specification of the target difference in an RCT sample size calculation has received surprisingly little discussion in the literature. For a superiority trial, it is the difference in the primary outcome value that the study is designed to detect reliably [2,10,13]. There are two main bases for specifying the target difference: a difference considered to be ‘important’ (for example, by a stakeholder group such as health professionals or patients), and a ‘realistic difference’ based upon current evidence (for example, seeking the best available estimates in the literature through some form of knowledge synthesis).

It has been argued that a target difference should always meet both of these criteria [14]. The desire to be able to consider a (clinically) important difference can be viewed as a middle ground between ignoring the consequences of the treatment decision and a full assessment of the benefits, harms and costs of an intervention against the alternatives, which seeks to ensure that any harms and costs are incurred for a good reason. Focusing on a benefit (or harm) of the most important outcome is a natural and intuitive, if imperfect, way to guide a decision. A large body of literature exists on defining a clinically important difference, though not in the context of an RCT sample size calculation [15-17]. The most common general approach is the minimal clinically important difference (MCID). This has been defined as ‘the smallest difference … which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management’, or more simply as ‘minimum difference that is important to a patient’ [17]. Many variants on this basic approach exist [18,19]. In the context of specifying a target difference for a typical two parallel-group trial, the focus is on a difference at the group level, between two groups of different participants. This contrasts with the vast majority of the MCID (and related) literature, which focuses overwhelmingly upon within-patient change and whether an important difference can be said to have occurred [15-17]. An alternative approach is to consider all relevant issues, including the consequences of decision-making, whereby a difference of any magnitude can be viewed as important and therefore a study’s size (and implicitly the target difference) is determined by reference to resource implications [20,21]. Whatever definition is used, estimation of an important difference is not without its challenges and limitations [22,23].

The other main basis for a target difference is to specify a realistic difference; there is, for example, little point in setting as the target difference one that is so large that it cannot plausibly exist. If a systematic review of RCTs on the research question is available, it can be used to specify what difference is supported by current evidence. In essence, a realistic difference makes no claim regarding its clinical importance or otherwise. However, where a realistic difference is used, consideration of the importance of the difference is needed if the study findings are intended to inform clinical, patient or policy decisions. For some outcomes, the importance may be very clear (for example, mortality), whereas for others (especially quality of life and surrogate outcomes) further explanation is needed. Recruitment, study management and finance will naturally come into play when determining the sample size of a study. However, such considerations do not negate concerns about what is a realistic and/or important difference.

For a superiority trial it is generally accepted that the target difference should be a clinically important difference [2,10-12] or ‘at least as large as the MCID [minimum clinically important difference]’ [24]. The target difference in a conventional sample size calculation is not the minimum difference that can be statistically detected; statistical significance alone is not a sufficient consideration for attributing importance to a difference [2,12].

The target difference is specified differently depending upon the type of primary outcome. For a continuous outcome, this target difference on either the original or standardised scale is often referred to as the ‘effect size’. Strictly speaking, this value alone does not fully (uniquely) specify the target difference; the assumed variability of the outcome (standard deviation) is also needed to convert the effect size between the original and standardised scales. For a binary outcome, the target difference will be conditional on the control group event proportion. To uniquely specify the sample size, the target difference and the control group event proportion are needed, which together imply a unique pair of absolute and relative target differences. Similarly, survival outcomes require the control group proportion or survival distribution and length of follow-up period to be stated, in addition to the target difference. This is necessary as the sample size required is sensitive to both the absolute level and the relative difference. Despite this, it is not uncommon for only one or the other to be specifically stated in trial reports.
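To illustrate why the control group event proportion must accompany the target difference for a binary outcome, the sketch below applies a simple (unpooled) normal-approximation formula for comparing two proportions with 1:1 allocation. The formula choice, the function name and the event proportions are assumptions made for illustration only; other formulae (for example, with pooled variance or a continuity correction) give somewhat different values.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_binary(p_control, p_intervention, alpha=0.05, power=0.80):
    """Participants per group for a two-sided comparison of two proportions.

    Unpooled normal-approximation formula, 1:1 allocation.
    """
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    variance_sum = (p_control * (1 - p_control)
                    + p_intervention * (1 - p_intervention))
    return ceil(multiplier * variance_sum / (p_control - p_intervention) ** 2)


# The same hypothetical absolute target difference (10 percentage points)
# under two hypothetical control group event proportions.
for p_control in (0.50, 0.20):
    p_intervention = p_control - 0.10
    relative_risk = p_intervention / p_control
    n = n_per_group_binary(p_control, p_intervention)
    print(f"control {p_control:.2f}, intervention {p_intervention:.2f}, "
          f"relative risk {relative_risk:.2f}: {n} per group")
```

In this sketch an identical absolute difference corresponds to different relative differences and different required sample sizes, so stating only one of the two quantities does not allow the calculation to be replicated.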

Seven methods for specifying the target difference have been identified [4] which can be used to inform the choice of target difference: anchor, distribution, health economic, opinion-seeking, pilot study, review of the evidence base and standardised effect size (see Table 1 for a brief summary and elsewhere for a summary of the literature assessment of the use of each method [5]).

Table 1 Methods for specifying an important and/or realistic difference [5]

Reporting the sample size calculation and target difference

The assumptions made in the sample size calculation should be clearly specified. All inputs should be clearly stated so that the calculation can be replicated. It is recommended that trial protocols clearly and fully state the sample size calculations, including where the approach taken differs from the conventional approach (for example, the adoption of a Bayesian framework instead of a frequentist approach), statistical parameters and the target difference, with justification for the choice of values. Due to space restrictions in many publications, the main trial paper is likely to contain less detail. A minimum set of items for the main trial results paper along with full specification in the trial protocol is recommended below in Table 2. These are more extensive lists of reporting items building upon the Consolidated Standards of Reporting Trials (CONSORT, including the 2010 version) and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) statements, which provide guidance on reporting the sample size calculation, but not explicitly how to report the target difference and its justification [25-27]. Examples for the three most common outcome types are provided in Table 3.

Table 2 Reporting items for the protocol and report of a two parallel group superiority trial
Table 3 Reworked example RCT protocol sample size calculation sections

Discussion

The RCT is widely considered to be the best method for comparing the effectiveness of health interventions [1]. Determining the target difference is a key element of an RCT design. Improved standards in both RCT sample size calculations and reporting of these calculations would aid health professionals, patients, researchers and funders in judging the strength of the available evidence and would ensure better use of scarce resources. While no single method provides a perfect solution to a difficult question, we have provided practical guidance for researchers on sample size calculation with reference to specifying the target difference and how this should be reported in trial protocols and reports. To our knowledge, no alternative guidance exists. Although our examples and framing are from a medical context, the issues are relevant to social care, animal and other non-medical research as well. Further research is needed into the implementation, practicality and consequences of using alternative methods for specifying the target difference (such as health economic and opinion-seeking), and into the justification of some methods (such as the standardised effect size method, where the magnitude of the effect is used to infer the importance of a difference).

Conclusions

Specification of the target difference for the primary outcome is a key component of an RCT sample size calculation. There is a need for better justification of the target difference and for corresponding reporting of its specification. Raising the standard of RCT sample size calculations would aid health professionals, patients, researchers and funders in judging the strength of the evidence and would ensure better use of scarce resources.