Background

Pragmatic trials evaluating the effectiveness of complex health interventions often evaluate new or modified treatments against a usual care comparator arm. As these trials aim to inform policy and practice in real-world settings, it is important that the usual care comparator resembles care normally provided in everyday practice [1]. Achieving this, however, might not be straightforward. Whilst the term usual care implies that there is consistent practice against which an intervention can be assessed, which may be the case where there is high-level evidence for a particular treatment [2], usual care can differ for the same condition, between patients and practitioners, across clinical sites, countries and over time [3].

Some researchers argue that to strengthen a trial’s external validity, the potential heterogeneity in usual care should be accepted and the trial’s usual care arm should include the full range of treatments available [4, 5]. The problem with this approach is that can make interpretation of trial findings difficult, with a potential lack of clarity about what the intervention is being compared against. Such detail is needed as the content and quality of the usual care arm will affect the effect size found, and therefore how effective the intervention is determined to be [6]. It is also problematic because, as Mant [7] comments, even if usual care is fully described, interpreting a trial’s results and applying them to clinical practice is very difficult if the care reported varies in content and quality.

The heterogeneity in usual care can also raise methodological and ethical problems. For example, trials often aim to detect clinically important effect sizes, which are based on the predicted difference between the effectiveness of the control and intervention groups. Not understanding what usual care consists of, therefore, undermines the basis on which sample sizes are calculated [8]. This lack of understanding also means researchers cannot judge whether the comparator and the intervention share similar ‘active’ components; if they do, the effects of the intervention may be masked or reduced [9]. In terms of ethics, a trial may be viewed as unethical if the quality and quantity of usual care provided at a trial site falls below that provided elsewhere or below standards specified in clinical guidelines [10].

It is usual care’s potentially heterogeneous nature, and the need to address such methodological and ethical requirements, that have led to some researchers specifying at the start of a trial, what treatment(s) and trial processes will be included in their usual care arm. Researchers have based these comparators on clinical guidelines [11], knowledge of current practice [12, 13], and patients’ views on what would be considered acceptable [14]. Yet, if the treatment chosen differs from what is normally delivered, this could weaken the trial’s external validity [1] and potentially put trial participants at risk [15]. Also, deciding what treatment(s) to include might be a difficult decision, as there may be no clinical guidelines or consensus on which treatments should be considered ‘standard’, and current practices might be less than optimal medical care [16]. In addition, there may be insufficient evidence to establish what usual care is [17], and factors such as the feasibility of standardising care across sites also need to be considered [1, 16]. Thus, both of what we will refer to in this paper as ‘unrestricted’ and ‘defined’ usual care comparators (see the ‘Glossary’ section), have their strengths and weaknesses.

There is literature detailing when a usual care comparator should be used [18, 19], and researchers have published protocols of trials that include usual care arms and justified their content [20, 21]. Currently, however, there is no guidance detailing how researchers should decide the content of usual care comparators when designing trials of complex interventions, and what should inform this decision. The aim of this methodology review is to assess current thinking around what factors should drive this decision and what actions should be taken whilst making it. Its focus and design were informed through a discussion with seven patient and public involvement members prior to submitting the application for funding, and one of these members was a co-applicant on the study, attended team meetings, and is an author on this paper (TY).

Methods

We registered the protocol with PROSPERO (CRD42022307324). The reporting of this review was guided by the PRISMA guidelines [22].

Searches and screening

Four electronic bibliographic databases (MEDLINE, Embase, CINAHL and PsycINFO) were searched from the inception of databases to 7th January 2022. A comprehensive search strategy was developed and tested with support from an information specialist (SaD). Searches included both MeSH and free text terms relating to usual care and synonyms, and methodology-related terms, such as methodology and research design. The parent Medline search strategy can be found in Additional file 1: Appendix 1. In addition to searching databases, we also emailed experts in the field and used social media to contact experts, asking them to share relevant literature.

After deduplication, we exported references into Rayyan [23] to screen results from the database searches. Titles and abstracts and the full texts were independently screened by two reviewers (KT and SD) against study criteria. Any disagreements were resolved through discussion and, where necessary, in consultation with a third reviewer (AH). Reference lists of included articles were hand searched, and forward citation searches conducted, to identify additional relevant articles.

Inclusion and exclusion criteria

We included methodology papers, reviews, book chapters, and articles based on case studies that described how to identify or develop usual care comparators in trials of complex interventions. These trials could be in any population and based in primary, secondary or social care, or in public health. We used the MRC’s definition of complex interventions [24] and therefore excluded papers detailing comparators in trials evaluating medicines (e.g. drugs or vaccines) and which focused only on treatment outcomes and not, for example, improving adherence. No language restrictions were applied if an English language abstract was available for initial screening.

Quality assessment of included articles

We did not conduct a risk of bias assessment, as our aim was to review methodological literature to understand current thinking around how researchers should identify or define usual care when planning a trial. Thus, it would not have been appropriate to do so.

Data extraction strategy

A customised data extraction table was developed in Microsoft Word. We extracted data on what principles, considerations and evidence should drive the decision about what usual care comparators should include (we defined these data as ‘decision drivers’), and what steps or tasks researchers should undertake when making this decision (we referred to these as ‘actions’). We also extracted details about the articles and the definitions of usual care used.

The data extraction table was tested on a random sample of two papers and refined. Data were extracted independently by two reviewers (KT and SD), and any discrepancies were resolved through discussion with a third reviewer (AH).

Data synthesis and presentation

We followed Popay et al.’s guidance on narrative synthesis using the general framework proposed [25]. Specifically, element 2: the development of a preliminary synthesis (developing an initial description of results) and element 3: exploring relationships within and between studies. This framework allowed us to shape both our synthesis and discussion of the included studies.

When synthesising details about the articles, we documented within a table information such as, where the article had been published, what terms had been used to refer to a usual care comparator and how this arm had been defined. When synthesising data on drivers and actions, we aimed to provide a narrative rather than a quantitative overview. This was because the articles included in the review varied in their focus and structure, so it was not possible to compare them directly or list how many articles mentioned a specific driver or action. Also, in many cases, it was our analysis and interpretation of the data, rather than an explicit statement in the article, that resulted in text being viewed as detailing a driver or an action.

As we identified individual drivers, it became apparent some related to the context in which a trial would be implemented, whilst others related to what a trial needed to do to address its aim and remain ethical. We therefore labelled drivers as either ‘context drivers’ or ‘trial drivers’, and as we continued to synthesise the data, began to consider how individual drivers might relate and affect each other. Similarly, we realised actions could be grouped according to when they needed to be undertaken, during the decision-making process, to decide the content of a usual care comparator.

In the “Results” section, we describe the included articles, before detailing drivers and actions.

Results

Included articles

We identified 1930 articles from searching databases. After de-duplication, 1611 titles and abstracts were screened. One hundred twelve articles were included for full-text screening and 16 were included in the review (Fig. 1).

Fig. 1
figure 1

PRISMA 2020 flow diagram

Of the 16 included articles, all but one were published by authors based in the USA [2, 4, 10, 12, 15,16,17, 26,27,28,29,30,31,32,33]. The exception was published by authors based in Canada [9]. Thirteen focused specifically on the use, design and implications of usual care comparators [2, 4, 9, 10, 15,16,17, 26, 27, 29, 31,32,33], and three discussed the selection of control groups more broadly but included text or a specific section on usual care comparators [12, 28, 30].

Six papers described themselves as reviews [9, 10, 15, 17, 27, 31]. Nine papers did not define themselves or simply said ‘this article’. We defined them as ‘discussion/methodological’ articles [2, 4, 12, 16, 26, 28,29,30, 33], as they discussed, for example, situations when unrestricted usual care may not be ethically acceptable [26], and how the vulnerability of the target population should inform the use and design of usual care controls [4]. The remaining included paper described an empirical study that identified current treatments for adolescent suicide attempters and then discussed the implications of the study findings on the use and design of usual care arms [32]. Whilst this was the only article based on an empirical study, five of the other included papers described individual studies to illustrate points made [4, 12, 16, 28, 29].

In terms of topic area, four articles were published in the area of critical care [9, 15, 29, 31], five in the areas of mental health, i.e. psychotherapy [26], psycho-oncology [17], genetic counselling [27], suicide prevention [4, 32], and one in the area of Type 2 diabetes [33]. The remaining articles focused on trial or intervention types: experimental studies [12], behavioural effectiveness trials [28], clinical trials [16], behavioural interventions [10], non-pharmacologic interventions [2] and psychological interventions [30] (Table 1).

Table 1 Details of included articles

Terminology used and definitions of usual care

When first defining usual care, nine articles used the term usual care [2, 9, 12, 15,16,17, 27, 29, 33], four used the term treatment as usual [4, 28, 30, 32], two used both of these terms [10, 26], and the remaining article used the term standard care [31]. All of these terms were used to refer to existing treatments or health care practices used in practice (Table 1). This suggested that they are used interchangeably within the literature, and certainly text such as ‘the term usual care (also referred to as routine care, control case, or standard treatment)’ ([33], page 126) and ‘usual care, sometimes called treatment as usual’ ([26], page 64) supports this suggestion.

However, Dawson et al. [16] and Thompson and Schoenfeld [2] both included sections on terminology and detailed why they had chosen the term usual care. Dawson et al. explained they had used it ‘to avoid any legal or normative implications of the term “standard of care.”’ (page 1) and Thompson and Schoenfeld wrote ‘the terms “best current” therapy or “standard of care” are problematic as they imply a uniform or proven practice standard. We prefer the descriptive term “usual care” to describe de facto clinical care without any value judgment.’ (page 577).

Some of the articles used specific terms to refer to a defined usual care comparator, such as protocolised usual care [9] and devised usual care [12] (Table 1).

Drivers and actions informing the content of usual care

Synthesis of the text extracted on drivers indicated that the following should drive decisions about the content of a usual care arm: a trial’s purpose and the need for internal and external validity; existing practices; the existence and content of clinical guidelines; and vulnerability and size of the target population. We viewed these as ‘context drivers’, as they came from the context in which a trial would be implemented. We also identified ‘trial drivers’, which related to the requirements of ethical research that stipulate trials must protect their participants, produce finding that inform practice, have scientific validity and be efficient, feasible and acceptable to stakeholders. These context and trial drivers are detailed below, under individual subheadings, and listed in Fig. 2.

Fig. 2
figure 2

Drivers and actions

Through synthesising and reflecting on our findings, we visualised that when deciding the content of a usual care arm, context drivers needed to be identified and considered before trial drivers, as they would influence how the trial drivers were prioritised within a trial, which would then inform the content of the usual care comparator. For example, if the target population was viewed as vulnerable, and/or existing care viewed as substandard compared to clinical guidelines (context drivers), ensuring participant safety within a trial (trial driver) would be a high priority and one that might result in a decision to define or enhance usual care. We also realised that when accounting for different trial drivers, tensions might arise between them. For example, the need to protect participants might reduce the extent to which findings would inform current practice, if the former required usual care to be enhanced beyond what was normally provided in real-world settings. Thus, there was a sense of needing to balance or trade trial drivers against one another when determining the content of usual care (this is indicated by the arrows included in Fig. 2, between trial drivers).

Synthesis of text detailing what actions researchers should undertake when deciding what usual care should include, showed these could be categorised as actions to gather data to understand the context in which a trial would be implemented; actions to support the process of deciding which trial drivers should be prioritised within a trial and what trade-offs would be made, and what actions should be taken having decided the content of usual care.

Context drivers

The trial’s aim and the need for internal and external validity

Unrestricted usual care comparators were described as essential to pragmatic effectiveness trials where the aim was to establish the utility of an intervention compared to current practice [26]. However, Brigham et al. [28] highlighted that if the experimental intervention needed to be substantially altered for it to be implemented in a real-world setting, the aim of the trial moved away from assessing its effectiveness towards assessing its efficacy. This in turn raised the importance of the trial’s internal validity, potentially requiring some control over what the intervention was compared against, and therefore what was delivered as a comparator.

The need to balance internal and external validity within a trial was also discussed in relation to the implications of a trial resulting in a type I or type II error, in terms of whether the decision to implement or further evaluate an ineffective treatment (type I error) would have greater negative impact on stakeholders (e.g. patients and providers) than concluding a treatment was not effective when in fact it was (type II error) [30].

The specific focus of a trial, in terms of whether it assessed the use of services or the effectiveness of a treatment compared to another, was also noted as important. If the former, unrestricted usual care might be appropriate but if the latter, the usual care arm might need to include processes to ensure participants in the comparator arm accessed existing practices [26].

Existing practices and clinical guidelines

Several articles mentioned existing practices should inform decisions about comparator content. If existing care was variable within or across trial sites, or between providers, in terms of quality, availability and/or accessibility to participants, then usual care should be standardised or enhanced within a trial [12, 26]. The same applied if existing practices were substandard to some local or national standard [26]. If current practice at a site was no treatment, no treatment would be an acceptable comparator if no guidelines exists but would be unacceptable if guidelines existed showing that practices elsewhere were more effective than no treatment [26]. A no treatment usual care arm was also unacceptable if care was already being provided, as in such circumstances it would be unethical to withhold care from trial participants [10, 12].

If there was so much variation in current practice that ‘‘standard’ care appears to be a misnomer’ ([32], page 46), and therefore viewed as an inadequate control group against which to evaluate another intervention, then usual care should be defined. Similarly, if usual care was viewed as too weak, too atypical, too variable in content, and too different from the intervention to act as an adequate comparator, it should be defined [12]. The effectiveness of existing care was also highlighted. For example, Degneholtz et al. [4] discussed a trial in suicide prevention where existing practices had been linked to high suicide rates due to unrecognised, untreated or undertreated depression, and the experimental intervention aimed to improve access to care considered beneficial. The vulnerability of the target population, alongside the possibility that clinical equipoise might not exist between trial arms, meant usual care in this trial was enhanced to meet the demands of participant protection and scientific rigour.

Vulnerability and size of the target population

Already noted is the article by Degenholtz et al. [4] where vulnerability of the target population, alongside inadequate usual care practices, resulted in usual care being enhanced within a trial. In addition to this article, vulnerability of the target population influencing the content of usual care was also highlighted by three of the four articles published in the area of critical care [15, 29, 31]. These articles argued that creating a comparator that did not reflect current practice, and/or restricted the extent to which care could be tailored to the specific needs of an individual or subgroup, could result in trial participants receiving inappropriate or suboptimal care. Two of these articles provided evidence to support their arguments, by detailing three critical care trials where mischaracterising usual care had resulted in significant or fatal consequences for participants in the usual care arm [15, 29].

In terms of the size of the target population informing the content of usual care, the availability of potential trial participants could be a driver if there was a limited number of potential trial participants with the disorder or condition of interest, as conducting a large trial might not be feasible. In this situation, where the size of a trial could be limited, a researcher may want to define usual care to restrict its content and heterogeneity in order to reduce sources of variation within the trial. Reducing sources of variation would strengthen a trial’s ability to detect, if present, differences in outcomes between trial arms [31]. Defining usual care would also ensure that practices included in the comparator did not overlap with those delivered as part of the intervention. This would also strengthen the trial’s ability to detect differences.

Trial drivers

The need to protect participants, inform practice, have scientific validity and be efficient

Silverman et al. [31] argued that the content of a usual care arm should be driven by the need to conduct ethical research. They commented that to be ethical, a trial must protect participants, be of clinical value (i.e. generate knowledge that has the potential to enhance practice), have scientific validity (i.e. have sufficient methodological rigour to generate valid results) and be efficient (i.e. avoid wasting resources). As these objectives may conflict, the decision about choice of comparator might involve trade-offs and compromises. For example, as Silverman et al. [31] explained, defining usual care might increase a trial’s scientific validity, as it could reduce variation in usual care, ensuring the comparator does not overlap or drift towards the experimental intervention, maximising a trial’s ability to detect differences in outcomes between trial arms. It might also be more efficient if the defined comparator meant a smaller sample size was needed, as it would reduce time and costs required. However, it might be of less clinical value and pose greater risks to participants if it does not accurately represent existing practices.

The requirements of ethical research informing the content of a usual care comparator were also highlighted by Degenholtz et al. [4]. They described a trial where entire populations were medically screened to identify individuals for study inclusion, and thus resulted in information about an individual’s health that otherwise would not have been known. This led to researchers having an ethical responsibility to monitor participants in the control arm to ensure that any adverse consequences of their trial participation or medical condition, were addressed.

The need to be feasible and acceptable to stakeholders

The content of a usual care arm could be informed by the finances available and the size of the target population, as both factors would affect how feasible it would be to conduct a large trial. For example, if finances were limited and/or there was a limited number of potential trial participants, a trial might be restricted in terms of its size and may need to define usual care in order to have sufficient power to identify differences between trial arms [31].

Whether a trial was feasible would also depend upon stakeholder engagement and successful recruitment of trial participants. Its design, therefore, must be acceptable to those implementing the comparator arm and to potential trial participants. Clinicians or providers might object to usual care being defined if they think tailoring treatment to the needs of specific individuals would result in better outcomes [16], and the target population might have a preference of unrestricted usual care [9].

When considering the feasibility of defining usual care, researchers might also want to consider where variability in current practices stems from. If it stems from various factors, e.g. age and needs of the patient, resources availability and/or clinician preferences, defining usual care could be particularly challenging [33].

Actions to undertake when making a decision

Some articles explicitly or implicitly mentioned actions that researchers could undertake to ensure their usual care comparator would be appropriate for their specific trial. These actions gathered data on the context drivers detailed above, facilitated the decision-making process that would then occur to determine priorities within a specific trial, and detailed what should be done once a decision had been made about the content of usual care (Table 2).

Table 2 Actions to inform and document the content of usual care

Interestingly, some articles also mentioned actions that related to the design of the trial, rather than informing the content of usual care. These aimed to account for or minimise variation within usual care, without having to define it. For example, randomising providers to deliver usual care or the experimental intervention, to ensure pre-existing skill level was not a factor in any treatment effect [28], and selecting trial sites known to provide care viewed as representative of usual care [26].

Discussion

The content of usual care comparators in trials evaluating complex interventions should be informed by the aim of the trial and the extent to which internal and external validity is needed; what care is currently being provided in practice, in terms of its content, effectiveness and accessibility to the target population; the existence and content of clinical guidelines; and the vulnerability and size of the target population. In addition to these context drivers, its content will also be driven by the trial’s requirement to protect participants, inform practice, be methodologically robust, efficient, feasible, and be acceptable to stakeholders. All of these latter trial drivers need to be met for a trial to address its aim, whilst remaining ethical and feasible. This may be difficult to achieve, as one driver might indicate the need to define usual care, whilst another might suggest it should be unrestricted. Some of the actions we identified could help researchers balance such tensions, for example, developing criterion to review possible comparators, and discussing alternative comparators with policymakers and providers. However, as others have noted, trade-offs may need to be made and the most appropriate comparator will be the one that best fits with the purpose of that specific trial [10].

To date, much of the discussion around the need to clarify the content of usual care has stemmed from the recognition that its content might vary between trial sites and that its content will affect the effect size found, so needs to be carefully described when interpreting and reporting trial findings [6, 30, 34]. Our review highlights reasons for it being defined from the start of a trial and supports the observation that the ethical requirements of research can mean usual care needs to be altered [35]. In addition, whilst we noted variation in the content and accessibility of usual care as a key driver, we also noted that if existing practices were viewed as ineffective, or below standards set locally or via clinical guidelines, then usual care might need to be defined. We also noted that usual care might be enhanced to ensure clinical equipoise between trial arms, or because recruitment processes in a trial resulted in information about an individual’s wellbeing, which otherwise would not have been known. Thus, it is not simply about variability in usual care but also what is viewed as appropriate care when compared to external standards and to the experimental intervention, and how trial processes might affect the population from which it is recruiting.

Whilst the focus here was on how to determine the content of usual care comparators, others have emphasised that study processes also need to be considered when designing trial arms. For example, it has been argued that differences can exist between control and intervention arms relating to the ways in which individuals’ access and engage with care provided within them. As these differences could affect treatment outcomes, researchers should aim to reduce them when designing trial arms [36].

Considering how important the content of a usual care arm is to the ethical and methodological conduct of a trial, it is interesting that such little attention has been given to how it should be decided. This could be because, as Burns [37] argues, our understanding of trials has been limited by the view that the experimental arm represents ‘the’ intervention, and the usual care control simply a necessary structure to facilitate the trial. Burns suggests that future trials are developed, interpreted and reported as a comparison of two interventions, and that the term ‘treatment as usual’ should not be used as it implies there is some consistent background practice again which any new intervention can be evaluated. We agree with both of these statements and hope our review goes some way towards helping researcher actively consider and question the content of usual care comparators and move towards viewing them as complex interventions in their own right.

In terms of terminology, we noted that some researchers have considered what terminology they use when referring to a usual care comparator, and that specific terms have been used when referring to a defined usual care arm. Like others though, we noted researchers use a range of terms to describe usual care and interpret the usual care concept differently [35]. Perhaps if the content of usual care arms is given more thought, researchers will start to describe the care provided within them. Detailing what care is provided would enable researchers to meet requirements of reporting statements, such as CONSORT [38] and TIDieR [39], which ask researchers to clearly detail both intervention and comparator arms, and which a recent review of published trial protocols shows are not being met, with researchers providing limited or no information on the content of their usual care arms [40].

A potential limitation of our review is that we may not have identified all relevant articles because the term ‘usual care’ has not been consistently used. That said, our search strategy included terms such as standard care, usual care, usual medical care, and comparison group to address this. Differences between articles in terms of their structure, focus and content meant we adopted a narrative summary approach when synthesising and presenting findings on drivers and actions. This enabled us to provide an overview of current thinking about how to select or develop a usual care arm, but we are aware that we extracted and categorised data as drivers and actions, and these were not terms used within the articles themselves. We are also aware this methodology review gives no insight into the reality of designing trial comparator arms and that some of the actions detailed could be very challenging to address. For example, as usual care can differ between patients and practitioners, across clinical sites and over time, characterising and assessing existing care practices could be very difficult.

If the long-term ambition is to develop guidance to support researchers designing trials of complex health interventions, then an understanding of this reality and assessing the extent to which such guidance could be applied in practice would be needed to ensure it is relevant and applicable. Future research could entail conducting in-depth interviews with trialists and wider stakeholders (e.g. Directors of funding committees and trial units, Research Design Service staff, and health care practitioners) to explore their views and, where appropriate, experiences of defining usual care. Such work could highlight drivers and actions not yet detailed within the existing literature, which is limited, and indicate the feasibility of undertaking some of the actions discussed, and the most efficient or effective way of addressing them. Interviews with researchers who have defined usual care could also provide insights into the impact of defining usual care on a trial’s implementation and relevance to real-world practice. Once such in-depth work had been completed, a consensus exercise could be undertaken with researchers and wider stakeholders to agree which drivers and actions should be detailed in the guidance, and what information should be provided about how best to undertake the decisions and actions faced when defining usual care. Prior to finalising this guidance, it could be applied to two or three case study trials to assess feasibility of implementation and its acceptability to trialists.

Conclusions

Both the context within which a trial is conducted, and the needs of that trial, will determine what its usual care comparator should include. Compromises might need to be made, as tensions may arise when accounting for different drivers. What is important is that researchers actively think about the content of their usual care arm, acknowledge its strengths and limitations, and justify its selection. To be able to do this, primary research is needed at the design stage of a trial to understand what care is currently being provided in practice, what the needs of the target population are, and what clinical guidelines exist. In addition, the decision-making process that will then occur to determine content of usual care, will require engagement with stakeholders, and members of the research team to consider the needs and priorities of their trial, and what structural, population and financial factors might constrain what is possible. Such work requires time and resources and could be done as part of a feasibility study.