Digital Features for this article can be found at https://doi.org/10.6084/m9.figshare.16655143.

Key Points

Contemporary moderate-to-severe atopic dermatitis trials vary in many parameters that may have a significant impact on efficacy results.

Key study design and analysis parameters need to be considered when interpreting clinical trial results so that findings are implemented appropriately in clinical practice.

1 Introduction

Atopic dermatitis (AD) is a chronic inflammatory skin disease that causes significant morbidity for patients and, when unaddressed, may impact psychosocial wellbeing [1,2,3,4,5]. Prior to 2017, treatment options for AD in the US were largely limited to topical corticosteroids (TCS) and off-label use of immunosuppressants [2, 6]. AD clinical research has recently increased markedly, particularly for moderate-to-severe disease. The number of clinical trials initiated in participants with moderate-to-severe AD increased from 1 in 2011 to 38 in 2018 (ClinicalTrials.gov, retrieved 23 February 2021). Dupilumab was the first biologic approved for use in patients with moderate-to-severe AD, and another biologic, tralokinumab, has received authorization in the EU and from other health authorities [7]. Janus kinase (JAK) inhibitors are also an emerging treatment option [8, 9]. The AD therapeutic pipeline remains robust, with 20 trials initiated in 2020 alone (ClinicalTrials.gov, retrieved 23 February 2021) and 17 trials initiated to date in 2021 (ClinicalTrials.gov, retrieved 24 August 2021). As our understanding of AD pathophysiology evolves, investigations continue to support the development of new treatments through further elucidation of underlying inflammatory pathways and identification of disease biomarkers [6].

As with any rapidly developing area of study, various trial designs have been utilized to assess the effectiveness of investigational treatments for AD [10,11,12,13,14,15,16,17,18,19,20,21]. Guidance from regulatory bodies may also differ depending on when a trial was conducted, as it adapts to the evolving treatment landscape. The combination of different trial designs with evolving regulatory guidance poses challenges when interpreting clinical trial results and incorporating evidence into clinical practice. Clinical dermatologists are left to weigh the importance and impact of trial design factors, which may have a material impact on trial outcomes. While indirect treatment comparisons (e.g., network meta-analyses) are often used to derive some insights from different trials, such efforts remain limited by between-trial differences in key study parameters. There are several ongoing efforts to improve the consistency of clinical research in AD, including the Harmonizing Outcome Measures for Eczema (HOME) initiative, which offers guidance for minimum mandatory outcome measures [22], and efforts to standardize study design elements in pediatric AD trials [23]. Despite these efforts, important clinical trial design elements remain variable across many phase III studies. As such, those seeking to make evidence-based clinical decisions must evaluate clinical trial results in the context of varying study designs [24], statistical methodology [25], and additional parameters that might impact trial efficacy outcomes.

In this paper, we review which key study parameters may impact efficacy outcomes of clinical trials conducted in participants with moderate-to-severe AD. We also discuss the potential impacts of these parameters on one another. We aim to provide guidance to aid interpretation of randomized controlled trials in AD and raise awareness about the need for harmonization of study design elements in clinical trials conducted in participants with moderate-to-severe AD.

2 Method for Identification of Key Clinical Trial Parameters Impacting Efficacy Outcomes

As a first step in identifying key study parameters in contemporary clinical trials in moderate-to-severe AD, two moderated meetings were convened in December 2020, with participation of all six authors, followed by an initial free-form survey to determine items for consideration. Authors were identified by the sponsor based on their expert knowledge of randomized controlled trials in moderate-to-severe AD; all authors are investigators/authors in recent clinical trials for new systemic and biologic treatments in AD. Survey responses, complemented with findings from a review of the literature and trial protocols for phase III moderate-to-severe AD trials, were used to identify 22 initial trial parameters (Fig. 1). Subsequently, the author group completed a second survey to rank these 22 identified parameters by importance, based on their potential impact on efficacy outcomes. An overall list of ranked parameters was derived from survey responses using the minimum sum of rank scores. Unranked parameters were assigned the average of missing rankings per individual, with the assumption that total rank scores should be the same for all survey respondents (Table 1).
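The rank aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each respondent is assumed to assign distinct ranks starting at 1 (most important) to the parameters they chose to rank, and the parameter names, the survey responses, and the function name `aggregate_ranks` are hypothetical, not the authors' data or code.

```python
from typing import Dict, List, Tuple


def aggregate_ranks(
    parameters: List[str], surveys: List[Dict[str, int]]
) -> List[Tuple[str, float]]:
    """Sum rank scores per parameter; the lowest total is the most important.

    Parameters a respondent left unranked receive the average of that
    respondent's unused ranks, so every respondent contributes the same
    total rank score, n * (n + 1) / 2.
    """
    n = len(parameters)
    totals = {p: 0.0 for p in parameters}
    for ranking in surveys:
        used = set(ranking.values())
        missing = [r for r in range(1, n + 1) if r not in used]
        # Average of the ranks this respondent did not assign.
        fill = sum(missing) / len(missing) if missing else 0.0
        for p in parameters:
            totals[p] += ranking.get(p, fill)
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical parameters and survey responses (illustration only).
parameters = ["washout", "comparator", "rescue", "analysis_set"]
surveys = [
    {"rescue": 1, "comparator": 2, "washout": 3, "analysis_set": 4},
    {"comparator": 1, "rescue": 2},                # two parameters unranked
    {"rescue": 1, "washout": 2, "comparator": 3},  # one parameter unranked
]

ranked = aggregate_ranks(parameters, surveys)
for name, score in ranked:
    print(f"{name}: {score}")
```

In this toy example the second respondent's two unranked parameters each receive 3.5 (the mean of the unused ranks 3 and 4), so each respondent contributes a total of 10 rank points regardless of how many parameters they ranked.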

Fig. 1

Conceptual framework of biologic/systemic therapeutic trials in moderate-to-severe AD. Multiple parameters spanning clinical trial design (top), execution (center), and statistical analysis (bottom) may influence efficacy outcomes over the course of a phase III clinical trial for moderate-to-severe AD. AD atopic dermatitis, AEs adverse events, EASI Eczema Area and Severity Index, H2H head-to-head, HSV herpes simplex virus, IGA Investigator Global Assessment, LTE long-term extension, PROs patient-reported outcomes, TCS topical corticosteroids, VZV varicella-zoster virus

Table 1 Ranking of key clinical trial parameters impacting efficacy outcomes

3 Key Clinical Trial Parameters that Impact Interpretation of Efficacy Outcomes

3.1 Inclusion/Exclusion Criteria

Inclusion and exclusion criteria are intended to specify an appropriate study population for a given trial. Certain criteria can directly impact observed trial safety/efficacy outcomes and therefore it is important to understand differences in inclusion/exclusion criteria when comparing studies. Often, subtle details may skew the make-up of the study population.

For example, if one clinical trial were to exclude participants with asthma while another included this population, the trial excluding participants with asthma may show a weaker asthma-associated adverse event signal than the trial that allowed participants with comorbid asthma to enter. This could similarly be the case for the inclusion/exclusion of participants with prior infections, major adverse cardiovascular events, or venous thrombotic events. Such differences in observed safety outcomes may be more attributable to patient selection than treatment effect.

Efficacy may also be affected if entry into a trial is restricted to participants who previously did not respond to certain treatments. For example, a trial that only enrolled participants who had not responded to a certain standard-of-care treatment may examine a comparatively more severe and refractory patient population than a study without such an inclusion criterion.

3.2 Washout Period Duration

The washout period refers to the time prior to baseline when previous treatments are cleared from a patient’s system, such that effects of an investigational drug are not confounded by the previous treatment [26]. Washout period duration varies depending on the medication, with generally longer washout periods for systemic therapies than for topical therapies [10,11,12,13,14,15,16,17,18,19,20,21]. Optimal washout length also ensures that the patient is at the eligible baseline disease severity.

The length of a washout period may have cascading consequences on the subsequent stages of a trial. A longer period (e.g., 4 weeks) reduces the likelihood of a carryover treatment effect, but may result in greater disease severity at baseline, potentially leading to increased rescue treatment use, especially early in a study [27]. Conversely, with a shorter washout period the prior treatment may still affect the patient, which may reduce the likelihood that participants flare prior to randomization or need rescue treatment (typically TCS) to control disease shortly after randomization. Longer washout periods may deter patient participation, especially among participants with more severe AD who fear flaring, which raises ethical considerations. Indeed, patient input into study protocol design contributed to the decision to use a short washout period in a recent trial [15]. This relationship between washout length and baseline disease severity warrants further investigation.

3.3 Comparator

Placebo-controlled randomized trials are the gold standard for clinical research and are required for regulatory approval. Considerations around placebo utilization in AD clinical trials and practical suggestions for trial design were recently expertly reviewed [27]. When attempting to compare active agents from independent placebo-controlled trials, attention must be paid to the response rate to placebo in each trial, which may be influenced by background treatments/rescue therapies.

To reflect real-world practice, many clinical trials for moderate-to-severe AD compare placebo and TCS with the experimental drug and TCS, as placebo alone may not meet standard-of-care and may expose participants to a risk of worsening AD and reduced quality of life [27]. However, if background TCS is used on an as-needed basis, participants in the placebo arm may use more TCS than participants in the active treatment arm [10, 13]. This could result in higher levels of AD improvement in the placebo arm and lead to an underestimation of the perceived placebo-adjusted treatment effect of the experimental drug.

Placebo responses may also be influenced by the method by which participants obtain rescue treatment in a trial. Some trials supply TCS directly to participants [14, 17], which minimizes barriers to use. In such trials, participants may use TCS more consistently and in greater amounts than they would in the real world, where a prescription is typically required [28]. Other trials do not supply TCS and require participants to obtain new TCS prescriptions [10, 13]. In some instances, there may be a financial barrier to using TCS if it is not provided. This potentially leads to non-standardized TCS potencies, vehicles, and quantities/application frequencies, and likely to lower quantities of TCS used overall [29].

3.4 Use of Rescue Treatment

As previously mentioned, rescue treatment may be offered to ensure AD is adequately controlled for study participants, with TCS being the most common rescue treatment in AD monotherapy trials. Rescue treatment rules may vary substantially across clinical trials, with some permitting use throughout the active trial periods [10, 13, 17] and others limiting use to a defined period (e.g., only after 2 weeks of treatment) [14] or prohibiting rescue treatment use entirely [15, 18]. When as-needed TCS is used in all study arms, the precise regimen for the rescue treatment may be defined differently across clinical trials. Different trials may also provide varying degrees of patient instruction around rescue treatment use. These variations may drive differences in trial outcomes; the potential efficacy implications of rescue treatment use across the various clinical trial phases are illustrated in Fig. 2.

Fig. 2

Efficacy implications of variations in rescue treatment. The duration of prior therapy washout during screening may impact the likelihood of flares, which subsequently impacts the need for rescue treatment in the treatment period. Additionally, residual treatment effects due to a short washout period may mask the true severity of the study population, skewing the participants who may meet the inclusion criteria for the treatment period. During the treatment period, whether or not rescue treatment is permitted has implications for trial discontinuation rates. Those who need rescue treatment may be considered non-responders in some trials; how non-responders are statistically accounted for may influence response rates. Long-term extension trials may also set rules around the inclusion/exclusion of participants who required rescue treatment during the treatment period, which may influence the patient profile in this phase. NRI non-responder imputation, TCS topical corticosteroids

Given the waxing and waning nature of AD, screening duration and washout specifications during this period can impact disease severity (e.g., Investigator Global Assessment [IGA] or Eczema Area and Severity Index [EASI] scores) at baseline [27]. As previously discussed, participants enrolled in trials prohibiting TCS use during screening or with longer washout periods may experience more disease exacerbation at study initiation than those in trials permitting TCS. Trials involving a large proportion of participants with moderate disease may have comparatively higher efficacy results than those mostly comprising participants with severe disease given the differences in baseline disease severity.

Notably, use of rescue treatment during the initial trial period may impact the patient population that enrolls in the long-term extension phase. Entry into the long-term extension phase of some trials may be restricted to participants who complete the initial treatment period without needing rescue treatment. Conversely, in other trials, participants needing rescue treatment may be eligible for the long-term extension phase.

Rules around rescue treatment may also influence patient behavior during the initial trial period. If participants are aware that they will not be eligible for the long-term extension after rescue treatment, they may be less likely to use it. In trials where rescue use leads to participants being considered non-responders, this behavior change would in turn affect the statistical analysis of the study.

3.5 Nature of the Efficacy Analysis Set

Many aspects of statistical analysis may differ between trials and may influence study outcomes, with a key factor being the efficacy analysis set, defined as the patient population used for the determination of efficacy. During the randomized phase, the efficacy analysis set is often well-defined and approved by regulatory bodies. This may not be the case in long-term extension trials. The estimate of overall treatment response may be conservative if a large proportion of participants were non-compliant or withdrew from the trial. Conversely, a responder-enriched analysis set, comprising participants who showed a strong treatment response, may result in an artificially exaggerated response rate. Including all participants in the long-term extension may yield numerically lower efficacy data than only including those who were responders at the time of the primary endpoint.
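The effect of the analysis set on a long-term response rate can be illustrated with simple arithmetic. The sketch below uses entirely hypothetical participant counts (they are not taken from any trial) and compares a rate computed over all participants who entered an extension with a rate computed over a responder-enriched set.

```python
def response_rate(responders: int, total: int) -> float:
    """Proportion of participants in the analysis set who responded."""
    if total <= 0:
        raise ValueError("analysis set must contain at least one participant")
    return responders / total


# Hypothetical long-term extension cohort (illustration only).
entered_extension = 100          # all participants entering the extension
responders_at_primary = 60       # responders at the primary endpoint
late_resp_primary_responders = 48     # of those 60, still responding later
late_resp_primary_nonresponders = 8   # of the other 40, responding later

# Analysis set 1: everyone who entered the extension.
all_rate = response_rate(
    late_resp_primary_responders + late_resp_primary_nonresponders,
    entered_extension,
)

# Analysis set 2: only responders at the primary endpoint.
enriched_rate = response_rate(late_resp_primary_responders, responders_at_primary)

print(f"All participants:   {all_rate:.0%}")
print(f"Responder-enriched: {enriched_rate:.0%}")
```

With these assumed counts, the same underlying data yield 56% in the all-participant set and 80% in the responder-enriched set, which is why the analysis set should be checked before comparing long-term results across trials.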

3.6 Missing Data Handling and Data Censoring

Approaches for handling missing data have grown more standardized in recent years, with international efforts to arrive at statistical frameworks for use across trials for many therapeutic areas [30]. Until such standards are broadly implemented, it is important to note the statistical methods used when interpreting clinical trials.

Missing data may arise due to study discontinuation or loss to follow-up. The advantages and disadvantages of various common imputation approaches for handling missing data were reviewed previously in the context of psoriasis trials [31]. A key learning for moderate-to-severe AD trials is to consider differences in imputation strategies to understand possible statistical biases and inform more meaningful comparisons [31]. For example, some trials may use the last observation carried forward (LOCF) approach, assigning the value recorded at the patient’s last visit to subsequent missed visits; this may overestimate the sustained response or underestimate the improved response the participants may have experienced had they stayed in the trial [32].

Data censoring may also be needed in the context of rescue therapy, if a study specifies that participants needing rescue treatment are to be imputed as non-responders. This conservative approach, known as non-responder imputation (NRI), assigns a value of non-response to the censored data point [31, 32]. When NRI is used, the investigational treatment may be perceived to have lower efficacy, regardless of treatment response up to that point [31, 32]. Varying rules around rescue treatment may impact this aspect of trial design. For example, permitting rescue treatment immediately after randomization may result in more participants being considered non-responders due to rescue use in the early weeks of the trial.
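The difference between LOCF and NRI can be made concrete with a small sketch. The visit data below are hypothetical, and the rules are simplified assumptions for illustration: responses are binary per visit, `None` marks a missed visit, LOCF is applied to the observed data without regard to rescue use, and NRI treats both rescue use and a missing final visit as non-response. Actual trial analysis plans define these rules in more detail.

```python
from typing import List, Optional, Tuple

Visits = List[Optional[int]]  # 1 = responder, 0 = non-responder, None = missing


def locf(visits: Visits) -> Visits:
    """Fill each missing visit with the most recent observed value."""
    filled: Visits = []
    last: Optional[int] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def final_response_locf(visits: Visits) -> bool:
    """Response at the final visit after LOCF (never observed -> False)."""
    return bool(locf(visits)[-1])


def final_response_nri(visits: Visits, rescued: bool) -> bool:
    """Response at the final visit under NRI (assumed rule set)."""
    if rescued or visits[-1] is None:
        return False
    return bool(visits[-1])


# Hypothetical participants: (responses at four visits, used rescue treatment?)
participants: List[Tuple[Visits, bool]] = [
    ([0, 1, 1, 1], False),        # completed as a responder
    ([0, 1, None, None], False),  # dropped out while responding
    ([0, 0, 1, 1], True),         # responded, but after rescue treatment
    ([0, 0, None, None], False),  # dropped out without responding
    ([0, 0, 0, 1], False),        # late responder
]

locf_rate = sum(final_response_locf(v) for v, _ in participants) / len(participants)
nri_rate = sum(final_response_nri(v, r) for v, r in participants) / len(participants)

print(f"LOCF response rate: {locf_rate:.0%}")
print(f"NRI response rate:  {nri_rate:.0%}")
```

In this toy cohort the same observations give a response rate of 80% under LOCF and 40% under NRI, showing how the imputation rule alone can shift a reported result.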

4 Concluding Remarks

4.1 Discussion

There are many ongoing and completed clinical trials for the treatment of moderate-to-severe AD, and their methodological differences make it challenging to comparatively interpret trial results. It is critical that the key study parameters, including comparator, rules for rescue treatment, and washout periods, are recognized and their implications understood. We have provided our perspectives on such parameters, with reasonable consensus regarding those of highest importance. There remains a paucity of evidence in AD to inform clinical trial designs and statistical analysis, and learnings from other therapeutic areas, such as psoriasis, may offer some insights into clinical trial design and interpretation [31].

Increased harmonization of primary outcome measures will also serve to aid the interpretation of AD clinical trials. Outcomes recommended by the HOME initiative could be widely utilized as the minimum set of outcomes, with any additional outcomes dictated by the objectives of a particular study [22]. In addition to type of outcome measurement, the timing of measurements would also be helpful to standardize; an examination of early efficacy in addition to longer-term endpoints at 6 or 12 months to assess sustained AD control would be valuable. However, certain trial design factors, such as study visit schedule or study population, may remain subject to variation due to unique study objectives, differences in drug mechanism of action, or hypotheses being tested.

Another variable aspect of clinical trials that confounds the interpretation of results is participant adherence to the study drug. Reduced adherence to a study drug may lower response rates due to lower effective dose of the drug, which in turn could increase the need for rescue treatment. Such adherence issues are common in the treatment of AD, particularly for topical treatments, with poor adherence in clinical trial settings potentially due to complicated administration instructions, time burden for treatment application, or safety concerns [33, 34]. While drug adherence itself is not a component of trial design, trials may monitor and/or report adherence differently, and low adherence may have cascading impacts on other aspects of a trial in a manner similar to the parameters discussed here. Studies of topical therapies often weigh tubes to measure drug use, and this is commonplace for documenting TCS rescue therapy use in AD trials. However, monitoring adherence in studies of oral or injectable therapies is more challenging; in psoriasis trials, studies using these administration routes often rely on participant reports or prescription refill records, which have variable reliability [35]. The assessment of AD clinical trial results may include considerations of the impact of adherence on results, rescue medication use, and subsequent missing data handling if non-adherent patients are excluded from the results.

In addition to the points raised herein, across-trial interpretation may benefit from increased reporting of the number needed to treat (NNT) and absolute risk reduction (ARR). The NNT represents the number of participants needed to be treated to achieve one additional positive outcome relative to a control group [36, 37]. The ARR expresses the difference in frequency of negative/unfavorable outcomes in the arms of a trial [38]. Since 2001, the CONSORT group (CONsolidated Standards Of Reporting Trials) has recommended reporting the NNT in the results of randomized controlled trials [39]. While NNT and ARR also have limitations related to study design, their regular reporting would provide an additional measure when seeking to understand differences between agents studied in independent trials [40].
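As a worked illustration of these two measures, the sketch below computes the absolute difference in response rates between two arms and the corresponding NNT as its reciprocal. The counts are hypothetical. For a responder endpoint, the absolute difference in response rates equals the ARR for non-response, and rounding the NNT up to the next whole participant is a common convention assumed here.

```python
import math
from typing import Optional, Tuple


def risk_difference_and_nnt(
    responders_active: int,
    n_active: int,
    responders_control: int,
    n_control: int,
) -> Tuple[float, Optional[int]]:
    """Return (absolute difference in response rates, NNT).

    NNT is 1 / difference, rounded up; it is None when the active arm
    shows no benefit over control, where NNT is not meaningful.
    """
    rate_active = responders_active / n_active
    rate_control = responders_control / n_control
    difference = rate_active - rate_control
    if difference <= 0:
        return difference, None
    return difference, math.ceil(1 / difference)


# Hypothetical trial: 40/100 responders on active drug, 10/100 on placebo.
arr, nnt = risk_difference_and_nnt(40, 100, 10, 100)
print(f"Absolute difference: {arr:.2f}, NNT: {nnt}")
```

Here an absolute difference of 0.30 corresponds to an NNT of 4, meaning roughly four participants would need to be treated for one additional responder relative to control. Both figures still depend on the placebo response in each trial, so they inherit the design-related limitations discussed above.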

4.2 Conclusions

Overall, this article highlights the key aspects of clinical trial design that may impact observed treatment efficacy and limit the validity of side-by-side comparisons of clinical trial data. Our review highlights how certain study parameters may impact other aspects of study design. These interactions pose challenges for medical experts seeking to compare the outcomes of two different trials. The differences between trials may impact active treatment and placebo arms differently, which further complicates comparisons. This calls for caution in cross-trial comparisons and substantiates the need to harmonize study procedures in moderate-to-severe AD trials and establish a standardized approach for analyzing trial results. We welcome further research investigating the impact of these differences, as well as efforts to harmonize AD trials.