Key Summary Points

Why carry out this study?

Crisaborole is a phosphodiesterase-4 inhibitor that is approved in multiple regions/countries globally for the treatment of mild-to-moderate atopic dermatitis (AD). There are no head-to-head randomized controlled trial (RCT) data currently available on the efficacy and safety of crisaborole versus comparator agents (particularly, topical calcineurin inhibitors) for use by healthcare decision-makers.

This analysis compared the efficacy of crisaborole 2% to that of several other AD treatments by using the unanchored matching-adjusted indirect comparison (MAIC) method, which reweighted individual patient data (IPD) for crisaborole to estimate absolute response in comparator populations.

What was learned from the study?

The results of the unanchored MAIC suggest that the odds of achieving an Investigator’s Static Global Assessment/Investigator Global Assessment (ISGA/IGA) score of 0/1 (“clear or almost clear”) are greater with crisaborole 2% than with pimecrolimus 1% or tacrolimus 0.03% in patients aged ≥ 2 years with mild-to-moderate AD.

The results from this unanchored MAIC are consistent with the findings from a previously published network meta-analysis that used a different methodology for conducting indirect treatment comparisons and included adjustment for heterogeneity of vehicle effect.

The results from this unanchored MAIC may help to inform clinicians and healthcare decision-makers in the management of AD.

Introduction

Atopic dermatitis (AD) is a chronic inflammatory skin condition that affects up to 20% of children and 3% of adults [1, 2], with most of those affected having mild-to-moderate disease severity [3]. Treatment guidelines recommended by the American Academy of Dermatology in 2014 include topical corticosteroids (TCS) and topical calcineurin inhibitors (TCIs) for mild AD, phototherapy for the management of mild and moderate cases of AD [4], and biologics and immunosuppressants for moderate-to-severe disease [5, 6]. The European Academy of Dermatology and Venereology (EADV) consensus-based guidelines from 2018 and EADV position paper from 2020 also make similar recommendations relative to the role of TCS, TCIs, biologics and immunosuppressants in management of AD according to severity [7, 8].

A novel way to manage AD is to inhibit phosphodiesterase-4 (PDE4), an enzyme that is associated with a pro-inflammatory response and cytokine release [9]. Crisaborole topical ointment, 2%, is a nonsteroidal, topical anti-inflammatory PDE4 inhibitor that is approved for the treatment of mild-to-moderate AD [10]. No head-to-head randomized controlled trials (RCTs) are available to evaluate efficacy and safety of crisaborole versus TCIs, despite comparative data being required for healthcare decision-making and evaluations for reimbursement and access.

To address this evidence gap, a network meta-analysis (NMA) was performed, which showed that crisaborole 2% was superior to vehicle and pimecrolimus 1% and comparable to tacrolimus 0.03% or 0.1% at 28–42 days in patients aged ≥ 2 years with mild-to-moderate AD in terms of the efficacy outcome Investigator’s Static Global Assessment (ISGA) 5-point rating scale scores of “clear or almost clear” (ISGA 0/1) [2]. Safety outcomes were not compared because of differences in reporting (e.g. thresholds used), outcome definitions and study publication dates, which are common issues in indirect comparisons [11]. The efficacy analyses in the NMA required an adjustment for variations in vehicle response rates across the included studies. This variation in vehicle response is partly explained by differences in the vehicles' active ingredients [12]; the vehicles therefore cannot be viewed as placebos. For example, the base ointment of crisaborole 2% contains propylene glycol, a well-known humectant commonly used in skin care formulations, and emollients, such as soft paraffin and hard paraffin [13]. In vehicle-controlled RCTs assessing similar populations with mild-to-moderate AD, 40.6% and 29.7% of vehicle-treated patients achieved the endpoint of ISGA 0/1 at 4 weeks of treatment in each of two crisaborole comparator studies versus only 19.5% of vehicle-treated patients achieving this endpoint in a tacrolimus comparator study [14, 15].

Baseline risk regression was used in the previous NMA to adjust for differences in vehicle response, but the heterogeneity is such that the evidence network is not truly connected via vehicle [16]. In such cases, the unanchored matching-adjusted indirect comparison (MAIC) method has been recommended in the literature and by healthcare payers, including the UK National Institute for Health and Care Excellence (NICE) [17, 18]. This statistical approach does not use vehicle controls, but instead adjusts for imbalances in effect modifiers and prognostic factors using individual patient data (IPD) for crisaborole and aggregated data for comparators from published trials (previously included in the NMA).

The objective of the current analysis was to use the unanchored MAIC method [18, 19] to compare the efficacy of crisaborole 2% with that of pimecrolimus 1%, tacrolimus 0.03% and tacrolimus 0.1% in patients aged ≥ 2 years who had mild-to-moderate AD.

Methods

Systematic Literature Search

The efficacy of crisaborole and comparator drugs was evaluated from published randomized clinical studies that were included in the NMA. The process for selecting articles and for extracting data was published previously [2]. The selected studies included a mix of pediatric and adult populations and are described in detail in this prior publication [2]. The comparator studies included those for pimecrolimus 1% [20,21,22,23], tacrolimus 0.03% [15, 20, 21, 24] and tacrolimus 0.1% [22].

Statistical Analysis via the Unanchored MAIC

Because vehicle controls in the identified studies are not comparable, anchored indirect comparison or NMA would be biased [12, 17]. The UK healthcare decision maker (NICE) recommends the use of unanchored population-adjusted indirect comparisons, which account for study differences when common comparators cannot be used to perform anchored indirect comparisons [17, 18]. In this analysis, MAIC was utilized, which did not include vehicle controls [19]. ISGA and Investigator’s Global Assessment scores of 0/1 (ISGA/IGA 0/1) at 28–42 days were chosen as the efficacy outcome, given that the registration studies for crisaborole used the ISGA to evaluate efficacy and that most of the RCTs on comparators reported these outcome measures.

Other measures of efficacy and severity, such as the Eczema Area and Severity Index (EASI) and SCORing Atopic Dermatitis (SCORAD), were not included in the crisaborole registration studies. Utilizing ISGA as the primary outcome aligns with the previously published NMA [2], and with the guidance from the US Food and Drug Administration that recommends ISGA for the assessment of global disease severity in AD [25].

The estimator of the relative effect between intervention A (i.e. crisaborole) and comparator B (e.g. pimecrolimus or tacrolimus) in the population of the RCT on B is

$${\widehat{d}}_{AB(B)}=\mathrm{g}\left({\widehat{Y}}_{A(B)}\right)-\mathrm{g}\left({\widehat{Y}}_{B\left(B\right)}\right),$$

where \({\widehat{Y}}_{A\left(B\right)}\) is the estimator of absolute response on A in the comparator B population, \({\widehat{Y}}_{B(B)}\) is the reported absolute response of B in the B population and \(g()\) is a link function to a scale on which treatment effects are additive (i.e. the linear predictor scale). In the current analysis, absolute response was the proportion of patients achieving ISGA/IGA 0/1 and \(g()\) was the logit link function, \(g(p)=\log\left(p/(1-p)\right)\), which converts proportions to the log odds scale. The MAIC estimator \({\widehat{Y}}_{A\left(B\right)}\) uses propensity scores, themselves estimated by logistic regression on patient characteristics, to reweight the IPD for intervention A so that the distributions of effect modifiers and prognostic variables match those of the B population [17, 18].
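As a small illustration of the estimator above, the following Python sketch computes the relative effect on the log odds scale and the corresponding odds ratio. The function names and the two response proportions are hypothetical and chosen only to show the arithmetic; they are not values from the analysis.

```python
import math

def logit(p):
    """Log odds of a proportion p in (0, 1); this is the link function g()."""
    return math.log(p / (1.0 - p))

def unanchored_effect(y_a_in_b, y_b_in_b):
    """Return (log odds ratio, odds ratio) for A versus B in the B population."""
    d_ab = logit(y_a_in_b) - logit(y_b_in_b)
    return d_ab, math.exp(d_ab)

# Hypothetical proportions achieving ISGA/IGA 0/1 (illustration only).
d_ab, odds_ratio = unanchored_effect(y_a_in_b=0.40, y_b_in_b=0.25)
print(f"log OR = {d_ab:.3f}, OR = {odds_ratio:.2f}")
```

With these illustrative inputs, the odds are 0.40/0.60 for A and 0.25/0.75 for B, so the odds ratio is 2.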

This process involves two steps: (1) IPD were reweighted to ensure that covariate distributions in the trial on treatment A match the population of comparator B; and (2) the estimated weights were used to calculate \({\widehat{Y}}_{A\left(B\right)}\). The effective sample size (ESS) for each comparison was estimated from the patient weights; the ESS is always lower than the total sample on crisaborole (i.e. 1021 patients), and lower values indicate poor overlap with comparator trials and overreliance on a subset of patients.
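The two steps can be sketched in Python as follows. This is a minimal illustration, not the authors' R code. It assumes the method-of-moments form of MAIC weighting described in the MAIC literature, in which each weight is \(\exp({z}_{i}\cdot \alpha )\) with covariates \({z}_{i}\) centred on the comparator means, and it assumes the usual ESS formula \({\left(\sum {w}_{i}\right)}^{2}/\sum {w}_{i}^{2}\). The function names and the eight-patient data set (age, male indicator) are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def maic_weights(ipd, target_means, max_iter=100, tol=1e-10):
    """Step 1: weights w_i = exp(z_i . alpha), with z_i = x_i - target_means.

    alpha minimises sum_i exp(z_i . alpha); at the minimum the weighted
    covariate means of the IPD equal the comparator (target) means.
    """
    k = len(target_means)
    z = [[row[j] - target_means[j] for j in range(k)] for row in ipd]

    def weights(alpha):
        return [math.exp(sum(zj * aj for zj, aj in zip(zi, alpha))) for zi in z]

    alpha = [0.0] * k
    for _ in range(max_iter):
        w = weights(alpha)
        grad = [sum(wi * zi[j] for wi, zi in zip(w, z)) for j in range(k)]
        if max(abs(g) for g in grad) < tol * sum(w):
            break
        hess = [[sum(wi * zi[r] * zi[c] for wi, zi in zip(w, z)) for c in range(k)]
                for r in range(k)]
        step = solve_linear(hess, grad)
        t = 1.0  # halve the Newton step until the objective does not increase
        while t > 1e-8 and sum(weights([a - t * s for a, s in zip(alpha, step)])) > sum(w):
            t /= 2.0
        alpha = [a - t * s for a, s in zip(alpha, step)]
    return weights(alpha)

def effective_sample_size(w):
    """ESS = (sum of weights)^2 / (sum of squared weights)."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

def weighted_response(w, responders):
    """Step 2: weighted proportion of responders, i.e. the estimate of Y_A(B)."""
    return sum(wi * ri for wi, ri in zip(w, responders)) / sum(w)

# Hypothetical IPD: [age in years, male (1/0)] and responder status per patient.
ipd = [[4, 1], [6, 0], [9, 1], [12, 0], [15, 1], [20, 0], [30, 1], [45, 0]]
responders = [1, 0, 1, 1, 0, 0, 1, 0]
target = [12.0, 0.55]  # hypothetical comparator mean age and proportion male

w = maic_weights(ipd, target)
matched = [sum(wi * row[j] for wi, row in zip(w, ipd)) / sum(w) for j in range(2)]
ess = effective_sample_size(w)
y_a_in_b = weighted_response(w, responders)
print(matched, round(ess, 2), round(y_a_in_b, 3))
```

To match a standard deviation as well as a mean (as was done for age), one common approach is to add the squared covariate as an extra column, with target value mean² + SD². The sketch does not handle collinear covariates or targets outside the range of the IPD, where no set of weights can achieve the match.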

A histogram of patients’ weights was also generated to test for overreliance on a subset, which would be indicated by extreme weights for a small number of patients. Patient characteristics before and after matching were compared between crisaborole and comparator RCTs to assess the success of the reweighting process. An unweighted “naïve” estimator \({\widehat{Y}}_{A\left(A\right)}\) (i.e. absolute response in the A population) was used as a sensitivity analysis.

When multiple comparator studies were available, \({\widehat{Y}}_{B(B)}\) was the output of a random effects meta-analysis. The weights of this meta-analysis were used to form weighted averages of the proportions, means and standard deviations (SDs) of baseline characteristics across comparator studies, to which the IPD were then matched. Unanchored MAIC was conducted in R based on recommendations from NICE [18]. The ‘sandwich’ package was used to generate effect estimates as it correctly propagates uncertainty in patient propensity scores through to relative effect estimates. Results are summarized as odds ratios on ISGA/IGA 0/1 for crisaborole versus comparators. Two-sided p values and 95% confidence intervals were calculated by assuming the log odds of ISGA/IGA 0/1 followed a normal distribution.
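The pooling step can be illustrated with the following Python sketch. The text does not state which random effects estimator was used, so the DerSimonian-Laird estimator on the logit scale is an assumption here. The function names and the three study records are hypothetical, not data from the comparator trials.

```python
import math

def dersimonian_laird(events, totals):
    """Pool proportions on the logit scale (assumed DerSimonian-Laird estimator).

    Returns the pooled proportion and the normalised random effects weights.
    """
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]    # study logits
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]  # logit variances
    w_fixed = [1.0 / vi for vi in v]
    y_fixed = sum(w * yi for w, yi in zip(w_fixed, y)) / sum(w_fixed)
    tau2 = 0.0  # between-study variance; stays zero for a single study
    if len(y) > 1:
        q = sum(w * (yi - y_fixed) ** 2 for w, yi in zip(w_fixed, y))
        c = sum(w_fixed) - sum(w * w for w in w_fixed) / sum(w_fixed)
        tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_rand = [1.0 / (vi + tau2) for vi in v]
    pooled_logit = sum(w * yi for w, yi in zip(w_rand, y)) / sum(w_rand)
    pooled = 1.0 / (1.0 + math.exp(-pooled_logit))
    total = sum(w_rand)
    return pooled, [w / total for w in w_rand]

def weighted_average(values, weights):
    """Average study-level baseline summaries using normalised weights."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical comparator studies: responders, sample sizes and mean ages.
events = [30, 45, 60]
totals = [100, 150, 160]
mean_ages = [8.0, 10.0, 12.0]

pooled_response, ma_weights = dersimonian_laird(events, totals)
target_mean_age = weighted_average(mean_ages, ma_weights)
print(round(pooled_response, 3), [round(w, 3) for w in ma_weights], round(target_mean_age, 2))
```

Here `pooled_response` plays the role of \({\widehat{Y}}_{B(B)}\), and `target_mean_age` is the kind of pooled baseline value to which the IPD would then be matched.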

NICE and published literature recommend that all effect modifiers and prognostic variables be included in the logistic regression used to estimate propensity scores for unanchored MAIC [17, 18]. Effect modifiers and prognostic variables were identified through a literature review, expert clinical opinion and regression analyses (Electronic Supplementary Material Files 1–3), a triangulation approach previously used for MAIC in psoriatic arthritis. Identified variables to match, when reported, were age (mean and SD), proportion male, proportion Caucasian, percentage body surface area, ISGA/IGA score, proportion receiving prior TCI and proportion receiving prior TCS. Characteristics not reported by both the crisaborole and the comparator trial were omitted from the propensity score regression model.

Ethics Compliance

This article is based on data from previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.

Results

The odds of achieving ISGA/IGA 0/1 were higher for crisaborole 2% than for pimecrolimus 1% (OR 2.03; 95% CI 1.45–2.85; ESS = 627, reduced from 1021; p < 0.001) and for tacrolimus 0.03% (OR 1.50; 95% CI 1.09–2.05; ESS = 311, reduced from 1021; p = 0.012) (Fig. 1). Unweighted naïve comparisons gave results similar to the weighted MAIC estimates.
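As a consistency check on these figures, the standard error of each log odds ratio can be back-calculated from the reported 95% CI under the normal approximation described in the Methods, and the two-sided p value recomputed from it. The Python sketch below does this from the rounded published values, so agreement is expected only up to rounding; the function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(odds_ratio, ci_low, ci_high):
    """Recover SE(log OR), the z statistic and the two-sided p value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_975)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, z, p

# Reported results for crisaborole 2% versus each comparator.
se_pim, z_pim, p_pim = wald_summary(2.03, 1.45, 2.85)  # pimecrolimus 1%
se_tac, z_tac, p_tac = wald_summary(1.50, 1.09, 2.05)  # tacrolimus 0.03%
print(f"pimecrolimus 1%:  SE = {se_pim:.3f}, z = {z_pim:.2f}, p = {p_pim:.5f}")
print(f"tacrolimus 0.03%: SE = {se_tac:.3f}, z = {z_tac:.2f}, p = {p_tac:.3f}")
```

The recomputed p values are below 0.001 for pimecrolimus 1% and approximately 0.012 for tacrolimus 0.03%, in line with the reported values.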

Fig. 1

Odds ratio (95% confidence interval) and two-sided p values of achieving an Investigator's Static Global Assessment score of 0 or 1 (ISGA 0/1) with crisaborole versus pimecrolimus 1%, tacrolimus 0.03% and tacrolimus 0.1%. ESS Effective sample size

Comparison of crisaborole versus tacrolimus 0.1% was infeasible due to a large reduction in ESS (to 94) and a highly skewed histogram of patient weights (Fig. 2), indicating an overreliance on a small subset of crisaborole patients and poor overlap between trials.

Fig. 2

Patient weights histograms for unanchored matching-adjusted indirect comparison

The crisaborole 2% versus pimecrolimus 1% MAIC gave similar weight to a large proportion of the population and had no clear outliers (Fig. 2).

The crisaborole 2% versus tacrolimus 0.03% comparison gave zero weight to a larger proportion, which was reflected by the lower effective sample size, but was not very reliant on a small subset.

Patient characteristics were compared before and after matching crisaborole 2% to pimecrolimus 1% and to tacrolimus 0.03%. Good-quality matching was evident in the means and, in the case of age, the standard deviations (Tables 1, 2). However, the potentially important characteristics of ISGA at baseline, prior TCI use and prior TCS use were not reported by any comparator RCT, so matching on them was not feasible.

Table 1 Comparison of matching variables (effect modifiers or prognostic variables) before and after matching to pimecrolimus 1% randomized controlled trials
Table 2 Comparison of matching variables (effect modifiers or prognostic variables) before and after matching to tacrolimus 0.03% randomized controlled trials

Discussion

This study sought to compare the efficacy of crisaborole 2% with pimecrolimus 1%, tacrolimus 0.03% and tacrolimus 0.1% in patients with mild-to-moderate AD using the MAIC method. The findings showed that crisaborole 2% had higher odds of achieving ISGA/IGA 0/1 at 6 weeks than tacrolimus 0.03% or pimecrolimus 1%. Comparison with tacrolimus 0.1% was infeasible due to limited overlap between the crisaborole 2% and tacrolimus 0.1% RCTs, indicated by an ESS reduction from 1021 to 94.

In the NMA, the lack of a connected network (because vehicle controls were not comparable) and the imbalance in effect modifiers supported the use of an unanchored MAIC. The strength of this study design is that it aligns with the NICE guidelines for population-adjusted indirect comparison methods when head-to-head RCTs are not available and when networks are disconnected. The methodology was closely aligned with that presented in NICE Decision Support Unit Technical Support Document 18 [18], and the indirect comparisons were assessed on a linear predictor scale of log odds ratios.

Our choice of the MAIC method was made before the recent publication of a simulation study indicating that simulated treatment comparison may be less biased [26]. However, that simulation study was conducted only for the anchored case (i.e. when a vehicle control could be used) and not for the unanchored case (i.e. our situation, where only single arms from RCTs were used). Therefore, the relevance of its findings to the current research is unclear. A more relevant simulation study is that of Hatswell et al. [27], which found that unanchored MAIC can reduce bias and produce valid comparisons if patient numbers are not low.

As with all studies, some limitations should be acknowledged. Potential effect modifiers or prognostic variables, including family history, use of emollients, lesion characteristics or other allergic comorbidities, were not recorded in certain datasets. Published studies may have lacked standardization on how IGA was implemented, and hence direct comparison between studies was not feasible [28].

Although the weighting of the unanchored MAIC provides an indirect comparison of treatments in the same population, the patients eligible to receive crisaborole may not align with comparator populations; this raises the need for head-to-head trials to compare treatment groups and to reduce the need for indirect comparisons. Moreover, a comparison on odds ratios of achieving ISGA/IGA 0/1, rather than mean differences in ISGA/IGA, limits clinical interpretability as a minimum clinically important difference is difficult to specify. Alternative outcomes, such as SCORAD and EASI, were not collected in the crisaborole RCTs, and it was thus infeasible to compare with pimecrolimus or tacrolimus on these outcomes. Pruritus data were collected during crisaborole RCTs, although a standard instrument was not used and, therefore, the results for pruritus could not be evaluated in comparison to other studies.

The ESS indicated a moderate reduction (from 1021 to 627) when comparing crisaborole 2% with pimecrolimus 1% but a more substantial reduction (from 1021 to 311) when compared with tacrolimus 0.03%. Thus, the MAIC comparison with tacrolimus 0.03% was based on a limited subset of the crisaborole RCT population.
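To put these reductions in proportion, the ESS is conventionally computed from the estimated patient weights \({w}_{i}\) as shown below. This is the standard formula in the MAIC literature and is stated here as an assumption about the exact calculation used.

$$\mathrm{ESS}=\frac{{\left(\sum_{i}{w}_{i}\right)}^{2}}{\sum_{i}{w}_{i}^{2}}$$

On this scale, the reported values correspond to retaining roughly 627/1021 ≈ 61% of the nominal crisaborole sample for the pimecrolimus 1% comparison, 311/1021 ≈ 30% for tacrolimus 0.03% and 94/1021 ≈ 9% for tacrolimus 0.1%.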

The indirect comparison methods used in the current study and a previous study [2] are accepted as reliable and robust by NICE guidelines [18]. However, these methods cannot replace a head-to-head comparison in a randomized controlled trial (not currently available) when determining superiority. Therefore, caution should be used when interpreting the results of the current study.

Conclusion

Unanchored MAIC suggests that the odds of achieving ISGA/IGA 0/1 are greater with crisaborole 2% than with pimecrolimus 1% or tacrolimus 0.03% in patients aged ≥ 2 years with mild-to-moderate AD. This is consistent with the findings from the previously published NMA, which used a different methodology for the conduct of indirect treatment comparisons and included adjustment for heterogeneity of vehicle effect. These results may provide useful evidence for clinicians and healthcare decision-makers in the management of AD.