Dear Editor,


We are gratified by the interest raised by our paper (Rand et al. [1]) on an unanchored matching-adjusted indirect comparison (MAIC) between lebrikizumab and dupilumab maintenance therapy for patients with moderate-to-severe atopic dermatitis (AD), and are delighted to observe that it sparks debate in the form of a letter to the editor. Bastian et al. [2] argue that there are methodological issues with the paper that challenge the appropriateness and robustness of our analyses.

We will endeavour to be brief in our response, and keep focus trained on the four issues and claims raised in the letter, namely:

  1. That the comparison should have been conducted using anchored rather than unanchored MAIC

  2. That differences in the handling of patients using topical corticosteroids could unfairly favour lebrikizumab in our comparison

  3. That there were errors in the reporting of baseline characteristics in Table 1 of the manuscript, potentially calling into question the results of the analyses, and

  4. That differences in baseline characteristics remain after re-weighting, and that this suggests either inappropriate use of propensity score re-weighting methods or the omission of important prognostic factors.

Let us deal with these in the order of presentation:

Anchored vs. Unanchored MAIC

Bastian et al. [2] argue that an anchored matching-adjusted indirect comparison (MAIC) would have been more appropriate, a position with which we disagree.

In our manuscript, we state that the maintenance phase placebo arms are not suited for anchored comparison. The rationale is that the patients involved were re-randomized upon responding to active treatment at week 16. While these patients were indeed administered placebo in the maintenance phase, this is better described as treatment withdrawal from two distinctly different treatment regimens. Anchored ITC methods in general, including naïve comparison using Bucher’s approach, network meta-analysis, anchored MAIC and anchored simulated treatment comparison, rely crucially on a strong assumption of transitivity: given evidence comparing treatments A vs. C and B vs. C, the comparison of A vs. B can be estimated through the common arm C. This requires that C is essentially equivalent across trials, in the sense that patients from the two (or more) trials can be handled as if they were randomized to essentially similar arms of a single trial. An anchored MAIC offers some additional flexibility, in that it can adjust for differences caused exclusively by uneven distribution of effect modifiers between the trial populations. This equivalence does not hold for the patients re-randomized to placebo in the SOLO-CONTINUE and ADvocate 1 and 2 trials, a fact which can be easily demonstrated: if the two withdrawal arms were indeed comparable, we would expect similar rates of maintained treatment response. The forest plots in Fig. 1 illustrate the maintenance of EASI-75 and IGA 0/1 in the treatment withdrawal arms of the SOLO-CONTINUE trial and the pooled adult subsample of the ADvocate 1 and 2 trials, as well as naïve comparisons of treatment withdrawal from lebrikizumab vs. treatment withdrawal from dupilumab.
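For readers less familiar with the transitivity assumption described above, the following is a minimal sketch of how an anchored (Bucher-type) indirect estimate of A vs. B is formed through a common arm C on the log risk ratio scale. The function name and the input values are hypothetical and chosen purely for illustration; they are not taken from either trial. The estimate is only meaningful if arm C is essentially equivalent across the two trials.

```python
import math

def bucher_indirect(log_rr_ac, se_ac, log_rr_bc, se_bc, z=1.96):
    """Anchored indirect estimate of A vs. B through a common comparator C.

    Inputs are log risk ratios (A vs. C and B vs. C) with their standard errors.
    """
    # Transitivity: the A vs. B effect is the difference of the two anchored effects.
    log_rr_ab = log_rr_ac - log_rr_bc
    # The two trials are independent, so the variances add.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    lower = math.exp(log_rr_ab - z * se_ab)
    upper = math.exp(log_rr_ab + z * se_ab)
    return math.exp(log_rr_ab), se_ab, (lower, upper)

# Hypothetical inputs for illustration only (not trial data).
rr_ab, se_ab, ci = bucher_indirect(math.log(2.0), 0.20, math.log(1.6), 0.25)
```

If the two C arms differ systematically, as we argue is the case for the two treatment withdrawal arms, the subtraction above no longer cancels the common comparator and the resulting estimate is biased.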

Fig. 1

Comparison of treatment withdrawal arms in SOLO-CONTINUE and ADvocate 1 and 2. Chance (risk) of maintaining EASI 75 (top) and IGA 0/1 (bottom) until week 52 in the treatment withdrawal arms of the ADvocate 1 and 2 trials and the SOLO-CONTINUE trial, along with relative risks of the two treatment withdrawal arms. EASI 75: 75% improvement from baseline on the Eczema Area and Severity Index; IGA 0/1: score of 0 or 1 on the Investigator’s Global Assessment; RR: risk ratio or relative risk; 95% CI: 95% confidence interval; SE: standard error

As is evident from Fig. 1, maintenance of response in the treatment withdrawal arms differs greatly, with risk ratios of 1.995 for EASI-75 and 3.724 for IGA 0/1, both in favour of withdrawal from lebrikizumab. We maintain that unanchored comparison is both justified and necessary for a valid comparison of these maintenance populations.
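As a minimal sketch of how a naïve risk ratio of the kind shown in Fig. 1 can be computed from arm-level counts, the snippet below uses the standard large-sample standard error of the log risk ratio. The function name and the counts are hypothetical and do not reproduce the trial data or the exact procedure used for the figure.

```python
import math

def risk_ratio(events_1, n_1, events_2, n_2, z=1.96):
    """Risk ratio of arm 1 vs. arm 2 with a Wald interval on the log scale."""
    rr = (events_1 / n_1) / (events_2 / n_2)
    # Large-sample standard error of log(RR).
    se_log_rr = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, (lower, upper)

# Hypothetical counts for illustration only: 30 of 60 patients maintain
# response in one withdrawal arm vs. 20 of 80 in the other.
rr, se, (lower, upper) = risk_ratio(30, 60, 20, 80)
```

A confidence interval that excludes 1 would indicate that the two withdrawal arms maintain response at different rates, which is the property that rules them out as a common anchor.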

Handling of Topical Corticosteroids

Bastian et al. [2] argue that patients in the SOLO-CONTINUE trial were handled as non-responders if they made use of topical corticosteroids (TCS), while it was not clear how this was handled in our analyses. While our paper mentions that we used non-responder imputation (NRI) throughout, we were not explicit that NRI for the co-primary endpoints in the ADvocate 1 and 2 trials applied to TCS use.

The study protocol for the ADvocate 1 and 2 trials states that “Patients who received topical or systemic rescue medication, discontinued treatment due to any reasons, or transferred to escape arms had values set to non-response subsequent to this time through week 52; intermittent missing values were also set to non-response. Observed results exclude data collected after rescue medication or treatment discontinuation”; similar statements can be found elsewhere, e.g. below supplementary Table 4 of Blauvelt et al. [3].

It is important to note that Bastian et al. [2] claim that “topical therapy was also used by up to 18.3% of patients in ADvocate 1 and 2 trials”. While this is the highest number reported, the statement is misleading, as the proportion quoted is clearly labelled as applying to the treatment withdrawal arm, which is not used in our unanchored MAIC study. The correct numbers to reference related to this unanchored MAIC would be the ones applying to the Q4W maintenance arms of the ADvocate 1 and 2 trials, wherein topical treatments were used by 16 out of 118, or 13.6% of patients (see supplementary Table S7 in Blauvelt et al. [3]); and for the QW/Q2W arm of the SOLO-CONTINUE trial, wherein 30 out of 169 patients (17.8%) used TCS rescue medication (see eTable 4 in the electronic supplement 2 for Worm et al. [4]).
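The proportions quoted above follow directly from the reported counts, as the short check below shows. The helper name is ours and purely illustrative.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# Counts as quoted in the text above.
lebrikizumab_q4w = percent(16, 118)   # ADvocate 1 and 2, Q4W maintenance arms
dupilumab_qw_q2w = percent(30, 169)   # SOLO-CONTINUE, QW/Q2W arm
```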

Errors in Reported Baseline for SOLO-CONTINUE

Bastian et al. [2] point out apparent errors in the reported baseline characteristics of the SOLO-CONTINUE trial in Table 1 of our manuscript, with wrong reported rates of EASI-75 and IGA 0/1 at initiation of maintenance treatment. We greatly appreciate this error being pointed out.

This is a typographical error that appears to have occurred at the writing stage. Avid readers may have noticed that there was an online appendix including a full listing of candidate baseline variables considered for use in the propensity score weighting step of the MAIC, a list which includes age, sex (male), scores at week 16 on EASI, DLQI, POEM, IGA, %BSA, and proportions of Asian, other, or non-white. Given the presence of reported means and standard errors for the continuous scores for EASI and IGA, EASI-75 and IGA 0/1 at week 16 were not extracted or included in any analyses. The corresponding rows for EASI-75 and IGA 0/1 at week 16 in Table 1 of the manuscript should not have been included and appear to have been populated with numbers corresponding to these outcomes at week 52.

While it is indeed unfortunate that these were erroneously reported in Table 1 of the manuscript, these specific scores have no bearing on the analyses presented in the paper, as the numbers in question did not enter into any of the analytical procedures. A rectified table with correct numbers for week 16 EASI-75 and IGA 0/1 has been submitted to Dermatology and Therapy for inclusion in an erratum associated with the paper.

To summarize, the reporting of the variables used in the analyses is correct.

Remaining Differences After Re-weighting

Bastian et al. [2] point out that differences in aggregate baseline scores remain after re-weighting and suggest that this may reflect issues with the identification, inclusion, and re-weighting conducted. While it is indeed correct that minor differences remain after propensity score weighting, this reflects the number of factors included in the analyses combined with the specific makeup of the population being matched relative to the aggregate statistics to which it is matched. It will usually be possible to re-weight a single aggregate characteristic to match a target more or less exactly, but the inclusion of multiple imperfectly correlated parameters will typically result in situations where improving the adjustment for one parameter inevitably results in a worse fit for one or more other parameters. In our base case, we adjusted for baseline EASI (mean and SD), %BSA (mean and SD), age (mean and SD), proportion white, and proportion male, for a total of eight matching parameters. The online supplementary material for Rand et al. [1] includes a range of sensitivity analyses that were conducted alongside.
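To make the re-weighting step concrete, the following is a minimal, dependency-free sketch of method-of-moments MAIC weighting of the general kind described in NICE TSD 18 [5]. It is not the code used for our analyses. The patient data, the target values, and the function names are hypothetical, and matching an SD by adding a squared term with target mean² + SD² is an assumption of this sketch.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def maic_weights(rows, targets, max_iter=50, tol=1e-8):
    """Weights w_i = exp(z_i . alpha), with z_i = x_i - target, found by
    minimising sum_i exp(z_i . alpha) using a damped Newton iteration."""
    k = len(targets)
    z = [[x - t for x, t in zip(row, targets)] for row in rows]

    def weights(alpha):
        return [math.exp(sum(zj * aj for zj, aj in zip(zi, alpha))) for zi in z]

    def objective(alpha):
        try:
            return sum(weights(alpha))
        except OverflowError:
            return math.inf

    alpha = [0.0] * k
    for _ in range(max_iter):
        w = weights(alpha)
        # Gradient is the weighted sum of centred covariates; zero means balance.
        grad = [sum(wi * zi[j] for wi, zi in zip(w, z)) for j in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * zi[j] * zi[l] for wi, zi in zip(w, z))
                 for l in range(k)] for j in range(k)]
        step = solve_linear(hess, grad)
        size = 1.0
        # Halve the Newton step while it would increase the objective.
        while size > 1e-8 and objective(
                [a - size * s for a, s in zip(alpha, step)]) > sum(w):
            size /= 2
        alpha = [a - size * s for a, s in zip(alpha, step)]
    return weights(alpha)

# Hypothetical individual patient data and aggregate targets (illustration only).
ages = [22, 27, 31, 35, 38, 41, 44, 48, 52, 55, 59, 63]
male = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
target_mean_age, target_sd_age, target_male = 40.0, 10.0, 0.55

rows = [[a, a * a, m] for a, m in zip(ages, male)]
targets = [target_mean_age, target_mean_age ** 2 + target_sd_age ** 2, target_male]
w = maic_weights(rows, targets)

total = sum(w)
mean_age = sum(wi * a for wi, a in zip(w, ages)) / total
sd_age = math.sqrt(sum(wi * a * a for wi, a in zip(w, ages)) / total - mean_age ** 2)
prop_male = sum(wi * m for wi, m in zip(w, male)) / total
```

In this toy example the three targets lie well within the range of the hypothetical patient data, so the weighted mean age, SD, and proportion male reproduce them closely. With more matching parameters or a less favourable overlap between populations, the optimiser may only reach an approximate balance, which is the situation described above.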

Imperfect adjustments can be mitigated by increasing the sample size (typically not feasible in post hoc analysis) and by reducing the number of parameters included in the matching exercise. The latter comes at a potential cost to the validity of the comparison. The matching procedure employed is the one described by NICE in their technical support document on population-adjusted indirect comparisons (NICE TSD 18 [5]). More to the point, the reported findings are robust to the selection of variables included in the propensity score weighting exercise, and readers familiar with MAIC will note that the differences remaining in the base case (and the sensitivity analyses) are very minor indeed.

Conclusion

  • The results and conclusions of the unanchored MAIC are robust and appropriate, considering that the available data suffice for the suggested ITC approach.

  • We maintain that anchored MAIC cannot be justified in this case, given the distinct differences in the maintenance of treatment effect in the two treatment withdrawal arms.

  • There are no differences in the statistical handling of data collected after TCS use between SOLO-CONTINUE and ADvocate 1 and 2.

  • The typographical errors for EASI-75 and IGA 0/1 at baseline, which were not used in any analyses, have no impact on the reported findings. A corrected table has already been submitted to the journal and will be published as an erratum.

  • The reported differences after re-weighting are of a magnitude to be expected considering the number of included factors for re-weighting, and do not contribute to uncertainties regarding the interpretation of the reported findings.