Dear Editor,

We read with interest the comments from Cree et al. [1] on our recently published study describing a network meta-analysis (NMA) comparing the efficacies of eculizumab, satralizumab, and inebilizumab for neuromyelitis optica spectrum disorder (NMOSD) [2]. Here, we acknowledge and address several points raised by the authors in the order they were presented in the letter.

The aim of our study was to estimate the relative treatment effects between eculizumab, satralizumab, and inebilizumab by applying principles of NMA to available randomized controlled trial (RCT) evidence. Cree et al. contend that a matching-adjusted indirect comparison (MAIC) is the “gold standard” for estimating relative effects of competing treatment interventions. We respectfully disagree with this view [3]. Comparative methodologies such as NMA, MAIC, and simulated treatment comparison each have advantages and disadvantages that depend on the evidence available. With a MAIC, individual patient data from PREVENT would be re-weighted using propensity score methods so that the PREVENT population matches that of another trial sharing a common control arm and for which only aggregate-level data are available; the aim is to reduce differences in study populations between the two trials before performing an indirect treatment comparison. However, a key limitation of the MAIC methodology is that it allows indirect comparison between only two trials at a time. We could not have matched the individual patient data from the PREVENT sample simultaneously with those of several other trials (e.g., N-MOmentum, SAkuraSky, and SAkuraStar). Therefore, we believe that an NMA remains the appropriate method for generating comparative evidence across the three interventions.
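To make the re-weighting step concrete, the following is a minimal Python sketch of the standard method-of-moments estimation of MAIC weights; the function name maic_weights, the covariates, and the target means are hypothetical illustrations and are not part of the published analysis.

import numpy as np
from scipy.optimize import minimize

def maic_weights(ipd_covariates, target_means):
    """Estimate MAIC weights by the method of moments: centre the individual
    patient data (IPD) covariates at the aggregate target means, minimise
    Q(alpha) = sum_i exp(x_i' alpha), and return w_i = exp(x_i' alpha), so that
    the weighted IPD covariate means equal the target means."""
    x = np.asarray(ipd_covariates, float) - np.asarray(target_means, float)
    alpha = minimize(lambda a: np.exp(x @ a).sum(), np.zeros(x.shape[1]),
                     method="BFGS").x
    return np.exp(x @ alpha)

# Illustrative data only: two baseline covariates for a hypothetical IPD sample.
rng = np.random.default_rng(0)
ipd = rng.normal(size=(140, 2))
w = maic_weights(ipd, target_means=[0.3, -0.1])
print(np.average(ipd, axis=0, weights=w))  # weighted means approach [0.3, -0.1]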

Cree et al. cite differences in enrolled patient populations between the RCTs, including prior attack history, disease duration, and baseline disability, as potential causes of systematic bias in an indirect treatment comparison. It is important to assess differences among the individual studies included in an NMA, but only with regard to effect modification. One must distinguish between variables that are effect modifiers and those that are prognostic factors, and then determine whether there are systematic differences in the distribution of effect modifiers between the trials [4, 5]. Effect modifiers are study design or patient characteristics that influence the efficacy of the treatment, resulting in different relative treatment effect estimates, e.g., different hazard ratios (HRs) for relapse. We considered aquaporin-4 immunoglobulin G (AQP4-IgG) serological status a treatment effect modifier, leading to either exclusion of seronegative patients or stratification of the analysis. Background immunosuppressive therapy (IST) was also regarded as a treatment effect modifier. We therefore focused on AQP4-IgG seropositive patients and conducted separate networks, isolating the impact of background ISTs.

In contrast to effect modifiers, prognostic factors are variables that are associated with the outcome of interest irrespective of the treatment. Differences in prognostic factors between studies are not a source of bias in NMA of RCTs because their impacts cancel out in the trial-specific relative treatment effects owing to randomization [4, 5]. We consider variables related to trial inclusion criteria in our NMA to be prognostic factors rather than effect modifiers. Indeed, subgroup analyses from each of the NMOSD trials have examined factors such as age, sex, race, disease duration, and baseline disability and have not detected an association with treatment effect for any drug, supporting our approach. We acknowledge that unknown effect modifiers are always a possible source of bias.
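The distinction can be illustrated with a schematic linear predictor for the log hazard of relapse; this is a simplified expository sketch (ignoring, for example, non-collapsibility of the HR), not a model fitted in our NMA.

% Schematic only: patient i in trial j on treatment t, with baseline covariate x
% whose trial-level mean is \bar{x}_j.
\begin{align*}
  \log h_{ij}(t) &= \mu_j + \beta x_{ij}
      + \bigl(\delta_t + \gamma_t x_{ij}\bigr)\,\mathbf{1}[t \neq \mathrm{placebo}],\\
  \log \mathrm{HR}_j(t~\text{vs. placebo})
      &\approx \bigl(\mu_j + \beta \bar{x}_j + \delta_t + \gamma_t \bar{x}_j\bigr)
        - \bigl(\mu_j + \beta \bar{x}_j\bigr)
       = \delta_t + \gamma_t \bar{x}_j.
\end{align*}
% Randomization ensures both arms share \bar{x}_j, so the prognostic term
% \beta\bar{x}_j cancels within each trial; only the effect-modifying interaction
% term \gamma_t\bar{x}_j can differ between trials and bias the indirect comparison.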

Among the differences in patient characteristics between trials, Cree et al. consider the rituximab exclusion criteria the most noteworthy. In PREVENT, patients who had received rituximab within 3 months prior to screening were excluded from the study; eculizumab reduced relapse risk versus placebo similarly irrespective of prior rituximab use [6]. This exclusion period was 6 months for the other three trials. Assuming the relapse prevention benefit of rituximab extends to 6 months after the last dose, residual rituximab activity would suppress relapses in both arms early in the trial, making it more difficult to demonstrate the maximum relative treatment effect of eculizumab versus placebo under a 3-month exclusion criterion than under a 6-month exclusion criterion. Thus, if differences in rituximab history between studies were an important effect modifier, the result would be biased relative treatment effect estimates of eculizumab versus the other drugs, favoring the other drugs.

Nevertheless, we further investigated the concern about the potential effect of the different rituximab exclusion criteria by analyzing time-to-first-relapse data from the PREVENT study after excluding the 11 PREVENT subjects who last received rituximab between 3 and 6 months prior to initiating study drug. The resulting Kaplan–Meier curves for both the eculizumab and placebo treatment arms were very similar to those from the overall PREVENT study population (Fig. 1). This indicates that exposure to rituximab within 3–6 months before initiation of study drug was neither a prognostic factor nor an effect modifier in this study.

Fig. 1 Time-to-first adjudicated relapse in all PREVENT patients and in PREVENT patients excluding those who received rituximab between 3 and 6 months prior to screening
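As an illustration of the kind of sensitivity analysis summarized in Fig. 1, the following minimal Python sketch (using the lifelines package) re-estimates Kaplan–Meier curves with and without the excluded subgroup; the file name prevent_ipd.csv and the column names are hypothetical placeholders, as the PREVENT individual patient data are not publicly available.

import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Hypothetical columns: 'arm' ('eculizumab' or 'placebo'), 'weeks_to_relapse',
# 'relapsed' (1 = adjudicated relapse, 0 = censored), and
# 'weeks_since_rituximab' (NaN if no prior rituximab exposure).
df = pd.read_csv("prevent_ipd.csv")  # placeholder path

# Sensitivity data set: drop subjects whose last rituximab dose fell 3-6 months
# (roughly 13-26 weeks) before starting study drug; rows with NaN are retained.
sens = df[~df["weeks_since_rituximab"].between(13, 26)]

fig, ax = plt.subplots()
kmf = KaplanMeierFitter()
for label, data in [("all patients", df), ("excluding rituximab 3-6 mo", sens)]:
    for arm, sub in data.groupby("arm"):
        kmf.fit(sub["weeks_to_relapse"], event_observed=sub["relapsed"],
                label=f"{arm}, {label}")
        kmf.plot_survival_function(ax=ax)  # overlay the four curves
plt.show()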

Cree et al. point out differences in attack criteria and attack adjudication among the four RCTs. We agree with our colleagues that it would have been optimal if all trials had used identical attack definitions and adjudication methods. However, even today there remains no consensus on these issues, and we acknowledged this as a limitation of our study. All trials utilized retrospective analysis of objective data collected at the time of a study visit. The discrepancies between trials in the concordance of attack determination (committee-adjudicated attacks divided by physician-reported attacks) deserve further examination. Cree et al. focus on potential explanations for the differences that involve the numerator; however, these proportions are likely affected to a greater degree by the many factors that influence the denominator (the number of physician-reported attacks) and do not affect the validity of the final adjudication. For example, event reporting thresholds may vary at the investigator, site, or regional level based on the desire to avoid missing a true relapse. This may result in differential reporting of pseudo-relapses, which would be discarded by the adjudication committee. The type and degree of pre-trial and on-trial education of trial investigators also differed between the studies, with the N-MOmentum investigators having to learn a novel attack definition to which they likely had to refer repeatedly when judging and reporting new events.

An NMOSD relapse definition is a construct, and the decision to combine and analyze data in an NMA requires an assessment of whether the constructs evaluated in each RCT substantially and sufficiently overlap in their attempt to capture the “true” outcome. We believe these conditions likely hold, given the requirements in each trial for symptom reporting, an associated objective change on neurological examination, and evaluation by a blinded expert adjudication committee. Moreover, the proportions of placebo-group subjects with an adjudicated relapse at 48 weeks (28 weeks in N-MOmentum) were very similar, again suggesting similar relapse signal detection in each trial. It seems highly unlikely that differences in attack definition and adjudication can account for the results of the primary monotherapy analysis in the NMA. We cannot, however, exclude a contribution of such differences to the results of the combination therapy analysis involving eculizumab and satralizumab. This is an important area for future research.

Cree et al. express concern about the small sample sizes in the NMA monotherapy network. Although the PREVENT study allowed concomitant IST, it was not an add-on design as stated by Cree et al., and 24% of the study sample received eculizumab monotherapy. The sample sizes are relatively small and will limit the precision of the HR estimates in the NMA analyses, but they do not invalidate the results. Our choice of data imputation strategy, another concern related to the low relapse rate in eculizumab-treated subjects in PREVENT, was transparent and not unusual.

We appreciate the methodological issues raised by Cree et al. because they help to highlight some of the lessons learned during this first generation of NMOSD RCTs. We believe that our NMA represents an appropriate methodology for assessing the relative magnitude of the primary efficacy outcomes, given that head-to-head trial results are not (and likely will not be) available, that open-label extension studies cannot address relative efficacy questions, and that prospective observational comparative efficacy or “real-world” studies will take years to perform and carry a host of limitations to their interpretation, even greater than those raised here. As we originally stated, there are many considerations beyond efficacy against relapses when selecting a preventive therapy for an individual patient with AQP4-IgG seropositive NMOSD.