To the Editor

Network meta-analysis (NMA) approaches are increasingly being used to compare treatments across multiple trials by incorporating results of both direct and indirect comparisons. To inform sound evidence-based treatment decisions, NMA studies must be based on methodologically rigorous analyses that are presented and interpreted in the context of important limitations to ensure robust scientific conclusions. Armstrong et al. [1] recently published an NMA evaluating the benefit–risk profile of treatments for plaque psoriasis. A similar analysis was previously published by the same group (Shear et al. [2]). That publication was subject to scientific criticism by Pettitt et al. [3], as it used an analytic approach that is inconsistent with prevailing methodological conventions. At that time, the authors (Shear et al. [2]) chose neither to respond nor to defend their methodological choices and assumptions. As the same methodological flaws persist in the current manuscript, further scientific discourse on the matter is needed.

Both publications [1, 2] conclude that “Risankizumab was associated with the most favorable long-term benefit-risk profile.” However, this statement is not supported by statistical analysis. The authors present benefit–risk results as surface under the cumulative ranking curve (SUCRA) scores, which range from 0 to 1 and can be interpreted as the proportion of therapies to which a given treatment compares favorably [4]. The SUCRA approach has several important limitations, and its results can mislead if they are not interpreted carefully. As a ranking strategy, SUCRA is agnostic to the magnitude of treatment effects and can therefore artificially inflate apparent differences between therapies [5]. Indeed, SUCRA can make a treatment appear favorable relative to comparators even if it is neither statistically nor clinically significantly superior to any other treatment on any outcome. In Armstrong et al.’s analysis of any serious adverse event (SAE), no statistically significant differences in frequency were evident between risankizumab and any other treatment examined [1].
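To make explicit why rank-based summaries discard effect magnitudes, consider the standard SUCRA formulation (sketched here for illustration; the notation is ours). For treatment $k$ in a network of $a$ treatments,

$$\mathrm{SUCRA}_k = \frac{1}{a-1}\sum_{b=1}^{a-1}\mathrm{cum}_{kb},$$

where $\mathrm{cum}_{kb}$ is the posterior probability that treatment $k$ ranks among the $b$ best treatments. Only ranking probabilities enter the calculation; the size of the underlying effect differences and the width of their credible intervals do not, which is why two treatments separated by a trivial and highly uncertain difference can nonetheless receive very different SUCRA scores.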

To address these limitations and provide important context for the results of SUCRA analyses, experts encourage the use of absolute risks in such studies to align with best practices [6], as well as the display of point estimates and their credible intervals in figures. To illustrate, Fig. 1 shows how SUCRA scores magnify small differences when SAEs are plotted against psoriasis area and severity index (PASI) 100 response. Plots of absolute outcomes reveal that the safety trade-offs are small and uncertain, with widely overlapping credible intervals. A formal test of simultaneous improvement in both outcomes could be conducted using commonly available multivariate methods [7], but no such analysis was presented by the authors.
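As an illustrative sketch only (it is not the authors’ analysis, nor the specific multivariate approach cited in [7]), one simple way to summarize joint benefit–risk from a Bayesian NMA is to compute, across aligned posterior draws, the probability that a treatment improves on a comparator for both outcomes simultaneously. The array names, parameter values, and simulated draws below are hypothetical stand-ins for real NMA output.

import numpy as np

def joint_superiority_probability(pasi100_a, no_sae_a, pasi100_b, no_sae_b):
    """Posterior probability that treatment A beats treatment B on both outcomes.

    Each argument is a 1-D array of posterior draws (aligned across the same
    MCMC iterations) of an absolute outcome probability: PASI 100 response or
    freedom from SAEs.
    """
    better_efficacy = pasi100_a > pasi100_b  # draw-wise comparison on efficacy
    better_safety = no_sae_a > no_sae_b      # draw-wise comparison on safety
    return float(np.mean(better_efficacy & better_safety))

# Hypothetical usage with simulated draws standing in for real NMA output.
rng = np.random.default_rng(0)
n_draws = 10_000
ris = {"pasi100": rng.beta(60, 40, n_draws), "no_sae": rng.beta(95, 5, n_draws)}
comp = {"pasi100": rng.beta(55, 45, n_draws), "no_sae": rng.beta(94, 6, n_draws)}
p_joint = joint_superiority_probability(ris["pasi100"], ris["no_sae"],
                                        comp["pasi100"], comp["no_sae"])
print(f"P(better on both outcomes simultaneously) = {p_joint:.2f}")

A probability near 0.5, or a value that does not clearly exceed a prespecified threshold, would indicate that the apparent benefit–risk advantage suggested by SUCRA rankings is not supported once the magnitude and uncertainty of both outcomes are considered jointly.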

Fig. 1 Comparison of bivariate SUCRA (A) plots to absolute outcomes with estimates of uncertainty over the full (B) or restricted (C) range of values. A SUCRA values for SAE and multinomial PASI. B, C Absolute probabilities for no SAE and PASI 100. Panels B and C are identical, but panel C is presented with restricted axes to assist readability. ADA adalimumab, BIM bimekizumab, GUS guselkumab, IXE ixekizumab, PASI psoriasis area and severity index, Q2W every 2 weeks, Q4W every 4 weeks, Q8W every 8 weeks, RIS risankizumab, SAE serious adverse event, SEC secukinumab, SUCRA surface under the cumulative ranking curve, UST ustekinumab

Although the authors did provide 95% credible intervals around the posterior median estimates for PASI and, separately, for the proportions of patients who experienced SAEs [1], fixed-effects models were used to conduct the NMA. Fixed-effects models do not account for, or appropriately reflect, variation due to between-study heterogeneity, which is apparent in the wide range of placebo responses in the induction period across trials. Best practice when faced with between-trial heterogeneity is to use a random-effects model and a meta-regression that adjusts for variation in placebo response across trials (i.e., baseline-risk adjustment) to produce unbiased estimates [8]. Because a random-effects model and baseline-risk adjustment are not feasible in sparse networks such as the one in Armstrong et al. [1], the reported credible intervals can be misleading and the unadjusted point estimates are likely confounded; consequently, the NMA should not be used for decision-making.
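To make the distinction concrete, one common formulation of a random-effects NMA with baseline-risk adjustment (written here as an illustrative sketch, with notation introduced for this letter and multi-arm trials requiring appropriate handling) models the arm-level linear predictor on, for example, the logit scale as

$$\theta_{ik} = \mu_i + \delta_{ik}\,[k>1], \qquad \delta_{ik} \sim N\!\big(d_{t_{ik}} - d_{t_{i1}} + \beta(\mu_i - \bar{\mu}),\ \tau^2\big),$$

where $\mu_i$ is the trial-specific baseline (e.g., placebo) effect, $d_t$ are the basic treatment effects relative to the network reference, $\beta$ captures the dependence of relative effects on baseline risk, and $\tau^2$ is the between-trial heterogeneity variance. A fixed-effects analysis corresponds to forcing $\tau^2 = 0$ (and, without meta-regression, $\beta = 0$), which is why it cannot reflect the heterogeneity that is evident in the placebo responses across trials.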

The issues raised here reiterate our previous concerns [3] with the NMA conducted by Shear et al. [2]. We continue to emphasize the importance of aligning with common NMA standards and best practices to provide reliable evidence and appropriately guide clinical decision-making. Avoiding the perception that NMAs are untrustworthy and that their results are malleable is critical to their ongoing clinical utility, especially since the volume of overlapping NMAs in psoriasis and the appearance of “spin” in conclusions have recently been highlighted [9]. We invite an open dialogue with interdisciplinary NMA stakeholders on how to align on common standards for conducting and interpreting NMAs.