Key Summary Points

Why carry out this study?

Psoriasis is a chronic inflammatory multisystem disease which has a substantial negative impact on the patients’ quality of life and physical and psychologic functioning. Thus, it represents a major economic burden to health systems owing to the disease’s chronicity, high prevalence, and disabling effect

The use of biologic therapy has revolutionized the management of moderate-to-severe psoriasis, offering healthcare providers and patients a multitude of highly effective and tolerable treatment options. Nevertheless, head-to-head comparisons between the different biologic treatments in psoriasis are limited. This leads to uncertainty about their comparative efficacy and limits clinicians’ and patients’ ability to make informed decisions about treatment choices. Therefore, we conducted a network meta-analysis (NMA), which allows for comparison between multiple treatments that are not directly compared in a randomized controlled trial (RCT) and produces estimates of treatment effects and rankings that may be used in decision-making

What did the study ask?/What was the hypothesis of the study?

The current study conducted a systematic literature review and an NMA to compare the short-term efficacy at 10–16 weeks (according to the Psoriasis Area and Severity Index [PASI]) among the approved biologic systemic therapies for moderate-to-severe psoriasis

What was learned from the study?

What were the study outcomes/conclusions? (data)

The biologics were associated with better probability of achieving all PASI response levels compared with non-biologics and placebo. All biologics except etanercept had > 80% probability of achieving PASI50. Interleukin inhibitors (risankizumab 150 mg, ixekizumab 80 mg, brodalumab 210 mg, secukinumab 300 mg, and guselkumab 100 mg) were the best-performing treatments for achieving all levels of short-term PASI (50, 75, 90, and 100). Certolizumab pegol 400 mg and infliximab 5 mg/kg performed the best among the tumor necrosis factor-α inhibitors

Introduction

Psoriasis (PsO) is a chronic, inflammatory, multisystem disease with a prevalence of 2–3% in Western countries. Its most common (up to ~ 90% of cases) phenotype, plaque PsO, is characterized by scaly and often itchy red patches [1]. PsO severity can be classified as mild, moderate, or severe, depending on its location, the grading of skin signs, the surface area involved, and the impact on the individual [2]. At least 20% of patients have disease that involves > 5% of the body or affects crucial body regions, including the hands, feet, face, or genitals [3]. Severe disease is associated with increased mortality; estimated life expectancy is reduced by 3.5 years in men and 4.4 years in women [4]. Severe PsO also has a substantial impact on quality of life [5], with extensive emotional and psychosocial effects on patients, and represents a major economic burden to health systems owing to the disease’s chronicity, high prevalence, and disabling effect. This heavy toll is compounded by the fact that many patients have associated joint disease (psoriatic arthropathy) and comorbidities such as depression, anxiety, and cardiovascular disease [5].

Current guidelines recommend oral systemic drugs (e.g., dimethyl fumarate, methotrexate, ciclosporin, and apremilast) and targeted biologic therapies for chronic disease control in patients with moderate-to-severe PsO [6]. During the last 2 decades, different classes of targeted therapies have been developed for PsO, including tumor necrosis factor alpha (TNF-α) and interleukin (IL) inhibitors. Anti-TNF-α treatments include certolizumab pegol (CZP), etanercept, infliximab, and adalimumab. Among the TNF-α inhibitors, CZP has a different structure, as it lacks the immunoglobulin G (IgG) fragment crystallizable (Fc) region. This structural difference results in advantageous solubility and stability, and CZP has demonstrated minimal to no placental transfer from mothers to infants due to the absence of the region that binds to the neonatal Fc receptor for IgG (FcRn) [7,8,9]. The second class of antibodies that has been developed targets pro-inflammatory cytokines, including the IL-12/23p40 antibody (ustekinumab), and, more recently, inhibitors of IL-17A (secukinumab, ixekizumab), IL-17RA (brodalumab), and IL-23p19 (guselkumab, tildrakizumab, risankizumab).

Ample research has been conducted in placebo-controlled trials to evaluate efficacy and safety, but head-to-head comparisons among the biologic treatments are lacking. A network meta-analysis (NMA) allows for comparison between multiple treatments that are not directly compared in a randomized controlled trial (RCT) and produces estimates of treatment effects and rankings that may be used in decision-making [10].

The current study conducted a systematic literature review (SLR) on short-term efficacy among the approved biologic systemic therapies for moderate-to-severe PsO and conducted analyses using a novel enhancement to the standard multinomial NMA model with baseline risk adjustment. This approach relaxes the assumption of a constant probit difference between Psoriasis Area and Severity Index (PASI) cutoffs across treatments, allowing treatments to have different rankings across PASI levels. The analysis focused on the efficacy data at the end of induction treatment (10–16 weeks), specifically, the proportions of patients achieving commonly reported percentage improvements in the PASI relative to baseline (PASI50, PASI75, PASI90, and PASI100). Although several efficacy outcomes have been developed in PsO, we selected PASI because it is the most widely reported outcome in PsO trials and has been used as a decision tool for the use of biologics in healthcare decision-making [6]. A high correlation has been shown between PASI and patient-reported outcomes (such as the Dermatology Life Quality Index), which supports PASI's role as a primary efficacy end point in PsO trials [11].

Methods

SLR Overview

The SLR followed well-established recommendations of the Cochrane Collaboration [12, 13], and it systematically identified evidence from RCTs (phase II–IV) investigating the short-term efficacy (as measured by all PASI levels) of biologic therapies (at dosages approved by the European Medicines Agency) at the end of the induction treatment phase (10–16 weeks; Table 1) for adults with moderate-to-severe plaque PsO. Non-biologic systemic treatments were included in the NMA to enhance and strengthen the evidence network. The SLR results were reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension statement for systematic reviews incorporating NMAs of healthcare interventions [14].

Table 1 List of drugs and approved dosages

MEDLINE, Embase, the Cochrane Central Register of Controlled Trials, the Cochrane Database of Systematic Reviews, and PsycINFO were searched to identify English-language studies conducted on humans and published through March 5, 2019. Searches used a combination of terms and keywords for moderate-to-severe PsO, approved treatments for moderate-to-severe PsO, and study design (RCT; Tables S1–S4). Search terms and strategies were adapted to each database using the appropriate indexing terms. The proceedings of seven relevant conferences (2016 to 2018) were also searched, and searches were validated by cross-checking the reference lists of previous SLRs and NMAs (published between 2016 and 2018) conducted for the same topic to identify any studies not captured by this SLR.

Abstract and full-text screening was performed by two independent investigators guided by the inclusion and exclusion criteria of the protocol (Table S5). Any discrepancies were resolved by a third, senior investigator. A single investigator extracted data on the study design, types of bias, patient population (including demographic characteristics, comorbidities such as psoriatic arthritis [PsA]), disease duration, prior use of biologics, treatment details, and outcomes of interest for each included RCT. All data were validated by a second, senior investigator using a pre-designed template. The quality of all RCTs was assessed using the Cochrane Risk of Bias Assessment Tool 2.0 [15]. When more than one publication was identified for the same RCT, a single publication (the one with the most complete or most recent information) was selected to avoid double-counting of patients. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Study Characteristics and NMA Assumptions

The key assumption in any NMA is that the underlying relative treatment effects (between any two specific treatments, after ignoring the sampling error) are or would be the same in all trial populations [16]. The characteristics of the included trials were assessed to ensure NMA assumptions were met (Table 2). The presence of potential effect modifiers [17], such as disease duration, baseline PASI, and presence of comorbidities, was assessed to confirm similarity among the included trials. Differences in placebo rates were of interest, since relative treatment effect size in a study may depend on the placebo response; these differences are presented in Fig. 1. To allow for network connectivity and based on clinical expert review, we assumed that small differences in treatment doses and schedule in the non-biologic treatments cyclosporine and methotrexate do not impact relative effects.

Table 2 Criteria for psoriasis area severity index analysis
Fig. 1 Placebo response rates across the trials for PASI75

Statistical Analysis

We explored clinical heterogeneity and the performance of NMA models using unadjusted and adjusted models per the National Institute for Health and Care Excellence (NICE) Decision Support Unit recommendations [18,19,20]. A Bayesian multinomial likelihood (probit link) NMA model was conducted based on the number of patients in four PASI categories: patients achieving 50%, 75%, 90%, and 100% improvement at 10–16 weeks. The PASI provides a combined assessment of lesion severity and the area affected into one score (ranging from 0 [no disease] to 72 [highest burden of disease]).
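As an illustration of the data structure behind such a multinomial model, the sketch below converts the cumulative responder counts usually reported by trials (patients reaching at least PASI50, PASI75, PASI90, and PASI100) into mutually exclusive ordered categories. The function name, the five category labels, and the example counts are assumptions made for illustration only; they are not taken from the study data or from the authors' code.

```python
def exclusive_pasi_counts(n, n50, n75, n90, n100):
    """Split cumulative responder counts into mutually exclusive categories.

    n     : patients randomized to the arm
    n50.. : patients achieving at least PASI50, PASI75, PASI90, PASI100
    """
    cumulative = [n, n50, n75, n90, n100]
    if n100 < 0:
        raise ValueError("counts must be non-negative")
    # Each higher cutoff is a subset of the lower one, so counts cannot increase.
    if any(lower < higher for lower, higher in zip(cumulative, cumulative[1:])):
        raise ValueError("responder counts must not increase with the PASI cutoff")
    labels = ["<50", "50-74", "75-89", "90-99", "100"]
    counts = [lower - higher for lower, higher in zip(cumulative, cumulative[1:])]
    counts.append(n100)
    return dict(zip(labels, counts))


# Hypothetical arm: 200 patients, 180/150/100/40 reaching PASI50/75/90/100.
arm = exclusive_pasi_counts(200, 180, 150, 100, 40)
print(arm)
```

The category counts always sum to the arm size, which is the property a multinomial likelihood relies on.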

Two modifications were made to this NMA model. We added a component for baseline risk, per NICE guidelines, as relative effects of drugs in autoimmune diseases are often dependent on baseline risk (i.e., the placebo rate and relative effect of a treatment vs. placebo are likely related) [18]. Given prior research and expert agreement, a baseline risk model was assumed to be the most clinically valid. We decided a priori that, barring convincing evidence to the contrary, the base-case model should include a parameter for baseline risk, given supporting evidence in recent publications and how common the adoption of placebo-adjusted NMA models is in PsO comparative evidence synthesis [17, 21, 22]. Models without baseline risk were explored in a sensitivity analysis.
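The idea of baseline-risk adjustment can be sketched as follows. The relative effect on the probit scale is assumed here to shift linearly with the trial's placebo response, centered on a reference placebo rate. The slope of -0.69 and the 17.9% anchor PASI50 placebo rate are the values reported elsewhere in this article; the treatment effect of 2.0, the function name, and the choice to center on the anchor rate are hypothetical and serve only to show the direction of the adjustment.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def adjusted_effect(d, beta, placebo_rate, reference_rate):
    """Probit-scale effect vs. placebo in a trial with the given placebo rate.

    Assumed form: d + beta * (mu_trial - mu_reference), where mu is the
    probit of the placebo response rate.
    """
    mu_trial = STD_NORMAL.inv_cdf(placebo_rate)
    mu_reference = STD_NORMAL.inv_cdf(reference_rate)
    return d + beta * (mu_trial - mu_reference)


BETA = -0.69          # slope reported in the Results (probit scale)
ANCHOR_RATE = 0.179   # anchor placebo PASI50 rate from the natural history model
D_HYPOTHETICAL = 2.0  # made-up treatment effect at the anchor rate

for rate in (0.05, ANCHOR_RATE, 0.30):
    effect = adjusted_effect(D_HYPOTHETICAL, BETA, rate, ANCHOR_RATE)
    response = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(rate) + effect)
    print(f"placebo {rate:.3f}: effect {effect:.2f}, treated PASI50 {response:.3f}")
```

With a negative slope, trials with a higher placebo response show a smaller relative effect on the probit scale, which is the relationship the adjustment is meant to capture.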

The other modification allowed flexibility around the key assumption in the standard multinomial model, i.e., that each treatment has the same probit difference between PASI cutoffs (e.g., in probit terms, each treatment has the same conditional difficulty to advance to PASI90 given achievement of PASI75). This assumption allows ‘borrowed strength’ across PASI cutoffs, but it also forces all treatments to keep the same rank at every PASI level. Our modification added a random-effects (RE) component that allowed each treatment’s increase in difficulty to the next-highest PASI cutoff to vary around a common mean, thus allowing ‘borrowed strength’ across PASI cutoffs but also allowing treatments to have different efficacies (and thus different rankings) for different levels of PASI. This enhanced model is referred to as the ‘REZ’ model because it adds an RE component to the parameter \(z\), which reflects the difficulty to go from one PASI cutoff to the next. More details, including exemplar model code, are provided in the Supporting Information.
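The consequence of the two assumptions for treatment rankings can be illustrated with a small numerical sketch. It assumes that the probability of reaching a cutoff is the standard normal distribution function of (anchor probit + treatment effect - cumulative step), which is one common way to write an ordered probit model and is not necessarily the authors' exact parameterization. The treatments 'A' and 'B', their effects, and all step values are hypothetical; in the REZ model the treatment-specific steps would be estimated around a common mean rather than fixed by hand.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
CUTOFFS = ("PASI50", "PASI75", "PASI90", "PASI100")


def response_probs(anchor_probit, effect, steps):
    """Probability of reaching each cutoff: Phi(anchor + effect - cumulative step)."""
    probs, cumulative = {}, 0.0
    for name, step in zip(CUTOFFS, (0.0, *steps)):
        cumulative += step
        probs[name] = STD_NORMAL.cdf(anchor_probit + effect - cumulative)
    return probs


def ranking(probs_by_treatment, cutoff):
    """Treatments ordered from highest to lowest probability at one cutoff."""
    return sorted(probs_by_treatment,
                  key=lambda t: probs_by_treatment[t][cutoff], reverse=True)


anchor = STD_NORMAL.inv_cdf(0.179)   # anchor placebo PASI50 rate from the text
effects = {"A": 2.0, "B": 1.9}       # hypothetical probit effects vs. placebo

# Standard model: every treatment shares the same steps between cutoffs.
common_steps = (0.7, 0.6, 0.7)
standard = {t: response_probs(anchor, d, common_steps) for t, d in effects.items()}

# REZ-style model: steps vary by treatment (fixed here for illustration).
rez_steps = {"A": (0.8, 0.7, 0.8), "B": (0.6, 0.5, 0.6)}
rez = {t: response_probs(anchor, d, rez_steps[t]) for t, d in effects.items()}

for cutoff in CUTOFFS:
    print(cutoff, ranking(standard, cutoff), ranking(rez, cutoff))
```

Under common steps the order of A and B is identical at every cutoff, whereas treatment-specific steps let B overtake A from PASI75 onward in this made-up example.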

All analyses were run with fixed-effects (FE) and RE modeling for relative treatment effects, for a total of eight scenarios (models) tested (crossing FE/RE for treatment effects with inclusion/exclusion of baseline risk and standard vs. REZ modeling). Binomial analyses with a logit link were also conducted for all four PASI responses, as sensitivity analyses. However, in these binomial analyses (especially of PASI50 or PASI100), some interventions were not compared because of lack of trial data.

In all Bayesian NMAs, non-informative priors were used for all non-RE parameters. In the RE models, a Uniform(0, 1) prior was used for τ (the square root of the treatment effect variance, i.e., the heterogeneity standard deviation [SD]). In the REZ model, a Uniform(0, 0.5) prior was used for \({\sigma }_{zT}\) (the SD around the value between probit cutoffs). Alternative priors (Uniform(0, 0.25) and Uniform(0, 1)) were tested in sensitivity analyses, but the results were not sensitive to the choice of prior (likely due to the large size of the dataset).

All Bayesian analyses were carried out with Markov chain Monte Carlo simulations, with 50,000 discarded burn-in iterations followed by 50,000 iterations for parameter estimation. Convergence was confirmed by evaluating the three-chain Brooks-Gelman-Rubin plots [23, 24] and values of \(\widehat{R}\) (potential scale reduction factor [24], considered converged if \(\widehat{R}\) < 1.05 for all parameters being estimated) as well as the ratios of Monte Carlo error to the SDs of the posteriors. The median and the 2.5th and 97.5th percentiles of the posterior samples for each effect were used as an estimate of the effect (e.g., probit differences between treatments) and its 95% credible interval (CrI). These posterior samples were also used to obtain the rank probability of a treatment being the best, the probability of a treatment being better than each comparator, and each treatment’s surface under the cumulative ranking curve (SUCRA) index [25]. A separate natural history model [20], using the most recent, robust placebo data (placebo arms with n ≥ 50 patients in studies published 2013 or later) to ensure that extrapolated response proportions reflect current practice [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87], estimated an ‘anchor’ placebo rate for PASI50 of 17.9%, allowing the models to estimate PASI probabilities for each treatment for each PASI level relative to this anchor.
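The posterior summaries described above can be sketched from a set of posterior draws. The example below uses simulated draws for three hypothetical treatments (not study output) and computes the median with a 95% credible interval, the probability of being best, and the surface under the cumulative ranking curve, using the identity that this index equals (number of treatments - mean rank) / (number of treatments - 1). Function names and all numbers are illustrative assumptions.

```python
import random
from statistics import median, quantiles


def credible_interval(samples):
    """Median and 2.5th/97.5th percentiles of posterior draws."""
    cuts = quantiles(samples, n=40, method="inclusive")  # cut points every 2.5%
    return median(samples), cuts[0], cuts[-1]


def rank_summaries(draws):
    """draws maps treatment -> list of effects per iteration (larger is better)."""
    names = list(draws)
    n_iter = len(draws[names[0]])
    n_trt = len(names)
    best = dict.fromkeys(names, 0)
    rank_sum = dict.fromkeys(names, 0)
    for i in range(n_iter):
        ordered = sorted(names, key=lambda t: draws[t][i], reverse=True)
        best[ordered[0]] += 1
        for rank, name in enumerate(ordered, start=1):
            rank_sum[name] += rank
    p_best = {t: best[t] / n_iter for t in names}
    # Ranking-curve index from the mean rank: 1 = always best, 0 = always worst.
    sucra = {t: (n_trt - rank_sum[t] / n_iter) / (n_trt - 1) for t in names}
    return p_best, sucra


# Simulated posterior draws of probit effects for hypothetical treatments.
rng = random.Random(0)
true_means = {"A": 2.0, "B": 1.8, "placebo": 0.0}
draws = {t: [rng.gauss(m, 0.15) for _ in range(4000)] for t, m in true_means.items()}

p_best, sucra = rank_summaries(draws)
for t in draws:
    med, lo, hi = credible_interval(draws[t])
    print(f"{t}: {med:.2f} ({lo:.2f}, {hi:.2f}) P(best)={p_best[t]:.2f} SUCRA={sucra[t]:.2f}")
```

Because ranks are assigned within each iteration, the ranking index reflects the full posterior uncertainty rather than only the ordering of point estimates.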

Goodness of fit of the eight analytic models was compared using the posterior mean residual deviance and the deviance information criterion (DIC) [18]. A model with a DIC smaller by > 5 points is generally considered a better-fitting model [88]. Bayesian NMAs of multinomial models were conducted in JAGS (version 4.3.0), and binomial NMAs were conducted in OpenBUGS (version 3.2.3).
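The comparison rule can be written as a short sketch. It assumes the usual definition of the DIC as the posterior mean deviance plus the effective number of parameters, and it treats models within 5 points of the lowest DIC as not clearly distinguishable. The model labels and the numbers below are hypothetical and do not reproduce Table 4.

```python
def dic(mean_deviance, effective_parameters):
    """DIC as posterior mean deviance plus effective number of parameters."""
    return mean_deviance + effective_parameters


def compare_by_dic(models, threshold=5.0):
    """Return the lowest-DIC model and the models within `threshold` of it."""
    scores = {name: dic(*parts) for name, parts in models.items()}
    best = min(scores, key=scores.get)
    comparable = [name for name, score in scores.items()
                  if name != best and score - scores[best] <= threshold]
    return best, comparable, scores


# Hypothetical (mean deviance, effective parameters) pairs for three models.
candidates = {
    "FE standard": (620.0, 40.0),
    "FE REZ": (560.0, 55.0),
    "RE REZ": (552.0, 66.0),
}
best_model, comparable_models, dic_scores = compare_by_dic(candidates)
print(best_model, comparable_models, dic_scores)
```

In this made-up example the model with the smallest mean deviance is not the one with the lowest DIC, because its extra parameters are penalized; the two remain within 5 points of each other, so other considerations would decide between them.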

Network inconsistency was assessed using an unrelated mean effect model, as recommended in NICE Technical Support Documents [89]. Residual deviance in each arm in each study was also obtained in the multinomial model (for which average deviance over all PASI responses was computed) to assess absolute fit to the data. Arm-level deviances from different models were investigated, as badly fitting data contribute to high heterogeneity, inconsistency, or both in a network; however, no substantive examples of inconsistency or heterogeneity were detected.

Similar analyses were carried out in a subgroup of RCTs in which populations or subpopulations were not previously treated with biologics (either 100% or ≥ 90% of patients were biologic-naïve). There were insufficient studies/subgroup data to conduct analyses on previously treated patients.

Results

SLR Search Results

The electronic database searches yielded 4135 publications, and 78 references were retrieved from the review of conference proceedings. After screening, 319 publications met the inclusion criteria for the broad SLR reporting clinical outcomes of treatments for moderate-to-severe plaque PsO. Among these, 72 publications (across 73 RCTs) covering 30,314 patients reported results for at least one PASI level (50, 75, 90, and/or 100) at 10 to 16 weeks and were included in the base-case NMA. Figure 2 summarizes the flow of included studies in the SLR and NMA, and Fig. 3 presents the NMA network diagram.

Fig. 2 Preferred reporting items for systematic reviews and meta-analysis flow chart

Fig. 3 PASI (10–16 weeks) NMA network

Study and Patient Characteristics

Eight of the 73 included RCTs were phase II, 2 were phase II/III, 51 were phase III or IV, and 12 did not report their phases. All had similar inclusion/exclusion criteria and definitions of PsO severity. The proportion of patients with comorbid PsA (in mixed populations) ranged from 2.4 to 55.8% (trials were excluded if all patients had comorbid PsA). Sample sizes ranged from 40 to 1306 patients, with most studies analyzing at least 100 patients. Details on patient characteristics are presented in Table 3. Sixty-three trials were deemed to have low risk of bias, seven were rated as having some concerns, and three had a high risk of bias. The main driver of bias was missing outcome data for some patients (i.e., lack of intent-to-treat analyses). Studies had wide variation in reported placebo rates, justifying the baseline-risk adjustment model to the extent that placebo rates are related to relative effect vs. placebo (Fig. 1). Summary assessments for each domain and the overall risk of bias are summarized in Table S6.

Table 3 Study and patient characteristics of studies included in the base-case network meta-analysis

NMA Results

The most important component for fit was use of REZ modeling versus the standard model. The best-fitting model in terms of DIC was the baseline-unadjusted FE REZ model. While the baseline-risk-adjusted RE REZ model had the smallest mean residual deviance, it involved extra parameters. Given the small difference in DIC, the level of significance for the estimate of the slope (− 0.69 [95% CrI: − 0.86, − 0.53] on the probit scale), and based on clinical recommendations, we maintained our a priori choice of baseline-risk adjusted model (RE REZ) as the base case and the best-fitting baseline-unadjusted FE REZ model was used as a sensitivity analysis (Table 4).

Table 4 Deviance information criterion for all multinomial-ordered probit models

At the end of the induction phase in the base-case model, all treatments considered in the network were more effective than placebo, and all biologic treatments except etanercept were more effective than non-biologics (apremilast, methotrexate, dimethyl fumarate, cyclosporin, or acitretin) at achieving all levels of PASI response. IL inhibitors (risankizumab 150 mg, ixekizumab 80 mg, brodalumab 210 mg, secukinumab 300 mg, and guselkumab 100 mg) were the best-performing treatments for achieving all PASI responses (50, 75, 90, and 100) in the short term. Both doses of etanercept (25 mg and 50 mg) had the lowest probabilities of response among the biologics across all PASI response levels (Fig. 4).

Fig. 4 Predicted probabilities of achieving PASI responses at 10–16 weeks in baseline adjusted REZ random effects multinomial model. Treatments are sorted by the highest to lowest estimates of probabilities of reaching PASI75

Among the TNF-α inhibitors, infliximab 5 mg/kg and CZP 400 mg were better performing treatments than adalimumab and etanercept for achieving the lower levels of short-term PASI (50, 75), with infliximab showing a larger advantage and CZP 400 mg being essentially equivalent to adalimumab for PASI (90, 100).

Treatment rankings from the baseline-adjusted REZ RE multinomial analysis, which were free to vary across PASI levels under the REZ model, remained largely similar (Fig. 5). Even when rankings changed, estimated probabilities were very similar across TNF-α inhibitors other than etanercept and infliximab. Similar performances of these treatments were observed in the non-baseline risk REZ FE model (Tables S7 and S8) and in the binomial sensitivity analyses (Table S9).

Fig. 5 SUCRA* plot of treatments achieving each PASI threshold in baseline adjusted REZ random-effects multinomial model

REZ Model Versus the Standard Model

For both FE vs. RE, and baseline-risk adjusted vs. unadjusted analyses, the REZ model had better fit than the standard model (Table 4). This was not unexpected, as the assumption that all treatments must share an exactly equivalent ‘step’ between PASI cutoffs in a standard model is a strong one, and would be so regardless of the statistical metric. The REZ model allowed treatments to share a common step while allowing for some variation across treatments, with the amount of variation determined by the data.

While the REZ models had better fit than the standard model under baseline risk adjustment with random-probit differences assumption, the findings were substantively similar between the two models. The REZ (Fig. 5) and standard models (Table S10) showed that the top-performing treatments were the same, though the ranking is (non-substantively) different (also see predicted probabilities from REZ [Fig. 4] and standard models [Table S11]).

Note that as drugs within treatment class could share the exact same ‘steps’ between PASI cutoffs (e.g., relative treatment rankings might not vary), the model could be extended to allow for variation across treatment classes instead of across treatment. The richness of the data in PsO trials allows the testing of a variety of approaches that allow for borrowed strength across PASI cutoffs without making the strong assumption that all treatments will have precisely the same ranking, from PASI50 through PASI100.

Patients Naïve to Previous Biologic Treatment (Subgroup Analysis)

Thirty-five RCTs reported subgroup data for populations that were 100% naïve to previous biologic treatment, and six additional studies had data for populations that were ≥ 90% but < 100% naïve (Figs. S1 and S2). In the ≥ 90% biologic-naïve population, risankizumab 150 mg outperformed all available treatments across all PASI levels, followed by brodalumab 210 mg, guselkumab 100 mg, and ixekizumab 80 mg. Among the TNF-α inhibitors, CZP 400 mg performed better than infliximab, adalimumab, and etanercept across all PASI levels (Fig. S3).

When the cutoff for the percentage of patients that are treatment naïve was adjusted to a more stringent 100%, dimethyl fumarate fell out of the network. Similar findings were obtained in the 100% biologic-naïve population with the following exception: infliximab 5 mg/kg performed better than CZP 400 mg in achieving PASI 75 only. The conclusions in this subgroup analysis reflected those of the base-case analysis, with the IL inhibitors showing the best efficacy across PASI levels (Fig. S4).

Discussion

This review considered only licensed dosages and restricted PASI outcomes to 10–16 weeks—a clinically relevant time point to assess whether PsO treatments produce a positive effect on patients. It considered peer-reviewed and gray literature evidence to avoid publication bias, and screening and data extraction were conducted by two independent reviewers.

The NMA analysis enabled us to produce indirect treatment comparative effect estimates for biologics that were compared in RCTs, while adjusting for the effect of differences in baseline placebo rates. The biologics were associated with better probability of achieving all PASI response levels compared with non-biologics and placebo. All biologics except etanercept had > 80% probability of achieving PASI50. Except for tildrakizumab (200 mg and 100 mg), the IL inhibitors were the best-performing treatments for achieving all levels of short-term PASI (50, 75, 90, and 100). CZP 400 mg and infliximab 5 mg/kg performed the best among the TNF-α inhibitors. The validity of our base-case analysis was reinforced by similar results from two sensitivity analyses (a baseline-unadjusted model and an analysis restricted to biologic-naïve populations).

These results should be interpreted in light of the following limitations. Our SLR search cutoff point (March 2019) may have missed trials of newer treatments approved after this date. The analysis was limited to PASI outcomes to evaluate clinical efficacy and did not consider other efficacy outcomes, which may have provided additional information on the performance of treatments. Our analysis was restricted to the efficacy of treatments in a 10- to 16-week period, which may not reflect patients’ long-term experiences. Additionally, all NMAs assume that populations and study designs/methods across trials are homogeneous enough for the valid estimation of indirect treatment effects. While studies had similar populations and all used the PASI, we cannot rule out that some differences in patient characteristics might have influenced results. Heterogeneity in study results would generally signal such a problem; however, the global estimate of heterogeneity was quite low. We also found similar results when restricting analyses to biologic-naïve patients, suggesting that variation in previous biologic use was unlikely to be a source of heterogeneity.

Previous NMAs [17, 69, 90,91,92,93,94,95,96,97,98] investigated the comparative effect estimates between different biologic and non-biologic treatments for patients with moderate-to-severe PsO; however, the scope of many of these publications differed from ours (e.g., some NMAs restricted the analysis to comparisons of specific treatments or treatment classes). Our analysis results were generally consistent with previous findings in Armstrong (2020) [22] and Sawyer (2019) [95], which used a multinomial approach with adjustment for baseline risk. There are distinct methodologic differences in the planning of our evidence generation and analysis, making these NMA results more robust in providing reliable effect and comparative effect estimates among the biologics. For example, we restricted our protocol to approved treatments with licensed dosages with a reasonably adequate sample size and with a restricted follow-up period (10–16 weeks) to allow interpretation of results in a healthcare decision-making setting. The recent Cochrane review published on the same topic adopted a different protocol with a broader scope that allowed inclusion of trials testing non-approved treatments, of any sample size, reporting on outcomes in a wider range of follow-up periods (8–24 weeks), and employing different analytic approaches (multivariate modeling), limiting direct comparability of treatment effect estimates between this study and the Cochrane review [96].

Wright et al. recently published an SLR of 25 NMAs conducted in PsO [98]. The authors reported that the choice of multinomial vs. binomial models had minimal impact on the results. Six of the NMAs in the SLR adjusted for placebo rate, and, in all cases, those models found a better fit for the adjusted over the unadjusted model, supporting this approach. Across the short-term NMAs of PASI 75 and 90, ixekizumab, brodalumab, and risankizumab tended to rank in the top three when evaluated, with ixekizumab ranking first in most NMAs, and second in the remainder, including our study. Secukinumab, infliximab, adalimumab, and guselkumab tended to rank next, with etanercept, certolizumab pegol, ustekinumab, and tildrakizumab tending to rank lowest. Our NMA yielded comparable results, with risankizumab, ixekizumab, and brodalumab as the top-ranking treatments.

Conclusions

Our study confirmed that IL inhibitors are likely the best short-term treatment choices for improving all PASI levels. The findings from our enhanced NMA analyses, which considered additional methodologic approaches using the richness of the trial data, provide clinicians and researchers with reliable comparative estimates of biologics in moderate-to-severe PsO and allow further application of similar methodologies in other disease areas with similar analytical challenges, such as PsA.