Introduction

Rheumatoid arthritis (RA) is a chronic, disabling systemic inflammatory disorder characterised by immune-mediated attack on the synovial joints. Disease-modifying anti-rheumatic drugs (DMARDs) alleviate the symptoms of RA and have the potential to slow or stop disease progression [1–3]. DMARDs are classified into two types: conventional and biologic. European guidelines recommend that methotrexate (MTX), a conventional DMARD, be included in the first-line treatment strategy for active RA as soon as possible after diagnosis [4]. In patients with an insufficient response to MTX and/or other conventional DMARDs, biologic DMARDs, in particular TNF inhibitors, which are designed to target specific elements of the immune system involved in inflammation and joint damage, should be combined with MTX to improve the outcome [4]. Currently licensed TNF inhibitors for patients with RA showing active disease despite MTX therapy include infliximab [5], etanercept [6], adalimumab [7], certolizumab pegol [8] and golimumab [9]. Other licensed biologic agents with alternative mechanisms of action include tocilizumab [10] and abatacept [11]; rituximab [12] was also under evaluation for approval in this patient population at the time of this analysis.

Abatacept is the first in a class of biologic DMARDs that acts by selectively modulating an essential co-stimulatory pathway needed for T-cell activation, thus inhibiting the inflammatory process upstream in the cascade of inflammatory events of importance in the pathology of RA [13]. The effectiveness of abatacept has been demonstrated in a series of randomised controlled trials [14–18]. Ideally, so that decisions on treatment options could be based on firm clinical evidence, the comparative efficacy of every treatment option would be known. However, given the lack of head-to-head data for direct comparison, network meta-analyses are necessary to estimate the expected relative efficacy of biologic DMARDs. Indirect comparisons of interventions can be made through a common comparator [19].
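The idea of comparing two interventions through a common comparator can be illustrated with a minimal sketch. It assumes a simple adjusted indirect comparison on normally distributed effect estimates, which is a simplification and not the Bayesian mixed treatment comparison used in this study; the function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def indirect_comparison(d_a_vs_c, se_a_vs_c, d_b_vs_c, se_b_vs_c, level=0.95):
    """Adjusted indirect comparison of A versus B through common comparator C.

    Inputs are effect estimates on an additive scale (for example a mean
    difference, or a log odds ratio) with their standard errors.
    """
    # The effect of A versus B is the difference of the two effects versus C.
    diff = d_a_vs_c - d_b_vs_c
    # The two trials are independent, so their variances add.
    se = math.sqrt(se_a_vs_c ** 2 + se_b_vs_c ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical mean differences in HAQ CFB versus placebo (not study data).
estimate, se, (low, high) = indirect_comparison(-0.30, 0.05, -0.20, 0.05)
print(f"A vs B: {estimate:.2f} (95% CI {low:.2f}; {high:.2f})")
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason intervals for comparisons between biologic agents tend to be wide.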

Our objective was to perform a network meta-analysis for abatacept following a systematic review of the published clinical evidence for abatacept and all other biologic DMARDs that were licensed in Europe, or in the process of obtaining such a licence, for patients who failed to respond to MTX. The aim of the study was to estimate the relative efficacy of abatacept in combination with MTX in terms of Health Assessment Questionnaire change from baseline (HAQ score CFB) compared to other relevant biologic DMARDs plus MTX in the treatment of patients with RA with insufficient response to MTX. As a secondary aim, we studied efficacy in terms of response rates for the American College of Rheumatology Criterion for 50% improvement (ACR-50) and for remission as defined by the Disease Activity Score in 28 joints (DAS28 < 2.6).

Materials and methods

Systematic review

A protocol was developed to define the search strategy, and a systematic review was then performed to identify randomised controlled trials (RCTs) that investigated the efficacy of biologic DMARDs licensed to treat RA with insufficient response to at least one conventional DMARD. MEDLINE and EMBASE databases were searched simultaneously using Datastar. Further searches were undertaken of the Cochrane Library, the American College of Rheumatology (ACR) and European League Against Rheumatism (EULAR) conferences, and the technology appraisals for the UK. Searches included a combination of free-text and Medical Subject Headings (MeSH) terms for 'disease terms' with 'drug names', and were limited to 'human' RCTs published, in English, between January 1980 and January 2010.

The systematic review was performed by two researchers, with discussions between the two to come to agreement in case of discrepancies. The full-text articles were assessed for inclusion according to the following selection criteria: (1) treatment combinations of MTX with abatacept, adalimumab, certolizumab, etanercept, golimumab, infliximab, rituximab or tocilizumab in comparison with each other or Placebo + MTX; (2) RA patients with an inadequate response or intolerance to previous treatment with at least one conventional DMARD (MTX, sulfasalazine, leflunomide, azathioprine, gold salts or minocycline); (3) clinical endpoints of HAQ CFB [20, 21], American College of Rheumatology Criterion of 50% improvement (ACR-50) [22] and remission defined by a Disease Activity Score including a 28-joint count less than 2.6 (DAS28 < 2.6) [23]; at 24 and/or 52 weeks.

Data collection

For each selected study, the details of design, selection criteria, study population characteristics, interventions, outcome measures, length of follow-up and results were extracted and recorded in data extraction forms. The data extraction was performed by one researcher and reviewed by another; meaning, effectively, that the second reviewer traced back every value/number/comment to the original full text report and validated the extracted data.

Network meta-analyses

The search strategy was developed in order to capture all the relevant studies; but to ensure more coherent network meta-analyses, the inclusion criteria used for the analyses were restricted as follows: (1) only recommended dosages licensed for treatment in Europe [5–12] and (2) only RA patients with an inadequate response or intolerance to MTX. The quantitative results of the different interventions from the studies identified were combined using Bayesian mixed treatment comparison techniques [19]. All analyses were performed using a non-informative prior distribution and, depending on the heterogeneity as assessed by the goodness-of-fit test based on the residual deviance [19], either a fixed effect or a random effects model was chosen. Analyses were performed for the endpoints of HAQ CFB (continuous outcome), ACR-50 and DAS28 < 2.6 response rates (dichotomous outcomes) using placebo (in combination with MTX) as the common comparator. The network meta-analysis results present estimates of the differences in mean HAQ CFB, and estimates of odds ratio (OR) for ACR-50 and DAS28 < 2.6, for each biologic agent compared with placebo and for each pairwise combination of biologic agents. By using the average absolute placebo response (calculated as the weighted mean placebo response based on all included trials) as a baseline, the relative efficacy of each treatment compared with placebo was adjusted to obtain expected absolute mean HAQ CFB and its 95% credible interval (95% CrI), and expected absolute probability of response and its 95% CrI, for ACR-50 and DAS28 < 2.6, for each biologic agent. For the relative efficacies as well as for the absolute responses, the point estimates reflect the most likely value for the parameter considered and the 95% credible intervals state that there is a 95% posterior probability that the parameter lies between the two values of the interval.
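The step from relative to absolute efficacy can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analyses: the placebo baseline is weighted by sample size (the weighting scheme is not specified above), the calculation is applied to point estimates instead of posterior samples, and all function names and numbers are hypothetical.

```python
def pooled_placebo_rate(responders, totals):
    """Sample-size-weighted mean placebo response across trials (assumed weighting)."""
    return sum(responders) / sum(totals)


def absolute_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio versus placebo to a baseline placebo probability."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1 + treated_odds)


def absolute_mean_change(baseline_change, mean_difference):
    """Apply a mean difference versus placebo to a baseline placebo change."""
    return baseline_change + mean_difference


# Hypothetical placebo arms: 10/100 and 30/200 responders (not study data).
baseline = pooled_placebo_rate([10, 30], [100, 200])
expected_response = absolute_probability(baseline, odds_ratio=3.0)
expected_haq_cfb = absolute_mean_change(-0.25, -0.30)
print(f"placebo {baseline:.3f}, treated {expected_response:.3f}, HAQ CFB {expected_haq_cfb:.2f}")
```

In a Bayesian analysis the same transformation would typically be applied to each posterior sample of the relative effect, so that the credible interval of the absolute response follows from the percentiles of the transformed samples.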

For the HAQ CFB analyses, the standard deviation was directly extracted from the publications where possible. When the standard deviation was not reported, it was estimated based on other statistics that allow calculation or estimation of the standard deviation (for example, confidence interval, standard error, t-value, P-value, F value). When no information about the uncertainty was available, the average of all the other standard deviations explicitly reported was imputed to the missing standard deviation, enabling integration of all the data available. The feasibility of the network meta-analysis was evaluated by means of a qualitative assessment of the comparability of the studies in terms of study design, treatments evaluated, patient population and quality of the network of studies. Differences across trials might act as effect modifiers and thereby potentially violate the similarity and consistency assumptions associated with network meta-analyses. Violation of these assumptions might introduce bias in the relative treatment effect estimates. Analyses were performed with WinBUGS 1.4 statistical software.
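The handling of missing standard deviations described above might look like the sketch below. It assumes that a reported confidence interval is symmetric and based on a normal approximation, covers only the standard error and confidence interval cases, and uses hypothetical function names and values; it is not the procedure actually implemented for this study.

```python
import math
from statistics import NormalDist, mean


def sd_from_se(standard_error, n):
    """Standard deviation of a group mean from its standard error."""
    return standard_error * math.sqrt(n)


def sd_from_ci(lower, upper, n, level=0.95):
    """Standard deviation from a symmetric, normal-based confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    standard_error = (upper - lower) / (2 * z)
    return sd_from_se(standard_error, n)


def impute_missing_sds(sds):
    """Replace missing values (None) with the average of the reported SDs."""
    reported = [sd for sd in sds if sd is not None]
    fallback = mean(reported)
    return [fallback if sd is None else sd for sd in sds]


# Hypothetical trial arm: mean HAQ CFB -0.60, 95% CI -0.698 to -0.502, n = 100.
print(round(sd_from_ci(-0.698, -0.502, 100), 3))
print(impute_missing_sds([0.5, 0.6, None, 0.7]))
```

Imputing the average of the reported standard deviations keeps every study in the network, at the cost of assuming that the variability in a non-reporting study resembles that of the reporting studies.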

Base-case and sensitivity analyses

The base case analysis of a network meta-analysis includes the broadest available evidence base corresponding to the question evaluated, under the condition of comparability for effect modifiers' characteristics. As the firm definition of such a case is often challenging, we pre-specified in the protocol that scenario analyses would be conducted along the base case, with an exact definition of these scenarios elaborated after the qualitative assessment of the included studies.

Results

Systematic review

The systematic review identified 1,551 potentially relevant studies, of which 29 publications, including 2 Clinical Study Reports (CSRs), 1 NICE submission and 4 abstracts, were found to be relevant. The study selection process is summarised in Figure 1. The 29 documents identified by the literature search included 16 individual studies for abatacept [14–18], adalimumab [24, 25], certolizumab pegol [26–29], etanercept [30–32], golimumab [33, 34], infliximab [15, 35, 36], rituximab [37–41] and tocilizumab [42–44]. Each comparison was supported by at least one pivotal trial, but not all trials reported findings for the HAQ CFB, the ACR-50 and DAS28 < 2.6 responders at either or both 24-week and 52-week follow-ups. All 16 included studies were randomised, double-blind and placebo-controlled.

Figure 1

Selection of included publications. CFB, change from baseline; HAQ, Health Assessment Questionnaire; MTX, Methotrexate.

Study design and patient characteristics

As presented in Table 1, most studies were generally comparable in design, although differences were identified in the handling of patients not responding to treatment: the adalimumab studies included an early escape for non-responders [24], while the certolizumab pegol studies specifically withdrew patients who did not show an ACR20 response at weeks 12 and 14 [26–29]. Furthermore, the golimumab [33, 34] and tocilizumab [43, 44] studies provided rescue therapy for patients who did not achieve at least 20% improvement in both Tender Joint Count (TJC) and Swollen Joint Count (SJC) by week 16. The TEMPO trial [30, 31] did not meet the inclusion criteria defined for the network meta-analyses; the study population did not consist solely of patients diagnosed with RA showing an inadequate response to MTX. The SERENE study evaluating rituximab [37, 38] and the LITHE study evaluating tocilizumab [42] were only publicly available in abstract format. Since no design or patient characteristics were reported, no full evaluation of comparability could be performed for these studies.

Table 1 Overview of trial designs

An overview of the baseline patient characteristics is provided in Table 2. All studies reported similar HAQ scores at baseline, except for the study by Kremer et al. 2005 [16], which presented a lower mean HAQ baseline value. This difference was likely due to the use of the modified HAQ (mHAQ) instead of the traditional Health Assessment Questionnaire Disability Index. The two instruments are strongly correlated, with a Pearson correlation coefficient of 0.88 [45], so the difference in instruments is assumed to have no impact on the relative treatment effect. For golimumab, the main publication [33] reported median and IQR data instead of the expected means and SD, suggesting that data were not normally distributed. This study also included patients with lower swollen joint counts, a lower CRP level and shorter disease duration than most of the other studies in the network meta-analysis. The certolizumab pegol [26–29] and etanercept [30, 31] trials included patients with a shorter disease duration compared to other identified trials. No information about patient characteristics was provided for the SERENE and the LITHE studies.

Table 2 Overview of patient characteristics

The reported data for HAQ change from baseline at 24 and 52 weeks are presented in Table 3. A network meta-analysis was performed including 14 studies in the base case. Etanercept was evaluated in only two trials: Weinblatt 1999 [32] and the TEMPO trial. As Weinblatt 1999 is a relatively small study (89 patients included), it was decided to retain the TEMPO trial in the base case analysis and to evaluate the impact of its exclusion in a scenario analysis. It was decided to evaluate the inclusion of the SERENE and LITHE studies in sensitivity analyses in anticipation of the full text publications. Since the comparability of study design and patient characteristics could not be assessed for these two studies, the results need to be interpreted with this limitation in mind. Other observed differences between trials could not be explored in scenario analyses, as excluding these studies would have removed the treatments from the analysis.

Table 3 Reported data for HAQ CFB, ACR-50 and DAS28 < 2.6 at 24 and 52 weeks

Network Meta-analysis results (Tables 4 and 5)

HAQ change from baseline at 24 and 52 weeks

At 24 weeks, all biologic agents in combination with MTX were found to be more effective than placebo in combination with MTX in improving functional status (HAQ CFB). Small numerical differences were observed in favor of abatacept over etanercept, infliximab, rituximab and tocilizumab. The adjusted absolute mean HAQ change from baseline varied between -0.48 and -0.67 for the biologic agents considered. Abatacept showed efficacy comparable to that of the other biologics at 24 weeks (absolute mean HAQ change from baseline of -0.58). At 52 weeks, the findings were in line with those at 24 weeks. All biologics demonstrated a greater reduction in HAQ score than placebo and comparable efficacy relative to the other biologic agents, with a trend in favor of abatacept over infliximab (-0.11, 95% CrI: -0.22; 0.01).

Table 4 Relative efficacy versus abatacept + MTX at 24/26 and 48/54 weeks
Table 5 Adjusted absolute efficacy for biologic DMARDs + MTX at 24/26 and 48/54 weeks

Figure 2 illustrates each pairwise relative efficacy of all biologic agents compared to placebo at 24 and 52 weeks.

Figure 2

Relative HAQ CFB of each biologic versus placebo. CFB, change from baseline; HAQ, Health Assessment Questionnaire; MTX, Methotrexate.

ACR-50 response rates at 24 and 52 weeks

At 24 weeks, all biologic agents demonstrated a higher proportion of ACR-50 responders than placebo, and abatacept is expected to demonstrate ACR-50 response rates comparable to those of the other biologic agents. The expected proportion of patients with ACR-50 response was estimated to be 31.7% (95% CrI: 15.9%; 50.6%) for abatacept, which is higher than that for placebo (11.9%, 95% CrI: 9.7%; 14.0%) and comparable to those for the other biologic agents (expected proportions between 26.0% and 57.3%). At 52 weeks, abatacept is expected to result in a higher proportion of responders than placebo and response rates comparable to those of the other biologic agents, except for certolizumab pegol (OR: 0.51, 95% CrI: 0.26; 0.96), although these results need to be interpreted with caution due to the difference in trial design described earlier. The expected proportion of ACR-50 responders for abatacept at 52 weeks (35.4%, 95% CrI: 27.3%; 43.3%) was slightly higher than that at 24 weeks.

DAS28 defined remission (< 2.6) at 24 and 52 weeks

At 24 weeks, no data were available for adalimumab and rituximab. Abatacept was found to result in more patients with DAS28 defined remission than placebo, with an OR of 4.77 (95% CrI: 1.60; 15.78). Abatacept is expected to be less efficacious than tocilizumab, but showed findings comparable to all other biologic agents. The expected proportion of patients under remission at 24 weeks amongst the biologics ranged from 6.9% to 71.0%. At 52 weeks, data were only available for infliximab, etanercept and abatacept. Abatacept was found to result in more DAS28 responders than the placebo and in comparable remission rates compared to infliximab and etanercept. The expected proportion of patients under remission at 52 weeks for abatacept was higher (40.2%, 95% CrI: 10.4%; 80.3%) than at 24 weeks.

Sensitivity analyses

The TEMPO trial was included in the base case analysis as it was the pivotal trial for etanercept in this patient population. However, the TEMPO trial included a DMARD-IR population rather than the MTX-IR population included in the other trials, and also showed high observed response rates in the control group, which differ substantially from the findings observed in other studies. The patient selection criteria in the TEMPO trial allowed for inclusion of patients not previously treated with MTX, potentially explaining the high response rate observed in the control arm. Removing the TEMPO trial did not materially affect the findings for the mean HAQ CFB at 24 weeks: abatacept was found to be comparable in efficacy to all biologics, including etanercept (difference in HAQ CFB vs. etanercept: 0.00 (95% CrI: -0.32; 0.33)). However, excluding the TEMPO trial from the ACR-50 analysis at 24 weeks did have an impact on the results. Excluding this trial reduced the heterogeneity, and goodness-of-fit statistics suggested the use of a fixed effects model. This resulted in narrower credible intervals around the point estimates. As a result, abatacept was found to be more efficacious than placebo (OR: 3.31, 95% CrI: 2.47; 4.48) although less efficacious than certolizumab pegol (OR: 0.37, 95% CrI: 0.20; 0.64), adalimumab (OR: 0.43, 95% CrI: 0.24; 0.75), etanercept (OR: 0.12, 95% CrI: 0.00; 0.82) and tocilizumab (OR: 0.50, 95% CrI: 0.27; 0.91). Abatacept showed comparable efficacy to golimumab, infliximab and rituximab. Differences in trial design that might explain these findings are described in the discussion section.

The TEMPO trial did not report HAQ data at 52 weeks and was the only trial reporting ACR-50 data for etanercept at 52 weeks, and the only trial reporting DAS28 defined remission data for etanercept at both follow-ups, limiting the evaluation of excluding TEMPO on these endpoints.

In the base case analysis all randomised patients were included for the AIM study, although patients included from one site were excluded from the efficacy analyses because of protocol violations. Its impact on the findings was evaluated in a sensitivity analysis and did not change the relative efficacy of abatacept to other biologic agents (data not reported).

Including the data for the SERENE study [37, 38], evaluating rituximab, and the LITHE study [42], evaluating tocilizumab, did not substantially impact the results. The SERENE study presents HAQ CFB, ACR-50 and DAS28 < 2.6 data at both follow-ups. The LITHE study only reports ACR-50 and DAS28 defined remission response rates at 52 weeks. Abatacept showed comparable efficacy versus rituximab at 24 weeks (mean difference in HAQ CFB: -0.08, 95% CrI: -0.24; 0.10; ACR-50 OR: 0.87, 95% CrI: 0.31; 2.30; DAS28 < 2.6 OR: 0.80, 95% CrI: 0.07; 8.20) and at 52 weeks (mean difference in HAQ CFB: -0.01, 95% CrI: -0.36; 0.31; ACR-50 OR: 0.55, 95% CrI: 0.13; 1.78; DAS28 < 2.6 OR: 1.09, 95% CrI: 0.04; 30.72). Abatacept demonstrated comparable efficacy versus tocilizumab at 52 weeks (ACR-50 OR: 0.73, 95% CrI: 0.17; 3.12; DAS28 < 2.6 OR: 0.58, 95% CrI: 0.03; 14.23).

Discussion

A network meta-analysis based on a systematic review of the literature was performed to estimate the relative efficacy of abatacept compared with other relevant biologic DMARDs in the treatment of RA patients with insufficient response to MTX. The results of the network meta-analysis showed that abatacept is expected to be more efficacious than placebo and to show comparable efficacy relative to the other biologic DMARDs in combination with MTX. The primary outcome in the present study was the change in functional status as measured by the HAQ score, which is commonly used in economic modeling of RA since it can be translated into the required utility values by means of published algorithms. The clinically relevant endpoints ACR-50 and DAS28-defined remission (< 2.6) at 24 weeks and 52 weeks were also analysed. Not all trials reported findings on all evaluated endpoints. The decision was made to include all available data, which led to differences in the evidence used across endpoints.

The analysis of DAS28-defined remission at 24 weeks showed comparable findings to other biologic agents for abatacept, except in the case of tocilizumab. It should be noted that tocilizumab, due to its mechanism of action, has a direct effect on the CRP-level and, therefore, is expected to show more efficacy on this endpoint. Also, a low number of patients in remission were observed in the placebo arms across the trials, making the indirect treatment comparison susceptible to small differences in the placebo arms. As a consequence, results should be interpreted cautiously.

Although the TEMPO trial included different patients, it was decided to include this study based on the fact that TEMPO is the pivotal trial for etanercept. Had TEMPO been excluded from the base case, data for etanercept would have been based solely on a relatively old and small trial (89 patients) by Weinblatt (1999) [32], potentially biasing the findings in favor of etanercept.

Other limitations in comparability of study and patient characteristics were observed with the adalimumab, golimumab and certolizumab pegol trials. The adalimumab studies included an early escape for non-responders [24] while the certolizumab pegol studies specifically withdrew patients who did not show an ACR20 response at weeks 12 and 14 [26–29]. Furthermore, the golimumab [33, 34] and tocilizumab [43, 44] studies provided rescue therapy for patients who did not achieve at least 20% improvement in both Tender Joint Count and Swollen Joint Count by week 16. The impact associated with the adalimumab, golimumab and certolizumab pegol studies was not explored in scenario analyses, as excluding these studies would have removed the treatments from the analysis and this would not have provided additional information. Furthermore, there is currently no consensus on how to correct for these differences in trial design.

All patients in the studies received methotrexate, regardless of whether they were assigned to the placebo or intervention arm. The fact that optimal methotrexate dosing was decided by the investigator, and that the trials differed in their specification of the minimal methotrexate dose, may have resulted in differences across the trials. In turn, this may have interacted with the observed effect of the biologic agents and, therefore, may have introduced bias in the analysis. Unfortunately, we were unable to correct for this since details of methotrexate dosing were lacking.

A recent network meta-analysis of tocilizumab and other biologic agents in patients who have an inadequate response to conventional DMARDs or MTX [46] suggests that tocilizumab has a better overall response than TNF-α inhibitors and abatacept, whereas our analyses suggest comparable efficacy. The apparent distinction may be attributable to differences in the selection criteria for relevant studies (MTX vs. conventional DMARDs background treatment) and, therefore, the evidence base and analysis techniques (fixed versus random approaches). The TOWARD trial [47] was not included in our analyses and no data on HAQ score were available for the LITHE trial. Similarly, despite important differences in the study selection process, the Cochrane collaboration found that abatacept, adalimumab, etanercept, infliximab and rituximab showed comparable efficacy in patients with RA [48]. The Cochrane collaboration also performed a network meta-analysis on the safety of the biologic agents [49]. This study revealed that abatacept was associated with a significantly lower risk of serious adverse events compared to most other biologics and was significantly less likely than infliximab and tocilizumab to be associated with serious infections. When comparing different treatments, safety should always be considered in addition to efficacy. In our study no evaluation of safety was performed as this would have required a different search strategy. Finally, a systematic review [50] followed by several meta-analyses of nine biological DMARDS (including abatacept) vs. placebo was performed and used to inform the EULAR recommendation [51]. In this publication, all biological DMARDs + MTX combinations were found to be more efficacious than placebo + MTX in the treatment of patients with an inadequate response to MTX.

Conclusions

Currently it is not possible to predict, on an individual basis, which patient will respond to a particular therapy. This is a significant unmet need which is the goal of much research effort. In the absence of reliable biomarkers on which to base individual treatment decisions, it is important that patients have access to the full range of biologic therapeutics with different mechanisms of action and proven efficacy. This network meta-analysis strongly suggests that abatacept in combination with MTX is superior to placebo and is comparable to other biologic DMARDs for the reduction in disability (HAQ CFB) of RA for at least a year of treatment in patients with active disease despite previous treatment with MTX. Abatacept in combination with MTX is also expected to be superior to placebo and comparable to all other biologic agents for ACR-50, with the exception of certolizumab pegol at 52 weeks, although this needs to be interpreted with caution due to the earlier described difference in trial design, and comparable efficacy in DAS28 defined remission at 24 weeks (except for tocilizumab, which can be explained by the causal relation with the CRP level).

Based on its unique mechanism of action, relative efficacy and clinical trial safety profile [14–18], abatacept is a suitable alternative to currently licensed biologic DMARDs, meaning that abatacept in combination with MTX should be available to patients with RA that is refractory to MTX alone.