A systematic literature review (SLR) and NMA were conducted following methods in line with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and recommended in the current National Institute for Health and Care Excellence (NICE) specification for manufacturer and sponsor submission of evidence, as well as the 2016 NICE technology appraisal of adalimumab, etanercept, infliximab, certolizumab pegol, golimumab, tocilizumab, and abatacept for RA. Due to the nature of the study, it was not registered with clinicaltrials.gov or a similar body. This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.
Searches for the SLR were conducted in the MEDLINE, EMBASE, and Cochrane databases (all without any time limit), plus conference proceedings since 2013 for evidence published until December 6, 2016. Studies were selected according to pre-defined population/intervention/comparator/outcome/study design criteria (Table 1) [7, 8, 10, 11]. All titles, abstracts and articles were then screened independently by two researchers, with study selection following published best practice guidelines for indirect treatment comparisons [8, 10, 11].
Data on study design, patient characteristics, efficacy, safety and patient-reported outcomes at the time points 12 (± 4), 24 (± 4) and 52 (± 8) weeks for all studies (except open-label extensions) were extracted independently by two reviewers in a pre-defined data extraction process. Evidence for the NMA was filtered for drugs licensed for RA at doses approved in Europe, the USA and Canada. All trials comparing one intervention of interest with at least one other intervention of interest or methotrexate or ≥ 1 csDMARD(s) were considered in the evidence base.
Small studies have been shown to distort meta-analyses; therefore, studies with fewer than 30 patients per arm were excluded. Studies which did not report any outcomes of interest were also excluded.
Different licensed dosages and different routes of administration [e.g., intravenous (IV) versus subcutaneous (SC) delivery] of the same treatment were pooled in many cases, on the basis of evidence of equivalence (Supplementary Table S1). These decisions were explored by examining forest plots of the odds ratio (OR) for American College of Rheumatology (ACR) 20% response criteria (ACR20) at 24 weeks in individual studies by group of interventions. If the confidence intervals overlapped (e.g., as for infliximab studies), the doses were pooled. The validity of the decisions was also confirmed via clinician input.
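The overlap check described above can be illustrated with a minimal sketch. It assumes a standard Wald (log-scale) 95% confidence interval for the OR, which the text does not specify. The function names and the responder counts are hypothetical and are not taken from any included study.

```python
import math

def odds_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale."""
    a, b = events_trt, n_trt - events_trt   # responders / non-responders, treatment
    c, d = events_ctl, n_ctl - events_ctl   # responders / non-responders, control
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical ACR20 responder counts for two dose groups of one treatment.
or_a, lo_a, hi_a = odds_ratio_ci(60, 100, 30, 100)
or_b, lo_b, hi_b = odds_ratio_ci(55, 100, 35, 100)
pool_doses = intervals_overlap((lo_a, hi_a), (lo_b, hi_b))
```

In this sketch, `pool_doses` only flags candidate groups for pooling; as noted in the text, the final decision also relied on clinician input.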
Key efficacy endpoints were extracted and analyzed including ACR20, ACR 50% response criteria (ACR50), ACR 70% response criteria (ACR70), and the Health Assessment Questionnaire Disability Index (HAQ-DI) change from baseline. The European League Against Rheumatism (EULAR) Disease Activity Score 28-joint count (DAS28) remission (defined as DAS28 erythrocyte sedimentation rate or C-reactive protein < 2.6) was also extracted; however, this endpoint was not analyzed given that the EULAR networks were small and a high level of variability was observed in response rates between the different studies. Safety endpoints included the proportion of patients with any serious infection (SI) and the proportion of patients with any serious adverse event (SAE). All efficacy and safety outcomes were examined at 24 weeks as this was the assessment period with the most data available for analysis.
Prior to the conduct of the NMA, a feasibility assessment was conducted to assess the sufficiency of the evidence base to draw feasible networks for all outcomes of interest. The exchangeability assumption is critical and requires that selected trials measure the same underlying relative treatment effects. Deviations from this assumption can be evaluated through two metrics: heterogeneity (i.e., evaluation of comparability in characteristics and results across included studies) and consistency (i.e., evaluation of consistency between direct and indirect evidence). Effect modifiers were evaluated by establishing the link between patient characteristics at baseline and ACR20; only weight was identified as an effect modifier given the expected variation in patient characteristics across RA studies [11, 13, 14], which can limit the validity of indirect comparisons.
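As an illustration of what a consistency check between direct and indirect evidence can look like, the sketch below uses the Bucher adjusted indirect comparison on the log OR scale. This is one common approach and is an assumption here; the text does not state which consistency method was applied. All function names and input values are hypothetical.

```python
import math

def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Indirect estimate of A vs B through a common comparator C (log OR scale)."""
    estimate = log_or_ac - log_or_bc
    se = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return estimate, se

def inconsistency_z(direct, se_direct, indirect, se_indirect):
    """z-statistic for the difference between direct and indirect estimates."""
    return (direct - indirect) / math.sqrt(se_direct ** 2 + se_indirect ** 2)

# Hypothetical log ORs: A vs C, B vs C, and a direct A vs B estimate.
indirect_ab, se_indirect_ab = bucher_indirect(0.9, 0.20, 0.5, 0.25)
z = inconsistency_z(0.3, 0.30, indirect_ab, se_indirect_ab)
consistent = abs(z) < 1.96  # no evidence of inconsistency at the 5% level
```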
Variability of response in the placebo arms is an issue which can limit indirect comparisons, and the heterogeneity of RA studies has been previously noted, where the treatment effect expressed as log ORs has a negative relationship with the baseline risk [9, 16]. While the common comparator across most monotherapy trials was an active comparator (adalimumab), for the few placebo-controlled studies, variability was therefore considered in the selection of models.
The efficacy and safety of the treatments included in the analysis were evaluated using a Bayesian NMA approach [10, 14, 17], comprising a likelihood distribution, a model with parameters and prior distributions for these parameters. A linear model with normal likelihood distribution was used for continuous outcomes, and a binomial likelihood with a log link was used for the dichotomous outcomes [15, 18]. Consistent with NICE guidelines, flat (non-informative) prior distributions were assumed for nearly all outcomes so that the prior distribution would not influence the observed results. Prior distributions of the baseline treatments and relative treatment effects were normal, with a mean of 0 and a variance of 10,000; an informative prior was used for the between-study variance, following the NICE recommendation for cases of limited data. Random- and fixed-effects models were evaluated to allow for heterogeneity of treatment effects between studies, with the choice of base case informed by Deviance Information Criterion (DIC) values and mean total residual deviance (compared against the number of fitted data points), as well as consistency with directly reported trial results. Posterior densities for unknown parameters were estimated using Markov chain Monte Carlo simulations.
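For orientation, the model structure described above can be written out as follows. The notation is an assumption and follows the formulation commonly used in NICE Decision Support Unit guidance, not necessarily the authors' exact code: $i$ indexes studies, $k$ indexes arms, $t_{ik}$ is the treatment in arm $k$ of study $i$, and $g$ is the link function stated in the text. The informative prior on the between-study standard deviation $\sigma$ is not specified in the text and is left unspecified here.

```latex
\[
\begin{aligned}
r_{ik} &\sim \mathrm{Binomial}(p_{ik},\, n_{ik}), &
g(p_{ik}) &= \mu_i + \delta_{ik}\,\mathbb{1}[k > 1] && \text{(dichotomous outcomes)}\\
y_{ik} &\sim \mathrm{Normal}(\theta_{ik},\, se_{ik}^2), &
\theta_{ik} &= \mu_i + \delta_{ik}\,\mathbb{1}[k > 1] && \text{(continuous outcomes)}\\
\delta_{ik} &\sim \mathrm{Normal}\!\left(d_{t_{ik}} - d_{t_{i1}},\, \sigma^2\right) &&&& \text{(random effects)}\\
\delta_{ik} &= d_{t_{ik}} - d_{t_{i1}} &&&& \text{(fixed effects)}\\
\mu_i,\ d_t &\sim \mathrm{Normal}(0,\, 100^2), & d_1 &= 0 \\
\mathrm{DIC} &= \bar{D} + p_D
\end{aligned}
\]
```

Here $\bar{D}$ is the posterior mean residual deviance and $p_D$ the effective number of parameters; a lower DIC, together with a mean total residual deviance close to the number of fitted data points, favours a model.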
All results for OR-NMA and risk difference (RD)-NMA were based on 100,000 iterations on three chains, with a burn-in of 20,000 iterations. Convergence was assessed by visual inspection of trace plots. The accuracy of the posterior estimates was assessed using the Monte Carlo error for each parameter (Monte Carlo error < 1% of the posterior standard deviation). All models were implemented using WinBUGS. Results of the NMA are presented in terms of ‘point estimates’ (median of posterior) for the relative treatment effects, along with the 95% credible intervals.
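The posterior summaries and the Monte Carlo error criterion described above can be sketched as follows. This is a minimal illustration only. The draws are simulated independently and stand in for pooled post-burn-in samples of a single log OR. The batch-means estimator is one common way of computing Monte Carlo error and is assumed here. All function names are hypothetical.

```python
import math
import random
import statistics

def batch_means_mcse(draws, n_batches=50):
    """Monte Carlo standard error of the mean via non-overlapping batch means."""
    size = len(draws) // n_batches
    means = [statistics.fmean(draws[b * size:(b + 1) * size])
             for b in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)

def summarize(draws, mc_tol=0.01):
    """Posterior median, 95% credible interval and Monte Carlo error check."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(q):
        # Simple empirical quantile (no interpolation).
        return ordered[min(n - 1, int(q * n))]

    sd = statistics.stdev(draws)
    mcse = batch_means_mcse(draws)
    return {
        "median": quantile(0.5),
        "cri95": (quantile(0.025), quantile(0.975)),
        "mc_error_ratio": mcse / sd,
        "mc_error_ok": mcse < mc_tol * sd,  # criterion: < 1% of posterior SD
    }

# Simulated stand-in for post-burn-in draws of one log OR (not real results).
rng = random.Random(0)
draws = [rng.gauss(0.5, 0.2) for _ in range(40000)]
summary = summarize(draws)
odds_ratio_median = math.exp(summary["median"])  # back-transform to the OR scale
```

Real Markov chain output is autocorrelated, so the Monte Carlo error for a given number of iterations would typically be larger than in this independent-draw example.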
Two scenario analyses were conducted. The first excluded studies conducted in exclusively Asian populations (i.e., the SATORI, CHANGE and Etanercept 309 studies) to test the potential modifying effect of patient body weight (with Asian ethnicity serving as a proxy for populations with relatively lower body weight than other populations). For the second analysis, tumour necrosis factor (TNF)-α inhibitors were pooled together as a class. For the latter scenario analysis, ACR outcomes were compared with the base case, which evaluated the TNF-α inhibitors individually. This scenario was evaluated to inform cost-effectiveness evaluations of sarilumab.