Key Summary Points

Interpretation of mixed treatment comparisons (MTCs) requires a clear understanding of the methods of analysis and population studied.

MTCs of pharmaceutical treatments for relapsing–remitting multiple sclerosis were detailed and compared.

A comparison of all identified MTCs showed that the findings of each individual MTC were not directly comparable with the others because of differences in the disease-modifying therapies (DMTs) compared, included studies, effect measures and analysis methods.

Given the importance of MTCs for healthcare decision-making, it is imperative that reporting of methods, results and assumptions is clear and transparent to allow accurate interpretation of findings.

For MTCs to be relevant, the choice of outcome measures should reflect clinical practice, combination of treatments or outcomes measured at different points of time should be avoided, as should imputation without justification, all approved treatment options should be included, and updates of MTCs should be conducted when data for new treatments are published.

Digital Features

This article is published with digital features to facilitate understanding of the article. To view digital features for this article go to


In the last 8 years, the number of approved disease-modifying therapies (DMTs) for multiple sclerosis (MS), a chronic autoimmune-mediated inflammatory disease of the central nervous system, has more than doubled. In contrast with the requirements of regulatory bodies for randomised controlled trials (RCTs) directly comparing new treatments with placebo and/or one active comparator [1], a broader evidence base is needed to inform how new medicines should fit into existing treatment algorithms [2]. Mixed treatment comparisons (MTCs) utilise a totality of data by combining both direct and indirect evidence [3]. Consequently, they have become increasingly important to the scientific and clinical community as well as reimbursement agencies by enabling objective comparison of the efficacy and safety of MS therapies that have never been compared head-to-head [4,5,6].

As discussed by Giovannoni et al. [7], the field of MTC continues to evolve and mature. Interpreting MTC findings is complex, particularly in MS where diagnostic criteria [8,9,10,11], definition of clinical outcomes and patient populations studied have changed over time [12]. These temporal changes in clinical study design make the necessary MTC assumptions of RCT homogeneity, similarity and transitivity difficult to meet in MS [13, 14], resulting in variable conclusions depending on how the systematic review and MTC are performed. The combination of RCTs conducted in populations with different subtypes of MS, different sets of treatments, or doses licensed under different regulatory authorities, while pragmatic, can provide limited or invalid information when applied to a different clinical context.

However, progress has been made in enhancing understanding of the strengths and limitations of MTC findings through improved transparency of study conduct and reporting. The application of standards such as those defined by the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) [15] and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guidelines [16] aims to improve the completeness of meta-analysis and MTC reporting, providing important insights to decision-makers on the applicability of MTC findings to their patient populations.

We undertook a comparison of existing MTCs of DMTs in MS to assess their relevance to European decision-makers. Here, we present a detailed discussion of the differences identified between MTCs in the populations studied, treatments assessed, methods employed and implications for interpretation of findings.


As part of a systematic review to inform an up-to-date MTC of DMTs at European Commission-approved doses for the treatment of relapsing–remitting MS (RRMS), already completed MTCs were also identified and extracted for comparison. This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.

Search Methods

Searches were conducted to identify MTCs of pharmaceutical treatments for MS. The following databases and resources were searched from inception to July 2019: Cochrane Database of Systematic Reviews (CDSR); Database of Abstracts of Reviews of Effects (DARE); Health Technology Assessment (HTA) Database; National Institute for Health Research (NIHR) Journals Library; International Prospective Register of Systematic Reviews (PROSPERO); National Institute for Health and Care Excellence (NICE) Guidance; Kleijnen Systematic Reviews (KSR) Evidence; Canadian Agency for Drugs and Technologies in Health (CADTH); MEDLINE; MEDLINE Epub Ahead of Print; MEDLINE In-Process; MEDLINE Daily Update; PubMed; and Embase. Reference lists of included studies and systematic reviews were screened to identify any additional relevant studies. The Cochrane Library search strategy and results of all searches are provided in Appendix 1 of the supplementary material.

Inclusion Criteria

Published and unpublished MTCs were included when they reported on adults (at least 18 years of age) with a confirmed diagnosis of RRMS or rapidly evolving severe (RES) RRMS, treated with any form of pharmaceutical treatment. Inclusion was limited to studies published during or after 2010, but no language restriction was imposed.

Methods of Study Selection, Quality Assessment and Data Extraction

Two reviewers independently assessed studies for inclusion and extracted data on the inclusion criteria, systematic review methods, analysis methods and results (relapse, disability progression and safety) for each MTC; any discrepancies were resolved by discussion or by reference to a third reviewer.

We present a narrative synthesis comparing the scope, systematic review methods, analysis methods and findings of MTCs in MS.


Literature searches yielded 1627 references. After removing duplicates (n = 367) and references published before 2010 (n = 261), a total of 999 references were available for screening. Titles and abstracts were screened and 56 potentially relevant papers were ordered as full texts. One additional MTC was identified by checking the reference lists of full-text papers. Of these, 45 references (relating to 27 individual MTCs) were included (Fig. 1).
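The reference counts reported above can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the text for the study-selection flow.
references_found = 1627
duplicates_removed = 367
published_before_2010 = 261

# References left for title and abstract screening.
available_for_screening = (
    references_found - duplicates_removed - published_before_2010
)
print(available_for_screening)  # 999
```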

Fig. 1 Flow chart of study searches and inclusion. MTC mixed treatment comparison

Comparison of Inclusion Criteria and Methodology

The majority of MTCs included patients with RRMS (Table 1). Two [17, 18] included studies in patients with all subtypes of MS, and four [19,20,21,22] included studies which enrolled patients with relapsing MS, encompassing RRMS, secondary progressive multiple sclerosis with relapses, progressive–relapsing MS or combinations thereof. These six studies did not provide subgroup analyses of patients with RRMS only. Two MTCs [23, 24] focused on RES and highly active RRMS, while two [25, 26] focused on RRMS, with the stipulation that the population in the included trials must be at least 80% patients with RRMS. The CADTH 2013 [27] study also focused on trials in patients with RRMS but specified that the population should be more than 50% RRMS.

Table 1 Inclusion criteria for and systematic review methods of included MTCs

Many of the MTCs did not report full details of their methodologies. In 12 of the studies [17, 19, 21, 24, 28,29,30,31,32,33,34,35] omissions were identified in reporting of inclusion screening, data extraction, quality assessment or discrepancy resolution. Table 1 details the inclusion criteria and methods used in the systematic review underlying the MTCs.

Treatments Assessed by Included MTCs

Each MTC included different DMTs, in part due to the timing of new treatments becoming available after previous MTCs had been conducted (Table 2). Studies also focussed on different aspects of DMT formulation or mechanism of action. For example, Tolley [25] only included treatments that were delivered by subcutaneous injection, resulting in the exclusion of alemtuzumab, dimethyl fumarate, fingolimod, natalizumab and teriflunomide, while Filippini [18] focussed on immunomodulators and immunosuppressants for all subtypes of MS. The majority of MTCs focussed on monotherapies; however, three [18, 27, 36] also allowed the inclusion of combination treatments.

Further variation arose because some reviews restricted inclusion to studies using treatment doses licensed in particular countries or regions (e.g. Tolley [25] and Hutchinson [26] limited inclusion to treatment doses that were approved in the European Union or the USA). In CADTH 2013 [27], inclusion was limited to doses that were licensed in Canada and approved by Health Canada plus a selection of emerging treatments at that time (alemtuzumab, dimethyl fumarate, teriflunomide) that were not limited by dose.

Of the most recent MTCs published, the study by McCool [19] included daclizumab, which, although approved at the time of the analysis, was subsequently withdrawn by the manufacturer because of concerns about the benefit–risk profile. This MTC also included cladribine 5.25 mg/kg, a dosing regimen which is off-label for the treatment of MS. Removal of these drugs/doses from the networks may potentially lead to different results and conclusions.

Table 2 Treatments assessed by included MTCs

Annualised Relapse Rate

Annualised relapse rate (ARR), often included in clinical trials as a surrogate for future disability, was the most widely reported outcome in the MTCs identified and was therefore chosen to compare methods and results related to it. Table 3 presents an overview of methods used in the analyses. A summary of the available results for all treatments compared with placebo is presented in Table 4.

Table 3 MS relapse: analysis methods
Table 4 MS relapse—results

In 19 of the MTCs [19, 20, 23,24,25,26,27,28,29, 31, 33, 35,36,37,38,39,40,41,42,43] relapse was analysed on the basis of the ARR. Appropriate data on MS subtype were more commonly reported in the more recent studies and were often unavailable for older treatments. In patients with high disease activity despite previous treatment, there were insufficient data to form a network, as shown in the study by Huisman [23].

Differences were observed in the statistical approaches applied in measuring ARR between the MTCs. Fourteen [19, 20, 24,25,26,27, 31, 35, 37,38,39, 41,42,43] analysed ARR as a Poisson outcome based on the total number of relapses and the total number of person-years of follow-up, while six [17, 18, 21, 22, 34, 44] used a different approach by considering recurrence of relapse as a binary outcome based on the number of patients with relapse and the total number of patients. Four MTCs [17, 18, 21, 22] reported the results as odds ratios (ORs), two [37, 44] as risk ratios, and three [34, 38, 39] as hazard ratios. Because of these differing effect measures, the results of these MTCs are not directly comparable to each other, or to other MTCs using a rate ratio as the outcome measure.
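To illustrate why these effect measures cannot be compared directly, the following Python sketch computes a rate ratio, a risk ratio and an odds ratio from the same two-arm trial. All numbers are hypothetical and are not taken from any included RCT or MTC; the `Arm` class and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Hypothetical summary data for one trial arm (not from any included RCT)."""
    patients: int               # number of patients analysed
    relapses: int               # total number of relapses observed
    patients_with_relapse: int  # patients with at least one relapse
    person_years: float         # total follow-up time


def rate_ratio(treated: Arm, control: Arm) -> float:
    """Ratio of relapse rates (relapses per person-year), as in a Poisson analysis."""
    return (treated.relapses / treated.person_years) / (
        control.relapses / control.person_years
    )


def risk_ratio(treated: Arm, control: Arm) -> float:
    """Ratio of the proportions of patients with at least one relapse."""
    return (treated.patients_with_relapse / treated.patients) / (
        control.patients_with_relapse / control.patients
    )


def odds_ratio(treated: Arm, control: Arm) -> float:
    """Ratio of the odds of having at least one relapse."""
    def odds(arm: Arm) -> float:
        return arm.patients_with_relapse / (arm.patients - arm.patients_with_relapse)
    return odds(treated) / odds(control)


# Hypothetical trial: the same data give three different effect estimates.
treated = Arm(patients=100, relapses=40, patients_with_relapse=30, person_years=190.0)
placebo = Arm(patients=100, relapses=80, patients_with_relapse=50, person_years=185.0)

print(f"rate ratio: {rate_ratio(treated, placebo):.2f}")  # 0.49
print(f"risk ratio: {risk_ratio(treated, placebo):.2f}")  # 0.60
print(f"odds ratio: {odds_ratio(treated, placebo):.2f}")  # 0.43
```

Hazard ratios are omitted from this sketch because they require time-to-event data rather than the summary counts used here.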

In ten out of 16 MTCs the analysis was based on using Bayesian methods to fit generalised linear models [18, 19, 25, 27,28,29, 33, 38, 39, 41, 43]. Hutchinson [26] also used linear models but applied frequentist methods in SAS software to fit the model. Two MTCs [36, 44] used linear models applying frequentist methods using the Stata statistical software. Six other studies used Bayesian methods but did not report further details [17, 20, 23, 24, 31, 40] while Melendez-Torres 2017 [42] used frequentist methods without providing further details. Two studies [21, 22] used the Bucher method for indirect comparison, and three studies [30, 34, 35] did not report the methods for MTC. Zagmutt [32] assessed adverse events (AEs) rather than relapses so was not included in this comparison. In 14 of 19 MTCs, the primary analysis used a random-effects model. A fixed-effects model was the primary analysis for seven studies [23, 28, 31, 37,38,39, 41] and was also reported for comparison in three [25, 27, 40]. Three studies [33,34,35] did not report their statistical model while Zagmutt 2015 [32] included no MTC of ARR.
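For readers unfamiliar with the Bucher method mentioned above, the following Python sketch shows the standard adjusted indirect comparison of two treatments A and B through a common comparator C, carried out on the log scale. The input values are hypothetical, and the function name is an illustrative assumption rather than the code used by any of the included studies.

```python
import math


def bucher_indirect(log_ac, se_ac, log_bc, se_bc, z=1.96):
    """Indirect A-versus-B estimate via a common comparator C.

    Inputs are log effect estimates (e.g. log rate ratios) and their
    standard errors. Returns the ratio and an approximate 95% interval.
    """
    log_ab = log_ac - log_bc                  # difference of log effects
    se_ab = math.sqrt(se_ac**2 + se_bc**2)    # variances add
    return (
        math.exp(log_ab),
        math.exp(log_ab - z * se_ab),
        math.exp(log_ab + z * se_ab),
    )


# Hypothetical inputs: A versus placebo ratio 0.50, B versus placebo ratio 0.80.
estimate, lower, upper = bucher_indirect(
    log_ac=math.log(0.50), se_ac=0.10,
    log_bc=math.log(0.80), se_bc=0.15,
)
print(f"A versus B: {estimate:.3f} ({lower:.3f} to {upper:.3f})")
```

Because the variances add, the indirect estimate is always less precise than either of the direct comparisons it is built from.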


Overall, a comparison of all identified MTCs showed that the findings of each individual MTC were not directly comparable with the others because of differences in the DMTs compared, included studies, effect measures and analysis methods (Table 4). Nevertheless, the estimated treatment effects relative to placebo were generally in the same direction for those treatment effects that were reported by multiple studies. This trend was consistent across outcomes and across populations, but the magnitude of the effects and the associated uncertainty varied as a result of the differences between MTCs (e.g. risk ratio of 0.32 for natalizumab in CADTH [27], compared with 0.46 in Tramacere [44]). In particular, Filippini [18] was an outlier, with the ORs for IFNβ1a and teriflunomide favouring placebo, in contrast to other MTCs of the same treatments.

The results available from nine MTCs [19, 21, 22, 24, 26, 29, 30, 36, 40] were limited in applicability to decision-making as they reported the relative effects of one drug compared with other treatments in line with the objectives of the studies, and therefore did not present results for all treatments compared with placebo. Three studies [17, 26, 34] combined all of the IFNβ1a treatments into a single treatment class, while two [18, 44] merged treatments in studies comparing the same agent at different doses by summing the number of events and the sample size.
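The merging of dose arms described above can be sketched as follows. The Python example uses hypothetical event counts and an illustrative helper name; it shows only that the merged arm reflects an average of the individual doses.

```python
def merge_arms(arms):
    """Merge trial arms by summing events and sample sizes.

    Each arm is an (events, patients) tuple.
    """
    events = sum(e for e, _ in arms)
    patients = sum(n for _, n in arms)
    return events, patients


# Hypothetical arms of one trial testing the same agent at two doses.
low_dose = (20, 100)   # 20 of 100 patients with a relapse
high_dose = (10, 100)  # 10 of 100 patients with a relapse

merged_events, merged_patients = merge_arms([low_dose, high_dose])
merged_risk = merged_events / merged_patients
print(merged_events, merged_patients, merged_risk)  # 30 200 0.15
```

In this hypothetical case the merged risk of 0.15 lies between the dose-specific risks of 0.20 and 0.10 and corresponds to neither dose.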


Twenty-seven MTCs were identified. As detailed above, variation in the clinical question to be answered, as defined by the population, intervention, comparator, outcomes (PICO) framework, accounted for the differences in included populations, treatments assessed and methods of analysis used. Many of these differences have a potential impact on the findings of the review or possibly limit the relevance of a particular review for decision-makers. However, the estimated treatment effects in comparison with placebo were generally in the same direction across MTCs of the same treatments (Table 4). A number of examples of the differences between MTCs, and the potential implications, are discussed below.


As with other neurological diseases, diagnosis of MS poses challenges, which, together with changing clinical criteria, have resulted in an evidence base for treatment which has a high level of heterogeneity in study populations (the ‘Will Rogers phenomenon’) [45]. In addition, outcome measures and definitions have changed over time such that two studies, conducted 20 years apart and both reporting relapse outcomes, may not measure the same clinical state [8, 11]. The increase in treatment options has also influenced the types of patients included in more recent clinical trials. This creates considerable limitations in comparing the effectiveness of new treatments to existing treatment options, where the previous evidence base may largely comprise heterogeneous studies. Therefore, authors of MTCs should carefully examine the inclusion criteria and characteristics of study participants in the identified studies, to ensure the subsequent analyses uphold similarity assumptions. For example, whether the definition of “serious adverse event” included or excluded relapse of MS differed between RCTs. It would be advisable to investigate the effect of this issue in sensitivity analyses, and it may be useful to involve clinical input to ensure that all assumptions are valid, and that all important disease modifiers, characteristics or biases are investigated.

Five of the identified MTCs included a trial in which 19% of patients had a diagnosis of clinically isolated syndrome [46]. While inclusion of studies with a limited proportion of participants with other subtypes of MS is acceptable, that proportion should be relatively small (e.g. less than 15% [47]), to reduce the potential impact of including results of patients with higher or lower disease stage which might have a different treatment response compared with the population of interest.

Intervention and Comparators

Traditional meta-analyses compare only a few, often two, treatments while MTCs enable the comparison of all relevant treatments in a certain disease. In many cases, the selection of treatments assessed by an MTC is related to the date of publication; as treatments and trials for MS have evolved over time, we have witnessed a corresponding natural evolution of MTCs from simple networks involving IFN and glatiramer acetate to more complex networks involving newer products such as natalizumab and fingolimod. As many treatments in MS have never been directly compared, MTCs should ideally include all relevant treatments, thus presenting the reader with a comprehensive overview of all available treatment options while increasing the connectivity of the network. If treatments are excluded, the reasons for choosing particular treatments should be transparent and the rationale for exclusion documented. Regular updates are also needed to include newly available treatments and data.

Five of the identified MTCs combined individual DMTs into classes or combined different doses of a single treatment before conducting the MTC. Drugs grouped into the same class may still differ by structure, pharmacokinetics and mode of action and these differences translate into clinical practice in terms of both efficacy and tolerability. Therefore, no two drugs are really the same and grouping them into a similar class requires many, often limiting, assumptions and risks a potential loss of important information about the comparative benefits and harms [7]. It should be noted that the results obtained by this approach would be applicable only to a hypothetical average treatment rather than any of the individual treatment doses. It is likely that this decision will also have an impact on the relative effect of other treatments in the same network. If treatments are combined, e.g. to increase sample size, this should be clearly documented, and the effect explored in sensitivity analyses.


Careful consideration of the outcome definition is required in terms of its description but also with regard to the type of data (discrete or continuous) and whether intention-to-treat or per-protocol analysis was employed, to ensure that similarity assumptions hold. This is especially important if comparing results between different reviews, and clinical opinion is useful for establishing the best outcome to measure effectiveness or safety.

When analysing endpoints, such as AEs, as a binary outcome, studies can only be combined in the analysis if they measure the outcome at the same point in time. Two studies [27, 43] appear to have combined results for the endpoints of confirmed disease progression after 3 and 6 months (CDP3M, CDP6M), respectively (data not shown). Combining endpoints makes it much harder, or even impossible, to compare the outcome with other studies and to interpret the results. It could be assumed that the results would be mainly driven by CDP3M as this is more widely reported. As different disability criteria and lengths of confirmation can cause variation in observed results, the definition of disease progression should be clearly reported [48].

As shown in Table 3, 13 MTCs analysed ARR as a Poisson outcome based on the total number of relapses and the total number of person-years of follow-up. These values were frequently not reported in the primary studies and therefore would have required imputation. The total number of relapses was typically imputed from the reported ARR values and the exposure time in person-years was typically imputed from the study duration multiplied by the number of patients who completed the trial. Two MTCs [27, 43] assumed that all patients completed the trial if the percentage of completers was not reported. In the trials which did report the percentage of completers, patients who discontinued may have contributed to the total number of relapses, but would be excluded from contributing to the exposure time, therefore increasing the imputed relapse rate. In those trials where the number of completers was not reported and 100% completion was assumed, those patients who discontinued could contribute to the exposure time but could not contribute any further relapses, therefore decreasing the imputed relapse rate. Two studies [26, 31] imputed the exposure time as study duration multiplied by the number of completers, plus half of the study duration multiplied by the number who withdrew. This has a similar impact to the assumptions of the CADTH 2013 [27] study in that the patients who discontinued did not contribute equally to the number of relapses and the exposure time. The other studies reporting ARR did not report the assumptions made to impute missing data. The net effect of these assumptions is difficult to assess as this depends on the number of studies that did or did not report specific values which cannot be determined from the published articles. Therefore, imputation should either be avoided or assumptions should be clearly reported while following standard recommendations [4, 49, 50].
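The imputation assumptions described above can be compared in a short Python sketch. The trial arm is hypothetical, the function names are illustrative, and imputing the number of relapses as the reported ARR multiplied by the imputed exposure is an assumption made here for illustration; the included MTCs may have implemented these steps differently.

```python
def exposure_completers_only(duration_years, completers):
    """Exposure assuming only completers contribute follow-up time."""
    return duration_years * completers


def exposure_all_complete(duration_years, randomised):
    """Exposure assuming every randomised patient completed the trial."""
    return duration_years * randomised


def exposure_half_for_withdrawals(duration_years, randomised, completers):
    """Exposure giving withdrawn patients half of the study duration."""
    withdrawn = randomised - completers
    return duration_years * completers + 0.5 * duration_years * withdrawn


def imputed_relapses(reported_arr, person_years):
    """Relapse count implied by a reported ARR (assumption for illustration)."""
    return reported_arr * person_years


# Hypothetical arm: 2-year trial, 100 randomised, 80 completers, ARR 0.30.
duration, randomised, completers, arr = 2.0, 100, 80, 0.30

exposures = {
    "completers only": exposure_completers_only(duration, completers),
    "all complete": exposure_all_complete(duration, randomised),
    "half for withdrawals": exposure_half_for_withdrawals(
        duration, randomised, completers
    ),
}
for label, person_years in exposures.items():
    relapses = imputed_relapses(arr, person_years)
    print(f"{label}: {person_years:.0f} person-years, {relapses:.0f} relapses")
```

In this hypothetical arm the imputed exposure ranges from 160 to 200 person-years depending on the assumption, which changes the data entering the Poisson model even though the reported ARR is the same.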

Systematic Review Methods

Studies analysed in an MTC should be identified by a systematic literature review in order to minimise the risk of publication bias and other biases [4, 51]. Nine MTCs did not report how many people were involved in the screening for relevant studies, which could indicate a risk that relevant studies may have been missed. Twelve MTCs did not report double data extraction of studies, which increases the risk that relevant data were missed or extracted incorrectly [4].

All methods should be clearly reported, following established guidelines, to enable the reader to assess the risk of bias [15, 16, 52]. Assessment of methodological rigour is hampered when reporting is limited (e.g. five of the identified MTCs were published only as conference abstracts).

Analysis Methods

When conducting MTCs, authors can choose between a frequentist and a Bayesian framework. In general, both methods are accepted by decision-makers. However, more guidance is available for the Bayesian approach as it can incorporate previous evidence in the prior distributions [6]. While effect estimates of the two methods are comparable, different measures of uncertainty (Bayesian: credible intervals, frequentist: confidence intervals) are obtained.

Similar to classical meta-analyses, authors can decide to run a fixed-effects or a random-effects analysis. While both approaches are acceptable [6], some authors advise using a random-effects analysis to account for potential ‘additional heterogeneity in an indirect comparison compared to a direct comparison’ [53]. However, Bayesian random-effects analyses are less applicable when there are only a few studies to compare [54].
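The difference between the two models can be illustrated with a simple pairwise example. The Python sketch below pools hypothetical log rate ratios with inverse-variance weights under a fixed-effects model and under a random-effects model using the DerSimonian-Laird estimate of between-study variance. This is a frequentist illustration chosen for brevity, not the Bayesian network models used by most of the identified MTCs, and all data and function names are assumptions.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effects pooled estimate and standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def tau_squared_dl(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance."""
    df = len(effects) - 1
    if df < 1:
        return 0.0  # heterogeneity cannot be estimated from a single study
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def pool_random(effects, variances):
    """Random-effects pooled estimate: add tau^2 to each study variance."""
    tau2 = tau_squared_dl(effects, variances)
    return pool_fixed(effects, [v + tau2 for v in variances])


# Hypothetical log rate ratios and their variances from three trials.
log_rr = [-0.7, -0.2, -0.5]
var = [0.02, 0.03, 0.025]

fixed_est, fixed_se = pool_fixed(log_rr, var)
random_est, random_se = pool_random(log_rr, var)
print(f"fixed:  {math.exp(fixed_est):.2f} (SE of log ratio {fixed_se:.3f})")
print(f"random: {math.exp(random_est):.2f} (SE of log ratio {random_se:.3f})")
```

The random-effects standard error is never smaller than the fixed-effects one, which is why the choice of model affects the uncertainty reported around each treatment effect.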

Two reviews [18, 44] assumed ‘that treated and control group participants who contributed to missing outcome data both had an unfavourable outcome’. This assumption is likely to have contributed to differences in the effect estimates compared with the other MTCs (Table 4). Ideally, such assumptions should be explored in sensitivity analyses and discussed in detail, in order to ensure transparency.
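The potential effect of this assumption can be sketched with hypothetical counts. The Python example below compares an odds ratio based on observed patients only with one in which all patients with missing outcome data are counted as having relapsed; the numbers and function name are illustrative and do not come from any included study.

```python
def odds_ratio(events_t, total_t, events_c, total_c):
    """Odds ratio for treated versus control from event counts and totals."""
    odds_t = events_t / (total_t - events_t)
    odds_c = events_c / (total_c - events_c)
    return odds_t / odds_c


# Hypothetical trial: 100 patients randomised per arm.
relapsed_t, missing_t = 20, 10  # treated arm
relapsed_c, missing_c = 40, 5   # control arm

# Available-case analysis: patients with missing outcomes are excluded.
available_case_or = odds_ratio(
    relapsed_t, 100 - missing_t, relapsed_c, 100 - missing_c
)

# Unfavourable-outcome imputation: missing patients are counted as relapses.
unfavourable_or = odds_ratio(
    relapsed_t + missing_t, 100, relapsed_c + missing_c, 100
)

print(f"available case: {available_case_or:.2f}")        # 0.39
print(f"missing = unfavourable: {unfavourable_or:.2f}")  # 0.52
```

In this hypothetical example more outcome data are missing in the treated arm, so the imputation moves the odds ratio towards 1; with a different pattern of missing data the shift could be in the other direction.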


This paper aimed to raise awareness of issues related to the conduct and reporting of MTCs in MS to support an accurate interpretation of results by decision-makers so that limitations in the robustness of analyses can be factored into an understanding of the applicability and relevance of findings.

Clear and transparent reporting of methods, results and assumptions made in the analyses should be presented [4, 15, 51, 52, 55]. The choice of outcome measure should be explicit and reflect standard clinical practice. Both the pooling of treatments (assumption of class effect) and the combination of outcomes measured at different points in time should ideally be avoided, or the effect explored in sensitivity analyses. Further, imputation of results without a clear justification should also be discouraged. Basic assumptions of homogeneity, similarity and consistency should be explored when conducting MTCs [14]. MTCs should include all relevant treatment options.

A second paper by Giovannoni et al. [7] presents the results of a recent MTC of DMTs for RRMS and discusses how to avoid some of the issues identified in this paper.