Background

To decide on the best treatment for a patient with a specific condition, healthcare providers and patients need a synthesis of the relative treatment effects of all potential treatment options [1, 2]. This comparative effectiveness synthesis would ideally involve a systematic review with network meta-analysis (NMA) of randomised controlled trials (RCTs) [3]. NMA emerged because standard meta-analysis, which only combines effects from RCTs comparing the same two treatments, cannot compare and rank the effectiveness of multiple treatments for the same condition [4].

NMA can help patients and their care providers choose the treatment that best fits their priorities, based on the efficacy and side effects of all available treatments. For example, Li et al. recently showed that prostaglandins would have been identified 7 years earlier as the most effective drug class for lowering intraocular pressure in open-angle glaucoma had an NMA been performed at the time [5]. Recent empirical research also showed that NMA was 20% more likely than standard meta-analysis to provide strong evidence of treatment differences, and that it provided such evidence 4 years earlier (because the head-to-head RCTs that would have provided “direct” evidence had not been conducted) [6].

For a practising healthcare provider, researcher or policymaker, deciding whether to believe the results of a single NMA, or choosing amongst conflicting NMAs, is difficult without a tool to assess the risk of bias. An empirical evaluation identified 28 NMAs on treatments for rheumatoid arthritis [7] and found considerable discrepancies in the data extracted, in the risk of bias assessments of the included RCTs and in the assessments of heterogeneity. In addition, different network configurations were possible because of differences in how interventions were grouped and whether they were merged into, or split across, nodes. Concerns with each of these issues leave healthcare providers and policymakers uncertain as to which of the biologics has the greatest treatment effect [7, 8].

Tools are available for most study designs to make quality assessment easier. For example, the Methodological Expectations of Cochrane Intervention Reviews (MECIR) [9] is a guideline outlining the methods that authors should follow when conducting a systematic review. The ROBIS (Risk Of Bias In Systematic Reviews) tool [10] can be used by stakeholders to assess the risk of bias in systematic reviews with standard meta-analysis. Biases at the systematic review level include publication bias (e.g. studies missing from the published literature because they did not report statistically significant results) and selective reporting of outcomes or analyses (e.g. outcomes that did not reach the desired magnitude or direction of effect and are not reported in the published trial). The consequence of selective reporting is that the published literature is strongly biased and will substantially overestimate or underestimate effects and associations.

The only way to deal with the problems plaguing medical science is a combined effort by researchers, editors and funding bodies to publish all research without bias and to improve the quality of research that reaches publication. This cannot be done without a tool to evaluate the limitations in the way an NMA was planned, analysed and presented, including the way the evidence was assembled. If inappropriate NMA methods are used, the validity of the findings may be compromised, and decision makers will not know whether to trust the NMA's results and conclusions [11,12,13].

Our proposed risk of bias (RoB) NMA tool will allow decision makers (defined as individuals or groups who have an interest in, or are affected by, health- and healthcare-related research) to assess the biases in an NMA. The tool is not targeted at authors of NMAs, as it does not outline how to conduct an NMA. It is targeted at decision makers, such as healthcare providers, policymakers and physiotherapists, or journal peer reviewers, who want to determine whether the results of an NMA can be trusted to be at low risk of bias.

Checklists and tools with different aims exist to appraise NMAs, including for example, the PRISMA-NMA (PRISMA statement extension for reviews incorporating NMA, 2014) [14], used when writing up the results of an NMA, or the ISPOR (International Society for Pharmacoeconomics and Outcomes Research; [15]) checklist, used by researchers when conducting an NMA (Table 1). These review-level tools are not to be confused with tools to assess the individual primary studies included in systematic reviews (e.g. Cochrane risk of bias tool for randomised controlled trials [16]).

Table 1 Tools and checklists to aid in systematic review conduct and to assess the reporting, quality of conduct or the risk of bias in a review

Guidance on how to develop quality and risk of bias tools has been proposed by Moher [26] and Whiting [27], and one of their first recommended steps is to create a systematically developed list of bias items. Such a list was created by Page et al. [28] when updating the PRISMA 2020 checklist [25]. However, there has been no attempt to comprehensively identify items from NMA quality tools, checklists and scales, which would provide a useful item bank for a proposed risk of bias tool for NMAs (RoB NMA tool) and for those wishing to update existing tools or standards for NMAs. The aim of this study was to conduct a methodological review to compile a preliminary list of concepts related to bias in NMAs. The list is not intended to be used to assess biases in NMAs, but to inform the development of items to be included in our tool.

Methods

Management, guidance and protocol

A steering committee of nine individuals was convened, comprising eight experts in NMA, tool development and evidence synthesis methodology, and one clinician. The steering group is responsible for the management of the project and has executive power over all decisions related to the new tool.

A methodological review is one in which evidence on a given methods topic is systematically identified, extracted and synthesised (e.g. Song [29] and Page [28]). We followed the methodology proposed by Whiting [27], Sanderson [30] and Page [28], as previously discussed. We published our study protocol in BMJ Open [31] and present all data on the Open Science Framework at https://osf.io/f2b5j/.

We adopted a broad definition of an NMA: a review that aims, or intends, to synthesise simultaneously the evidence from multiple primary studies investigating more than two healthcare interventions of interest. Our definition also covers cases in which multiple treatments are intended to be compared in an NMA but the assumptions are found to be violated (e.g. the studies are too heterogeneous to combine) and an NMA is not feasible. Our RoB NMA tool will aim to address the degree to which the methods used lead to risk of bias in both the NMA's results and the authors' conclusions.
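
To make the idea of simultaneous synthesis concrete, the sketch below shows the simplest building block of an NMA: an adjusted indirect comparison (Bucher's method), in which two treatments A and B that have never been compared head to head are compared via a common comparator C. All numbers are hypothetical, and a full NMA estimates all contrasts jointly rather than one pair at a time.

```python
import math

# Hypothetical log odds ratios (treatment X vs comparator Y) and variances;
# these numbers are illustrative only, not taken from any real trial.
d_AC, var_AC = -0.50, 0.04   # A vs C, direct evidence
d_BC, var_BC = -0.20, 0.05   # B vs C, direct evidence

# Bucher adjusted indirect comparison: estimate A vs B via comparator C.
d_AB = d_AC - d_BC                  # log OR of A vs B
var_AB = var_AC + var_BC            # variances add for independent estimates
se_AB = math.sqrt(var_AB)

# 95% confidence interval on the log odds ratio scale.
lo, hi = d_AB - 1.96 * se_AB, d_AB + 1.96 * se_AB
print(f"indirect log OR (A vs B): {d_AB:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```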

Paper eligibility criteria

We included papers describing instruments (i.e. domain-based tools, checklists, scales). A tool is defined as any structured instrument aimed at aiding the user to assess quality or susceptibility to bias [30]. Domain-based tools are designed to assess the risk of bias or quality within specific domains [32]. To be classified as a checklist or questionnaire, an instrument had to include multiple questions without the intention of ascribing a numerical score to each response or calculating a summary score [32]. To be classified as a scale, an instrument had to ascribe a numeric score to each item and calculate a summary score [33].

We also included methods papers and journal editorial standards that present items related to bias, reporting or the methodological quality of NMAs, as well as papers that assessed the methodological quality of a sample of NMAs.

Inclusion criteria

  I. Papers describing methods relating to methodological quality, bias or reporting in NMAs of interventions

  II. Papers or reports describing journal editorial standards for NMAs (e.g. comparable to the Cochrane MECIR [Methodological Expectations of Cochrane Intervention Reviews] standards [9])

  III. Papers examining the quality (or risk of bias) of a sample of NMAs of interventions (e.g. Chambers 2015 [34]), using criteria that focus specifically on aspects of NMAs, not just on general aspects of systematic reviews

  IV. Guidance (e.g. handbooks and guidelines) for undertaking NMAs of interventions

  V. Commentaries or editorials that discuss methods for NMAs of interventions

Exclusion criteria

  I. Papers describing instruments that only assess general aspects of reviews without focusing specifically on NMAs (e.g. AMSTAR [18], AMSTAR 2 [17] or ROBIS [10])

Papers with any publication status and written in any language were included. If we identified a systematic review of studies that would themselves be eligible for this review, we used the results of that review and included only similar studies published after it.

Item eligibility criteria

Items that were potentially relevant to the risk of bias in NMAs were assessed against the eligibility criteria outlined below. Items related to reporting quality were retained because they potentially could be translated into a risk of bias item.

We included items related to bias, methodological quality or reporting and excluded items that were equally applicable to all systematic reviews as they are covered by other instruments.

Exclusion criteria

  I. Items that are equally applicable to all systematic reviews, as these are covered by other tools (e.g. ROBIS [10], AMSTAR 2 [17])

  II. Items related to missing evidence in an NMA (i.e. selective outcome reporting and publication bias), because a tool to assess the risk of bias due to missing evidence in an NMA has recently been published [35]

For included methods studies related to NMA biases (e.g. Bujkiewicz 2019 [36]) and studies assessing the quality of NMAs (e.g. Dotson 2019 [37]), we extracted the sentence (and surrounding text) outlining the method and reworded the text into a concept.

Search methods for studies

An experienced information specialist executed literature searches in July 2020 in the following electronic databases: MEDLINE (Ovid) and the Cochrane Library. Difficult-to-locate/unpublished (i.e. grey) literature was sought in the EQUATOR Network, Dissertation Abstracts, organisational websites (Cochrane, the Canadian Agency for Drugs and Technologies in Health [CADTH], the National Institute for Health and Care Excellence [NICE], the Pharmaceutical Benefits Advisory Committee, the Guidelines International Network, ISPOR and the International Network of Agencies for Health Technology Assessment) and methods collections (i.e. the Cochrane Methodology Register and the AHRQ Effective Health Care Program). One expert in search validation designed the search, a second expert revised it and two librarians independently reviewed it (Additional file 1).

We scanned the reference lists of included studies. We also asked members of the steering group to identify studies missed by our search. We contacted authors of abstracts or posters to retrieve the full study or when data were missing.

To identify in-house journal editorial standards for NMAs, we created an email list of editors-in-chief of journals publishing NMAs, using the reference list of a bibliometric study of NMAs [38]. We located the journal website using the Google search engine and then located the emails of the editors-in-chief. If they indicated they used an in-house editorial standard for NMAs, then we added these standards to our list of potentially eligible papers.

Selection of studies

The eligibility criteria were piloted by two reviewers independently on a sample of studies retrieved from the search to ensure consistent application. Two reviewers then independently screened titles, abstracts and full texts against the eligibility criteria. Any disagreement was resolved by discussion with a third reviewer. Where information on a paper's eligibility was limited or incomplete (e.g. when only an abstract was available), the original study authors were contacted to request the full text or further details. Google Translate was used when the authors of the current paper were not fluent in the language of a paper.

Selection of items

Extracted items were reviewed against our eligibility criteria by the steering committee using a consensus-based decision structure. The steering committee decided on inclusion through an online Zoom™ polling process. The polling options were to include the item, to amend it or to exclude it (because it was a general systematic review item or was not related to NMA bias).
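
As an illustration of this decision structure only (the committee's polling was conducted live in Zoom, not in code), a minimal sketch of the three polling options and a simple majority tally might look as follows; all votes are hypothetical.

```python
from collections import Counter
from enum import Enum

class Vote(Enum):
    INCLUDE = "include the item"
    AMEND = "amend the item"
    EXCLUDE = "exclude (general systematic review item / not NMA bias)"

def poll_result(votes: list[Vote]) -> Vote:
    """Return the most frequent option (ties go to the first one counted)."""
    return Counter(votes).most_common(1)[0][0]

# Seven hypothetical committee votes on a single extracted item:
votes = [Vote.INCLUDE, Vote.AMEND, Vote.INCLUDE, Vote.EXCLUDE,
         Vote.INCLUDE, Vote.AMEND, Vote.INCLUDE]
print(poll_result(votes).value)  # -> include the item
```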

Data extraction of studies

From the included studies, we extracted the following data: first author and publication year, standard instrument nomenclature (i.e. tool, scale, checklist and definitions), whether the instrument was designed to assess specific topic areas, number of items, domains within the instrument, whether the instrument focuses on reporting or methodological quality (or focuses on other concepts such as precision of the treatment effect estimates), how domains and items within the instrument are rated (if applicable), methods used to develop the instrument (e.g. review of items, Delphi study, expert consensus meeting) and the availability of guidance as a separate document or included within the original publication.
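
For illustration, the extraction form can be pictured as one record per included study, with a field for each data element listed above. This is a hypothetical sketch: the field names and example values are ours, not the study's actual form.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IncludedInstrument:
    """One row of a (hypothetical) extraction form for an included study."""
    first_author: str
    publication_year: int
    nomenclature: str                     # "tool", "scale" or "checklist"
    topic_specific: bool                  # designed for specific topic areas?
    n_items: int
    domains: list[str] = field(default_factory=list)
    focus: str = "reporting"              # or "methodological quality", etc.
    rating_method: Optional[str] = None   # how domains/items are rated
    development_methods: list[str] = field(default_factory=list)
    separate_guidance: bool = False       # guidance as a separate document?

# Hypothetical example record (values are illustrative, not extracted data):
example = IncludedInstrument(
    first_author="Smith", publication_year=2015, nomenclature="checklist",
    topic_specific=False, n_items=30, domains=["reporting"],
    development_methods=["review of items", "Delphi study"],
)
```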

Data extraction of items

From the included studies, items potentially relevant to NMAs were extracted verbatim. Two seminal instruments were extracted first because (a) they have the most comprehensive lists of items and (b) they were rigorously developed (e.g. used a Delphi process, tested reliability): the ISPOR [15] and PRISMA NMA [14] checklists.

PRISMA NMA and ISPOR provided a taxonomy of items, onto which we mapped other similar items (original taxonomy can be found at https://osf.io/f2b5j/). We first (i) extracted items from the ISPOR checklist, (ii) grouped similar PRISMA NMA items next to the ISPOR item and finally (iii) added items not present in ISPOR next to those in the same domain (e.g. eligibility criteria domain). This process made it easier to identify duplicate items, which could be later combined.

Once the items from PRISMA NMA [14] and ISPOR [15] had been extracted, the remaining sources were reviewed one at a time in order of publication year (newest first) [28], on the assumption that older instruments would contain outdated methods and be less comprehensive.

Once all items were extracted, the following steps were used to group them (a toy sketch of some of these steps follows the list):

  1. Split items so that each covers only a single concept

  2. Combine duplicate items

  3. Group items by similar concept

  4. Categorise items as being related to biases specific to NMAs

  5. Reword items into concepts
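
The toy sketch below illustrates steps 2, 3 and 5 (combining duplicates, grouping by concept and rewording into concepts). The item wordings and concept labels are invented for illustration; in the study itself, this work was done by reviewers by hand, not programmatically.

```python
# Hypothetical verbatim items, hand-labelled with a concept.
raw_items = [
    ("Was the geometry of the network assessed?", "network geometry"),
    ("Did the authors evaluate network geometry?", "network geometry"),
    ("Were effect modifiers distributed similarly across comparisons?",
     "effect modifiers"),
]

# Steps 2-3: combine duplicate items and group them by concept.
grouped: dict[str, list[str]] = {}
for wording, concept in raw_items:
    grouped.setdefault(concept, []).append(wording)

# Step 5: reword each group into a single concept statement.
for concept, items in grouped.items():
    print(f"Concept: {concept} (merged from {len(items)} item(s))")
```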

Two reviewers independently extracted data and discussed discrepancies until a consensus was reached. Data were extracted using Microsoft Excel.

Organising and categorising items

Several rounds of modification were required until a list of items was finalised and categorised into domains. The steering committee reworded the items, typically structured as questions, into concepts (i.e. general notions) to avoid undue focus on the wording of the item and to make sure these were not confused with a list of items that would be included in the final tool.

Deviations from the protocol

A deviation from our protocol [31] was that one author (CL) extracted data for the columns “Methods to develop the document” for Tables 3, 4 and 5, and “Research Institute” for Tables 2, 4 and 5, when we had planned for two independent authors to extract all data.

Results

Search results

The search yielded 3599 citations, 3418 of which were excluded at the title/abstract stage. A total of 181 were assessed in full text, and of these, 58 studies were included (Fig. 1). Three CINeMA studies (Nikolakopoulou [23], Papakonstantinou [39] and Salanti [40]) were similar but reported slightly different results; these three articles were therefore grouped together in Table 1.

Fig. 1 Flowchart of the study selection

We identified a 2019 review by Laws et al. [41] that compiled guidance documents for conducting an NMA from countries throughout the world. We therefore did not search for guidance documents published before the last search date of that review. Four other reports were comprehensive methods reviews aggregating previously published items related to NMAs [42,43,44,45].

Journal editors’ in-house reporting standards

We located the email addresses of 206 editors-in-chief of journals publishing NMAs; of these, 198 emails were successfully delivered. We received 40 responses (40/198, a 20% response rate). No respondents reported having an in-house editorial standard for NMAs.

Characteristics of included studies

Of the 58 included studies, 12 were tools, checklists or journal standards; 13 were guidance documents for NMAs; 27 were studies related to bias or NMA methods; and 6 were papers assessing the quality of NMAs.

Tools, checklists or standards for NMAs

Two instruments focused solely on the risk of reporting biases, one focused on assessing the validity of NMAs, one on assessing certainty in the NMA results, two on methodological quality, and the remaining six mixed these concepts within a single instrument (Table 2). Of the instruments relating to all types of quality or bias, four reported and used rigorous methods in their development (Hutton [14], Jansen [15], Ortega [46] and Page [25]).

Table 2 Characteristics of tools, checklists or journal standards (n = 12)

Nearly all of the included tools (n = 10/12) were domain-based, with users judging the risk of bias or methodological quality within specific domains (Table 2). All of the NMA tools were designed for generic use rather than for a specific application (e.g. only meta-analyses of diagnostic accuracy studies). Six tools described the methods used to develop them or linked to supplementary data containing this information. Five of the tools included guidance documents.

Guidance documents for NMAs

We identified 13 guidance documents for the conduct and reporting of NMAs (Table 3), none of which was targeted at specific types of NMAs. One of these, the study by Laws in 2019 [41], was a comprehensive systematic review of guidance for NMAs worldwide: guidelines from 41 countries were examined, yielding guideline documents related to the conduct of an NMA from 14 countries. Laws [41] broadly categorised the criteria for conducting an NMA in these guidelines as (a) assessments and analyses to test the assumptions required for an NMA, (b) presentation and reporting of results and (c) justification of modelling choices.

Table 3 Characteristics of guidance documents (n = 13)

Studies assessing the methodological or reporting quality of NMAs

Of the six papers assessing the quality of NMAs, one assessed reporting quality using PRISMA NMA [62] (Table 4). Three assessments used the National Institute for Health and Care Excellence (NICE) Guide to the Methods of Technology Appraisal, the NICE Decision Support Unit (NICE-DSU) checklist alone [63], or the latter in combination with the ISPOR checklist [64, 65]. The remaining two studies did not base their assessments on any established instrument: Donegan [66] assessed both methodological quality and reporting quality, and Dotson [37] evaluated whether NMAs displayed evidence of a confounding bias that varies with time.

Table 4 Characteristics of studies assessing quality of NMAs (n = 6)

Method and bias studies on NMAs

Of the 27 papers on methods for NMAs, 11 were from the UK, 8 each from Canada and the USA, 2 each from Germany, Switzerland and Greece, and 1 each from Ireland and Portugal. The majority of the methods studies were aimed neither at a specific type of NMA nor at a specific medical field (n = 18/27). Of the five studies that focused on a specific type of NMA, two were aimed at disconnected networks, and one each at adaptive trial designs, random inconsistency effects and Bayesian models (Table 5). The remaining four were aimed at specific medical fields, namely depression, hypertension, social anxiety and any drug therapy, and inflammatory arthritis.

Table 5 Characteristics of methods studies related to NMA biases (n = 27)

Retained concepts

A total of 99 items were extracted verbatim from the 58 studies (dataset at https://osf.io/f2b5j/), and after item screening against the eligibility criteria, we included 22 that were reworded into concepts (Additional file 3).

The concepts in Additional file 3 were categorised into the following domains: 3 concepts in network characteristics, 4 concepts in effect modifiers, 13 concepts in statistical synthesis and 2 concepts in interpretation of the findings and conclusions. Concepts related to joint randomisability, inappropriate exclusion of interventions, specification of nodes, network geometry, effect modifiers, appropriate handling of multi-arm studies, heterogeneity, consistency, choice of priors, sensitivity analyses, robustness of the results and trustworthiness of the conclusions were considered. These concepts should not be used to assess bias in NMAs as they are preliminary thoughts which will be altered and refined into items based on expert feedback [89].
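
As a quick arithmetic check, the domain counts reported above sum to the 22 retained concepts:

```python
# Domain counts as reported in Additional file 3 (see text above).
domains = {
    "network characteristics": 3,
    "effect modifiers": 4,
    "statistical synthesis": 13,
    "interpretation of the findings and conclusions": 2,
}
assert sum(domains.values()) == 22  # total retained concepts
```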

Discussion

Using a systematic search of the literature, we identified 58 studies presenting items or concepts related to quality or bias in NMAs. When we surveyed editors-in-chief of journals publishing NMAs, none reported using in-house editorial standards for NMAs. The included studies yielded 99 items, the majority of which related to general systematic review biases and quality; these are covered by tools such as AMSTAR 2 [17] and ROBIS [90] and were therefore excluded. Twenty-two concepts related to biases specific to NMAs were retained, including concepts related to joint randomisability, effect modifiers, specification of nodes, inconsistency, robustness of the results and trustworthiness of the conclusions. The list of concepts in Additional file 3 is not intended to be used as an instrument. While waiting for our tool to be finalised and published, stakeholders should use a combination of methods and topical expertise to anticipate the most important sources of bias, assess risk of bias and interpret the effect of potential sources of bias on NMA effect estimates and authors' conclusions.

Strengths and limitations

A major strength of our research was that we conducted it in accordance with a systematic review protocol [31]. Two other studies, Sanderson [30] and Page [28], developed lists of quality items systematically. We followed their methods which involved building a bank of items through a systematic review of the relevant literature. Other strengths included using a systematic search strategy developed by an information specialist and inclusion of grey literature in any language, using intuitive domains to organise items related to bias and using a consensus-based decision structure to select, reframe and refine items.

One limitation of our study is the challenge of retrieving methods studies, as methods collections are not regularly updated (for example, the Cochrane Methodology Register has not been updated since July 2012 [91], and the most recent article in the Scientific Resource Center Methods library is from 2013). Since the submission of this manuscript, two new websites have emerged: LIGHTS (https://lights.science/), for methods guidance, and LATITUDES (www.latitudes-network.org), for validity assessment tools. However, we do not expect any missed methods studies or tools to supply additional novel concepts.

An additional limitation is that potentially relevant studies may have been published since our last search (July 2020), and our search may not have retrieved all relevant studies. However, the 22 included concepts reflect all aspects of NMA bias considered by previous methodological tools and their expert authors, and it is therefore unlikely that important concepts are missing.

Impact of the development of a new risk of bias tool for NMAs

We believe our proposed tool to assess the risk of bias in NMAs is needed for several reasons. Other tools and checklists for NMAs have been published; however, few were developed using systematic and rigorous methodology (i.e. Moher [26] and Whiting [27]), and none is both current and comprehensive (see Table 1). The PRISMA-NMA statement (Hutton [14]) and the NICE-DSU checklist (Ades [47]) were designed to assess reporting quality (i.e. how well a study is described in its publication). The ISPOR checklist (Jansen [15]) was designed to assess reporting, validity and applicability. Finally, the checklist for the critical appraisal of indirect comparisons (Ortega [46]) was designed to assess methodological quality. These tools (published between 2012 and 2014) are now outdated, as they do not reflect recent methodological and statistical advances in NMA evidence synthesis or newly characterised biases. Our proposed tool will be current and aims to incorporate these advances.

Future research

This study represents the first stage in the development of a new risk of bias tool for NMAs. This systematic review of items identified 22 concepts, which were entered into a Delphi survey to solicit expert opinion [89]. The steering committee used the expert feedback to choose and refine the concepts. We also considered feedback from a stakeholder survey on the structure, conceptual decisions and concepts in the proposed tool [89]. The concepts were then worded into items, and an elaboration and explanation document was written. The tool is currently undergoing pilot testing, and those interested in piloting it, or in using it in the future, can contact the first author (CL). The steering committee intends the RoB NMA tool to be used in combination with ROBIS [10] (which we recommend, as it was designed specifically to assess bias) or other similar tools (e.g. AMSTAR 2 [17]) that assess the quality of systematic reviews. Further research will involve reliability and validity testing.

Conclusions

Twenty-two concepts were included, which will inform the development of a new tool to assess the risk of bias in NMAs. Concepts related to joint randomisability, effect modifiers, specification of nodes, inconsistency, robustness of the results, and trustworthiness of the conclusions and others were considered. The list of concepts is not intended to be used as an instrument to assess biases in NMAs, but to inform the development of items to be included in our tool.