Background

Overviews of systematic reviews aim to systematically retrieve, critically appraise and synthesise the results of multiple systematic reviews (SRs) [1]. Overviews of reviews (also called umbrella reviews, meta-reviews, reviews of reviews; but referred to in this paper as ‘overviews’ [2]) have grown in number in recent years, largely in response to the increasing number of SRs [3]. Overviews have many purposes including mapping the available evidence and identifying gaps in the literature, summarising the effects of the same intervention for different conditions or populations or examining reasons for discordance of findings and conclusions across SRs [4,5,6]. A noted potential benefit of overviews is that they can address a broader research question than the constituent SRs, since overviews are able to capitalise on previous SR efforts [7].

The steps and many of the methods used in the conduct of SRs are directly transferable to overviews. However, overviews involve unique methodological challenges that primarily stem from a lack of alignment between the PICO (Population, Intervention, Comparison, Outcome) elements of the overview question and those of the included SRs, and overlap, where the same primary studies contribute data to multiple SRs [7]. For example, overlap can lead to challenging scenarios such as how to deal with discordant risk of bias assessments of the same primary studies across SRs (often further complicated by the use of different risk of bias/quality tools) or how to synthesise results from multiple meta-analyses where the same studies contribute to more than one pooled analysis. Authors need to plan for these scenarios, which may require the application of different or additional methods to those used in systematic reviews of primary studies.

Two recent reviews of methods guidance for conducting overviews found important gaps in the guidance on the conduct of overviews [8, 9]. The results of our first paper, which identified methods for the initial steps in conducting an overview and collated the evidence on the performance of these methods [10], aligned with these findings. We further identified that there was a lack of studies evaluating the performance of overview methods and limited empirical evidence to inform methods decision-making in overviews [10].

This paper is the second of two papers that together aim to provide a comprehensive framework of overview methods and the evidence underpinning these methods (an evidence map of overview methods). In doing so, we aim to help overview authors plan for common scenarios encountered when conducting an overview and enable prioritisation of methods development and evaluation.

Objectives

The objectives of this study were to (a) develop a comprehensive framework of methods that have been used, or may be used, in conducting, interpreting and reporting overviews of systematic reviews of interventions (stage I)—the Methods for Overviews of Reviews (MOoR) framework; (b) map studies that have evaluated these methods to the framework (creating an evidence map of overview methods) (stage II); and (c) identify unique methodological challenges of overviews and methods proposed to address these.

In the first paper, we presented the methods framework, along with the studies that had evaluated those methods mapped to the framework (the evidence map) for the four initial steps of conducting an overview: (a) specification of the purpose, objectives and scope of the overview; (b) specification of the eligibility criteria; (c) search methods and (d) data extraction methods [10]. In this second companion paper, we present the methods framework and evidence map for the subsequent steps in conducting an overview: (e) assessment of risk of bias in SRs and primary studies; (f) synthesis, presentation and summary of the findings and (g) assessment of the certainty of evidence arising from the overview (Fig. 1).

Fig. 1 Summary of the research reported in each paper

We use the term ‘methods framework’ (or equivalently, ‘framework of methods’) to describe the organising structure we have developed to group related methods, and against which methods evaluations can be mapped. The highest level of this structure is the broad steps of conducting an overview (e.g. synthesis, presentation and summary of the findings). The methods framework, together with the studies that have evaluated these methods, forms the evidence map of overview methods.

Methods

A protocol for this study has been published [11], and the methods have been described in detail in the first paper in the series [10]. The methods for the two research stages (Fig. 2) are now briefly described, along with deviations from the planned methods pertaining to this second paper. A notable deviation from our protocol is that we had planned to include the step ‘interpretation of findings and drawing conclusions’, but after reviewing the literature, felt that there was overlap between this step and the ‘assessment of certainty of the evidence arising from the overview’ step, and so consolidated the identified methods into the latter step.

Fig. 2 Stages in the development of an evidence map of overview methods

Stage I: development and population of the framework of methods

Search methods

Our main search strategy included searching MEDLINE from 2000 onwards and the following methods collections: Cochrane Methodology Register, Meth4ReSyn library, Scientific Resource Center Methods library of the AHRQ Effective Health Care Program and Cochrane Colloquium abstracts. Searches were run on December 2, 2015 (see Additional file 1 for search strategies). These searches were supplemented by methods articles we had identified through a related research project, examination of reference lists of included studies, contact with authors of conference posters, and citation searches (see Paper 1 [10] for details).

Eligibility criteria

We identified articles describing methods used, or recommended for use, in overviews of systematic reviews of interventions.

Inclusion criteria:

  (i) Articles describing methods for overviews of systematic reviews of interventions

  (ii) Articles examining methods used in a cross-section or cohort of overviews

  (iii) Guidance (e.g. handbooks and guidelines) for undertaking overviews

  (iv) Commentaries or editorials that discuss methods for overviews

Exclusion criteria:

  (i) Articles published in languages other than English

  (ii) Articles describing methods for network meta-analysis

  (iii) Articles exclusively about methods for overviews of other review types (i.e. not of interventions)

We populated the framework with methods that are different or additional to those required to conduct a SR of primary research. Methods evaluated in the context of other ‘overview’ products, such as guidelines, which are of relevance to overviews, were included.

The eligibility criteria were piloted by three reviewers independently on a sample of articles retrieved from the search to ensure consistent application.

Study selection

Two reviewers independently reviewed the titles, abstracts and full texts for their potential inclusion against the eligibility criteria. Any disagreement was resolved by discussion with a third reviewer. In instances where there was limited or incomplete information regarding a study’s eligibility (e.g. when only an abstract was available), the study authors were contacted to request the full text or further details.

Data extraction, coding and analysis

One author collected data from all included articles using a pre-tested form; a second author collected data from a 50% sample of the articles.

Data collected on the characteristics of included studies

We collected data about the following: (i) the type of articles (coded as per our inclusion criteria), (ii) the main contribution(s) of the article (e.g. critique of methods), (iii) a precis of the methods or approaches described and (iv) the data on which the article was based (e.g. audit of methods used in a sample of overviews, author’s experience).

Coding and analysis to develop the framework of methods

We coded the extent to which each article described methods or approaches pertaining to each step of an overview (i.e. mentioned without description, described—insufficient detail to implement, described—implementable). The subset of articles coded as providing description were read by two authors (CL, SB or JM) who independently drafted the framework for that step to capture and categorise all available methods. We grouped conceptually similar approaches together and extracted examples to illustrate the options. Groups were labelled to delineate the unique decision points faced when planning each step of an overview (e.g. determine how to deal with discordance across systematic review (SR)/meta-analyses (MAs) and determine criteria for selecting SR/MAs, where SR/MAs include overlapping studies). To ensure comprehensiveness of the framework, methods were inferred when a clear alternative existed to a reported method (e.g. using tabular or graphical approaches to present discordance (6.2, Table 4)). The drafts and multiple iterations of the framework for each step were discussed and refined by all authors.

Stage II: identification and mapping of evaluations of methods

Search methods

In addition to the main searches outlined in the ‘Search methods’ section for Stage I, we planned to undertake purposive searches to locate ‘studies evaluating methods’ where the main searches were unlikely to have located these evaluations. For this second paper, we undertook a purposive search to locate studies evaluating assessment of risk of bias tools for SRs, since these studies may not have mentioned ‘overviews’ (or its synonyms) in their titles or abstracts and thus would not have been identified in the main searches. However, through our main search, we identified a SR that had examined quality assessment or critical appraisal tools for assessing SRs or meta-analyses [12]. We therefore did not develop a new purposive search strategy, but instead used the strategy in the SR, and ran it over the period January 2013—August 2016 to locate studies published subsequent to the SR (Additional file 2). For the other steps, the identified methods were specific to overviews, so evaluations were judged likely to be retrieved by our main searches.

Eligibility criteria

To create the evidence map, we identified studies evaluating methods for overviews of systematic reviews of interventions.

Inclusion criteria:

  (i) SRs of methods studies that have evaluated methods for overviews

  (ii) Primary methods studies that have evaluated methods for overviews

Exclusion criteria:

  (i) Studies published in languages other than English

  (ii) Methods studies that have evaluated methods for network meta-analysis

We added the additional criterion that methods studies had to have a stated aim to evaluate methods, since our focus was on evaluation and not just application of a method.

Study selection

We used the same process, as outlined in the ‘Study selection’ section, for determining which studies located from the main search met the inclusion criteria. For studies located from the purposive search, one author reviewed titles, abstracts and full texts for their potential inclusion against the eligibility criteria.

Data extraction

We extracted data from primary methods studies, or SRs of methods studies that evaluated the measurement properties of tools for assessing the risk of bias in SRs and one study that developed measures to quantify overlap of primary studies in overviews. The data extracted from these studies were based on relevant domains of the COSMIN checklist (Table 1) [13, 14]. We had originally planned to extract quantitative results from the methods evaluations relating to the primary objectives; however, on reflection, we opted not to do this since we felt this lay outside the purpose of the evidence map. Data were extracted independently by three authors (CL, SM, SB, JM).

Table 1 Data extracted from methods studies evaluating tools for assessing risk of bias in SRs

Assessment of the risk of bias

For primary methods studies, we extracted and tabulated study characteristics that may plausibly be associated with either bias or the generalisability of findings (external validity) (Table 1). For SRs of methods studies, we used the ROBIS tool to identify concerns with the review process in the specification of study eligibility (Domain 1), methods used to identify and/or select studies (Domain 2), and the methods used to collect data and appraise studies (Domain 3) (Table 1) [15]. We then made an overall judgement about the risk of bias arising from these concerns (low, high, or unclear). We did not assess Domain 4 of ROBIS, since this domain covers synthesis methods that are of limited applicability to the included reviews.

Analysis

The yield, characteristics and description of the studies evaluating methods were described and mapped to the framework of methods.

Results

Results of the main search

Details of our search results are reported in our first companion paper [10]. Here, we note the results from the additional purposive search and changes in search results between the papers. Our main search strategy retrieved 1179 unique records through searching databases, methods collections and other sources (Fig. 3) [10]. After screening abstracts and full text, 66 studies remained, 42 of which were included in stage I and 24 studies in stage II (exclusions found in Additional file 3). Our purposive search to identify studies evaluating tools for assessing the risk of bias in SRs (rather than primary studies) found no further stage II studies (see Additional file 4 for flowchart).

Fig. 3 Flowchart of the main search for stages I and II studies

Of the 24 included stage II studies, 12 evaluated search filters for SRs (reported in paper 1 [10]), 11 evaluated risk of bias assessment tools for SRs, and one evaluated a synthesis method. Of the 11 studies evaluating risk of bias assessment tools for SRs, four were SRs of methods studies [12, 16,17,18] and seven were primary evaluation studies [15, 17, 19,20,21,22,23].

Four of the seven primary evaluations of risk of bias assessment tools [20,21,22,23] and one SR [16] were included in the results of the 2013 SR by Whiting [12] and so were not considered individually in this paper. We excluded one of the SRs since, after close examination, it became clear that it reviewed studies that applied rather than evaluated AMSTAR (A Measurement Tool to Assess Systematic Reviews [22, 23]) and so did not meet our stage II inclusion criteria [18]. Therefore, of the 24 initially eligible stage II studies, 18 met the inclusion criteria, six of which are included in this second paper (Fig. 3).

Stage I: development and population of the framework of methods

We first describe the characteristics of the included stage I articles (see ‘Characteristics of stage I articles’; Table 2) followed by presentation of the developed framework. This presentation is organised into sections representing the main (latter) steps in conducting an overview—‘assessment of risk of bias in SRs and primary studies’, ‘synthesis, presentation and summary of findings’ and the ‘assessment of certainty of the evidence arising from the overview’. In each section, we orient readers to the structure of the methods framework, which includes a set of steps and sub-steps (which are numbered in the text and tables). Reporting considerations for all steps are reported in Additional file 5.

Table 2 Characteristics of stage I studies and the extent to which each described (two ticks) or mentioned (one tick) methods pertaining to the latter steps in conducting an overview

We focus our description on methods/options that are distinct; have added complexity, compared with SRs of primary studies; or have been proposed to deal with major challenges in undertaking an overview. Importantly, the methods/approaches and options reflect the ideas presented in the literature and should not be interpreted as endorsement for the use of the methods. We also highlight methods that may be considered for dealing with commonly encountered scenarios for which overview authors need to plan (see ‘Addressing common scenarios unique to overviews’; Table 6).

Characteristics of stage I articles

The characteristics and the extent to which articles (n = 42) described methods pertaining to the latter steps in conducting an overview are indicated in Table 2. The majority of articles were published as full reports (n = 34/42; 81%). The most common type of study was an article describing methods for overviews (n = 26/42; 62%), followed by studies examining methods used in a cohort of overviews (n = 11/42; 26%), guidance documents (n = 4/42; 10%) and commentaries and editorials (n = 1/42; 2%).

Methods for the assessment of risk of bias in SRs and primary studies were most commonly mentioned or described (n = 33), followed by methods for synthesis, presentation and summary of the findings (n = 30), and methods for the assessment of certainty of the evidence in overviews (n = 24). Few articles described methods across all of the latter steps in conducting an overview (n = 6 [1, 4, 6, 24,25,26]).

Assessment of risk of bias in SRs and primary studies

The three steps in the framework under ‘assessment of risk of bias in SRs and primary studies’ were ‘plan to assess risk of bias (RoB) in the included SRs (1.0)’, ‘plan how the RoB of the primary studies will be assessed or re-assessed (2.0)’ and ‘plan the process for assessing RoB (3.0)’ (Table 3). Note that in the following we use the terminology ‘risk of bias’, rather than quality, since assessment of SR or primary study limitations should focus on the potential of those methods to bias findings. However, the terms quality assessment and critical appraisal are common, particularly when referring to the assessment of SR methods, and hence, our analysis includes all relevant literature irrespective of terminology. We now highlight methods/approaches and options for the first two steps since these involve decisions unique to overviews.

Table 3 Assessment of risk of bias in SRs and primary studies

When determining how to assess the RoB in SRs (1.1), identified approaches included the following: selecting or adapting an existing RoB assessment tool for SRs (1.1.1, 1.1.2), developing a RoB tool customised to the overview (1.1.3), using an existing RoB assessment such as those published in Health Evidence™ [27] (1.1.4) or describing the characteristics of included SRs that may be associated with bias or quality without using or developing a tool (1.1.5). More than 40 tools have been identified for appraisal of SRs [12], only one of which is described as a risk of bias tool (ROBIS (Risk of Bias In Systematic reviews tool) [15]). Other tools are described as being for critical appraisal or quality assessment. Studies have identified AMSTAR [22, 23] and the OQAQ (Overview Quality Assessment Questionnaire [28]) as the most commonly used tools in overviews [3, 12]. Methods for summarising and presenting RoB assessments mirror those used in a SR of primary studies (1.2, 1.3).

Authors must also decide on how to assess the RoB of primary studies included within SRs (2.0). Two main approaches were identified: to either report the RoB assessments from the included SRs (2.1.1) or to independently assess RoB of the primary studies (2.1.3) (only the latter option applies when additional primary studies are retrieved to update or fill gaps in the coverage of existing SRs). When using the first approach, overview authors may also perform quality checks to verify that assessments were done consistently and without error (2.1.2). In attempting to report RoB assessments from included SRs, overview authors may encounter missing data (e.g. incomplete reporting of assessments) or assessments that are flawed (e.g. using problematic tools). In addition, discrepancies in RoB assessments may be found when two or more SRs report an assessment of the same primary study but use different RoB tools or report discordant judgements for items or domains using the same tool. We identified multiple methods for dealing with these scenarios, most of which are applied at the data extraction stage (covered in Paper 1 [10]). Options varied according to the specific scenario, but included the following: (a) extracting all assessments, recording discrepancies; (b) extracting from one SR based on a priori criteria; (c) extracting data elements from the SR that meets pre-specified decision rules and (d) retrieving primary studies to extract missing data or reconcile discrepancies [10].
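To illustrate option (a), extracting all assessments and recording discrepancies, the following minimal Python sketch flags primary studies whose RoB assessments differ across SRs. It is not taken from the reviewed literature: the record layout, the tool names ("Tool X", "Tool Y"), the judgement labels and the function name `find_discrepancies` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical extracted records: (SR, primary study, RoB tool, overall judgement).
assessments = [
    ("SR1", "Trial A", "Tool X", "low"),
    ("SR2", "Trial A", "Tool X", "high"),
    ("SR1", "Trial B", "Tool X", "low"),
    ("SR3", "Trial B", "Tool Y", "low"),
    ("SR2", "Trial C", "Tool X", "unclear"),
]


def find_discrepancies(records):
    """Return primary studies assessed by more than one SR where the
    tools used or the judgements reported are not identical."""
    by_study = defaultdict(list)
    for sr, study, tool, judgement in records:
        by_study[study].append((sr, tool, judgement))

    flagged = {}
    for study, rows in by_study.items():
        tools = {tool for _, tool, _ in rows}
        judgements = {judgement for _, _, judgement in rows}
        if len(rows) > 1 and (len(tools) > 1 or len(judgements) > 1):
            flagged[study] = {
                "different_tools": len(tools) > 1,
                "discordant_judgements": len(judgements) > 1,
                "assessments": rows,
            }
    return flagged


discrepancies = find_discrepancies(assessments)
for study, info in sorted(discrepancies.items()):
    print(study, info["different_tools"], info["discordant_judgements"])
```

A table of this kind only records where discrepancies occur. Deciding which assessment to report, or whether to retrieve the primary study, still depends on the option chosen from (b) to (d).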

Synthesis, presentation and summary of the findings

The six steps in the framework under ‘synthesis, presentation and summary of the findings’ were ‘plan the approach to summarising the SR results (1.0)’, ‘plan the approach to quantitatively synthesising the SR results (2.0)’, ‘plan to assess heterogeneity (3.0)’, ‘plan the assessment of reporting biases (4.0)’, ‘plan how to deal with overlap of primary studies included in more than one SR (5.0)’, and ‘plan how to deal with discordant results, interpretations and conclusions of SRs (6.0)’ (Table 4). As a note on terminology, we distinguish between discrepant data (data from the same primary study that differ between SRs because of errors in data extraction) and discordant results, interpretations and conclusions of SRs (differences in the results and conclusions of SRs arising from the methodological decisions authors make, or from different interpretations or judgements about the results).

Table 4 Synthesis, presentation and summary of the findings

An identified step of relevance to all overviews is determining the summary approach (1.2). This includes determining what data will be extracted and summarised from SRs and primary studies (e.g. characteristics of the included SRs (1.2.1), results of the included SRs (1.2.2), results of the included primary studies (1.2.3), RoB assessments of SRs and primary studies (1.2.4)) and what graphical approaches might be used to present the results (1.3). In overviews that include multiple SRs reporting results for the same population, comparison and outcome, criteria need to be determined as to whether all SR results/MAs are reported (1.1.1), or only a subset (1.1.2). When the former approach is chosen (1.1.1), methods for dealing with overlap of primary studies across SR results need to be considered (5.0), such as acknowledging (5.3.4), statistically quantifying (5.1) and visually examining and depicting the overlap (5.2). Choice of a subset of SR/MAs (1.1.2) may bring about simplicity in terms of summarising the SR results (since there will only be one or a few SRs included), but may lead to a loss of potentially important information through the exclusion of studies that are not overlapping with the selected SR result(s).

A related issue is that of discordance (6.0). Some overviews aim to compare results, conclusions and interpretations across a set of SRs that address similar questions. These overviews typically address a focused clinical question (e.g. comparing only two interventions for a specific condition and population). Identified methods included approaches to examine and record discordance (6.1.1) and the use of tools (e.g. Jadad [29]) or decision rules to aid in the selection of one SR/MA (6.1.2).

In addition to determining the summary approach of SR results, consideration may also be given to undertaking a new quantitative synthesis of SR results (2.0). A range of triggers that may lead to a new quantitative synthesis were identified (2.2) (e.g. incorporation of additional primary studies (2.2.2), need to use new or more appropriate meta-analysis methods (2.2.3), concerns regarding the trustworthiness of the SR/MA results (2.2.5)). When undertaking a new meta-analysis in an overview, a decision that is unique to overviews is whether to undertake a first-order meta-analysis of effect estimates from primary studies (2.3.1), or a second-order meta-analysis of meta-analysis effect estimates from the SRs (2.3.2). If undertaking a second-order meta-analysis, methods may be required for dealing with primary studies contributing data to multiple meta-analyses (5.3.2). A second-order subgroup analysis was identified as a potential method for investigating whether characteristics at the level of the meta-analysis (e.g. SR quality) modify the magnitude of intervention effect (3.3.2). If new meta-analyses are undertaken, decisions regarding the model and estimation method are required (2.5, 3.4).
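The distinction between a first-order and a second-order meta-analysis, and the reason overlap matters for the latter, can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: it uses an inverse-variance fixed-effect model (one of several possible models, not one recommended by the literature reviewed here), and the study labels, effect estimates, standard errors and SR memberships are entirely hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


# Hypothetical log odds ratios and standard errors for four primary studies.
studies = {
    "A": (-0.30, 0.20),
    "B": (-0.10, 0.25),
    "C": (-0.50, 0.30),
    "D": (-0.20, 0.15),
}

# Hypothetical SRs: study B contributes to both meta-analyses (overlap).
sr_contents = {"SR1": ["A", "B"], "SR2": ["B", "C", "D"]}


def pool_studies(study_ids):
    """Pool the named primary studies with the fixed-effect model."""
    estimates, std_errors = zip(*(studies[i] for i in study_ids))
    return pool_fixed_effect(estimates, std_errors)


# First-order meta-analysis: each unique primary study is pooled once.
first_order = pool_studies(sorted(studies))

# Second-order meta-analysis: the SR-level pooled estimates are pooled,
# so study B is counted twice unless overlap is handled explicitly.
ma_results = [pool_studies(ids) for ids in sr_contents.values()]
second_order = pool_fixed_effect(
    [estimate for estimate, _ in ma_results],
    [se for _, se in ma_results],
)

print("first-order :", first_order)
print("second-order:", second_order)
```

In this sketch the second-order standard error is smaller than the first-order one solely because study B is double counted, which is the kind of spurious precision that methods for dealing with overlap (5.3.2) are intended to address.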

Investigation of reporting biases may be done through summarising the reported investigations of reporting biases in the constituent SRs (1.2.6), or through new investigations (4.0). Overviews also provide an opportunity to identify missing primary studies through non-statistical approaches (4.2), such as comparing the included studies across SRs. An additional consideration in overviews is investigation of missing SRs. Identified non-statistical approaches to identify missing SRs included searching SR registries and protocols (4.1).

Assessment of the certainty of the evidence arising from the overview

The two steps in the framework under ‘assessment of the certainty of the evidence arising from the overview’ are as follows: ‘plan to assess certainty of the evidence (1.0)’ and ‘plan the process for assessing the certainty of the evidence (2.0)’ (Table 5). GRADE is the most widely used method for assessing the certainty of evidence in a systematic review of primary studies. The methods involve assessing study limitations (RoB, imprecision, inconsistency, indirectness, and publication bias) to provide an overall rating of the certainty of (or confidence in) results for each comparison [30]. In an overview, planning how to assess certainty (1.1) involves additional considerations. These include deciding how to account for limitations of the included SRs (e.g. bias arising from the SR process, whether SRs directly address the overview question) and how to deal with missing or discordant data needed to assess certainty (e.g. non-reporting of heterogeneity statistics needed to assess consistency, SRs that report conflicting RoB assessments for the same study). One approach is to assess certainty of the evidence using a method designed for overviews (1.1.1). However, GRADE methods (or equivalent) have not yet been adapted for overviews and guidance on addressing issues is not available. In the absence of agreed guidance for overviews, another option is to assess the certainty of the evidence using an ad hoc method (1.1.2). For example, Pollock 2015 incorporated the limitations of included SRs in their GRADE assessment by rating down the certainty of evidence for SRs that did not meet criteria deemed to indicate important sources of bias [31, 32].

Table 5 Assessment of the certainty of the evidence arising from the overview

Other identified approaches use methods developed for SRs of primary studies, without adaptation for overviews. The simplest of these is to ‘report assessments of certainty of the evidence from the included SRs’ with or without checking accuracy first (1.1.3 and 1.1.4). Authors may then use approaches specified in the data extraction step to deal with missing or discrepant assessments (see paper 1 [10]). These approaches include simply noting missing data and discrepant assessments, or reporting assessments of certainty from an SR that meets pre-specified methodological eligibility criteria, for example, the review that addressed the overview question most directly or assessed to be at lowest risk of bias. The final option when using methods developed for SRs of primary studies involves completing the assessment of certainty from scratch (1.1.5). This option may apply in circumstances where (a) an assessment was not reported in included SRs, (b) new primary studies were retrieved that were not included in the SRs or relevant studies were not integrated into the assessment reported in the SR, (c) included SRs used different tools to assess certainty (e.g. GRADE [30] and the Agency for Healthcare Research and Quality’s [AHRQ] tool [33]) or (d) assessments are judged to be flawed or inappropriate for the overview question.

Addressing common scenarios unique to overviews

In our examination of the literature, methods were often proposed in the context of overcoming common methodological scenarios. Table 6 lists the methods options from the framework that could be used to address each scenario.

Table 6 Methods and approaches for addressing common scenarios unique to overviews

While the literature reviewed often suggested a single method or step at which a scenario should be dealt with, Table 6 shows that there are multiple options, some of which can be combined. Only those methods that provide direct solutions are listed, not those that need to be implemented as a consequence of the chosen solution. Taking an example, a commonly cited approach for dealing with reviews with overlapping primary studies is to specify eligibility criteria (or decision rules) to select one SR (see Paper 1 [10]). However, multiple methods exist for addressing overlap at later steps of the overview. During synthesis, for example, authors can (i) use decision rules to select one (or a subset) of meta-analyses with overlapping studies (5.3.1), (ii) use statistical approaches to deal with overlap (5.3.2), (iii) ignore overlap (5.3.3) or (iv) acknowledge overlap as a limitation (5.3.4; Table 4). Alternatively, overlap may be addressed when assessing certainty of the evidence. Any of these approaches can be combined with methods to quantify and visually present overlap (5.1–5.2; Table 4).

Stage II: identification and mapping of evaluations of methods

Mapping studies evaluating methods to the framework

Five studies, published between 2011 and 2015, evaluated tools to assess risk of bias in SRs. Two were SRs [12, 17] and three were primary studies not included in either of the SRs [15, 19, 34]. Characteristics of these studies are summarised in Tables 7 and 8. All five studies map to the sub-option ‘select an existing RoB assessment tool for SRs’ (1.1.1) of the approach ‘plan to assess RoB in the included SRs’ (1.0) under the ‘assessment of RoB in SRs and primary studies’ step of the framework (see ‘Assessment of risk of bias in SRs and primary studies’; Table 3).

Table 7 Characteristics of SRs of methods studies and assessment of risk of bias
Table 8 Characteristics of primary methods studies and assessment of risk of bias

We found one study that evaluated methods for synthesis. Pieper 2014b developed and validated two measures to quantify the degree of overlap in primary studies across multiple SRs [35]. This study maps to the ‘synthesis, presentation and summary of the findings’ step of the framework (see ‘Synthesis, presentation and summary of the findings’; Table 4) in option 5.0 ‘plan how to deal with overlap of primary studies included in more than one SR’.
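The two measures are not defined in the text above. As a sketch, and assuming they correspond to the covered area (CA) and corrected covered area (CCA) commonly attributed to that study, they can be computed from a citation matrix of primary studies (rows) by SRs (columns) as CA = N / (r × c) and CCA = (N − r) / (r × c − r), where N is the total number of inclusions counted across all SRs, r the number of unique primary studies and c the number of SRs. The data and the function name `overlap_measures` below are hypothetical.

```python
def overlap_measures(sr_to_studies):
    """Compute covered area (CA) and corrected covered area (CCA).

    sr_to_studies maps each SR to the primary studies it includes.
    Assumed formulas: CA = N / (r * c), CCA = (N - r) / (r * c - r).
    """
    reviews = [set(included) for included in sr_to_studies.values()]
    unique_studies = set().union(*reviews)
    n_inclusions = sum(len(review) for review in reviews)  # N, with double counting
    r = len(unique_studies)  # rows of the citation matrix
    c = len(reviews)         # columns of the citation matrix
    if r == 0 or c < 2:
        raise ValueError("need at least one primary study and two SRs")
    covered_area = n_inclusions / (r * c)
    corrected_covered_area = (n_inclusions - r) / (r * c - r)
    return covered_area, corrected_covered_area


# Hypothetical overview of three SRs covering five unique primary studies.
example = {
    "SR1": ["A", "B", "C"],
    "SR2": ["B", "C", "D"],
    "SR3": ["C", "E"],
}

ca, cca = overlap_measures(example)
print(f"CA = {ca:.3f}, CCA = {cca:.3f}")
```

With these illustrative data, N = 8, r = 5 and c = 3, so CA = 8/15 and CCA = 3/10. The correction subtracts the first appearance of each primary study, so that CCA is 0 when no study appears in more than one SR.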

We found no stage II studies evaluating methods in the ‘assessment of the certainty of evidence arising from the overview’ step of the framework (Table 5).

Two SRs reviewed published tools to assess the risk of bias in SRs [12, 17]. Pieper [17] reviewed evidence of the reliability and construct validity of the AMSTAR [22, 23] and R-AMSTAR (revised-AMSTAR [36]) tools. Whiting [12] reviewed the content and measurement properties of 40 critical appraisal tools (Table 7). The review includes a summary of tool content (items and domains measured), tool structure (e.g. checklist, domain based), and item rating (i.e. response options). Studies included in Whiting [12] reported methods of development for 17 of 40 tools (i.e. providing information needed to assess content validity). Three of these 17 tools were judged to have been developed using a ‘rigorous’ process (notably AMSTAR [22, 23, 37], Higgins [38], and OQAQ [28]) (details in Table 7). Inter-rater reliability assessments were available from 11 of 13 studies included in Pieper [17], and for five of the 40 tools (most reporting kappa or intraclass correlation coefficient) in Whiting [12]. Six of the studies included in Pieper [17] assessed construct validity. No tests of validity were reported for any of the tools in Whiting [12] (although exploratory factor analysis was used to develop the content of AMSTAR). In addition, Pieper [17] reported data on the time to complete the assessment of each tool.
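For readers unfamiliar with the kappa statistic mentioned above, the following Python sketch computes Cohen's kappa for two raters applying a tool item to the same set of SRs. It illustrates the statistic in general and does not reproduce an analysis from any of the included studies; the ratings and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical judgements by two raters on one tool item across eight SRs.
rater_1 = ["low", "low", "high", "high", "low", "high", "low", "low"]
rater_2 = ["low", "high", "high", "high", "low", "low", "low", "low"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on six of eight SRs (0.75), chance agreement is 34/64, and kappa is therefore 7/15, noticeably lower than the raw agreement.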

Of the three primary studies that evaluated RoB tools, two assessed the reliability and validity of AMSTAR and OQAQ [19, 34], one of which also assessed the reliability and validity of the Rapid Appraisal Protocol internet Database (RAPiD) and the Quality and Applicability of Systematic Reviews of the National Center for the Dissemination of Rehabilitation Research (NCDRR) [34], and one reported the development and reliability of ROBIS [15] (Table 8). In addition, two of the three studies assessed the time to complete assessments [19, 34].

Assessment of risk of bias in studies evaluating methods

Both SRs [12, 17] were judged at low risk of bias, based on assessment using the ROBIS tool. Assessments for each domain are reported in Table 7. Of the four primary studies evaluating methods [15, 19, 34, 35]: (i) none referred to a study protocol or noted the existence of one, (ii) three used convenience sampling to select the SRs to which the tool/measure was applied, (iii) the three studies that evaluated RoB tools either used a convenience sample of raters or did not describe how the raters who applied the tool were selected and (iv) only one pre-specified hypotheses for testing of the validity of the measure [35] (Table 8).

Discussion

In this paper, we present the framework of overview methods that we developed for the final steps in conducting an overview: assessment of the risk of bias in SRs and primary studies; synthesis, presentation and summary of the findings; and assessment of the certainty of evidence arising from the overview. We identified five stage II evaluation studies that mapped to the ‘assessment of the risk of bias in SRs and primary studies’ step of the framework and one study that mapped to the ‘synthesis, presentation and summary of the findings’ step. The evaluations included psychometric testing of tools to assess the risk of bias in SRs and development of a statistical measure to quantify overlap in primary studies across SRs. Results presented in this paper, in combination with our companion paper [10], provide a framework (the MOoR framework) of overview methods for all steps in the conduct of an overview. The framework makes explicit the large number of steps and methods that need to be considered when planning an overview and the unique decisions that need to be made as compared with an SR of primary studies. Here, we focus on issues pertinent to this second companion paper and present some overarching considerations.

What this study adds to guidance and knowledge about overview methods

A key observation from our first paper, aligned with the conclusions of others [8, 9], was that there are important gaps in the guidance on the conduct of overviews [10]. Similar conclusions can be drawn from this paper, wherein guidance covers particular options, but not alternatives, and there is a lack of operational guidance for many methods. This is particularly pertinent for the step ‘assessment of the certainty of the evidence arising from the overview’, where GRADE methods (or equivalent) have yet to be developed for overviews. An exception was within the ‘assessment of risk of bias in SRs and primary studies’ step, where many tools for appraising or assessing the risk of bias in SRs have been developed, with psychometric evaluation for some tools, yielding at least some empirical evidence to underpin selection of tools. Detailed guidance on the application of these tools has also been published.

The framework extends previous guidance on overview methods [4, 39] through provision of a range of methods and options that might be used for each step. For most methods, we identified a lack of evaluation studies, indicating that there is limited evidence to inform methods decision-making in overviews. However, not all methods presented necessarily require evaluation. Theoretical considerations or poor face (or content) validity of a method may determine that it should not be used. For example, in the ‘assessment of risk of bias in SRs and primary studies’ step, an identified option (and one that has been used in some overviews) is to not report or assess RoB in the primary studies (2.1.4). Since the interpretation of evidence is highly dependent on limitations of primary studies within an SR, this option has little face validity.

A further extension to previous guidance is the linking of methods from our framework to address commonly arising challenges in overviews. This linking demonstrates that multiple methods are available for addressing each scenario, as illustrated in the ‘Addressing common scenarios unique to overviews’ section using the example of the range of methods available for dealing with reviews that include overlapping primary studies.

Strengths and limitations

The strengths and limitations described in the first paper in this series [10] are briefly summarised here. The strengths of our research included (a) noting any deviations from our planned protocol [11], (b) using consistent language throughout the framework and an intuitive organising structure to group related methods and (c) drafting of the framework for each step by two authors independently. The limitations included the following: (a) the subjective nature of the research involving ‘translating’ descriptions of methods into a common language or standardised phrasing, (b) exclusion of articles that could have been of relevance to overviews (e.g. methods of indirect comparison and updating systematic reviews) and (c) difficulty in retrieving methods studies as methods collections are not routinely updated (for example, the Cochrane Methodology Register has not been updated since July 2012 [40]; and the Scientific Resource Center Methods library’s most recent article is from 2013).

An additional limitation is that new methods and methods evaluations may have been published since our last search (August 2016). However, we sought to identify methods that were missing from the literature (through inference) so the structure of the framework is unlikely to change. Given the sparsity of evidence about the performance of methods, any new evaluations will be an important addition to the evidence base but are unlikely to provide definitive evidence. One recent example is the publication of AMSTAR 2 [41]. While the development of AMSTAR 2 reflects an important advancement on the previous version of AMSTAR (extending to non-randomised studies and changing the response format), the tool will require application and further testing in overviews before its measurement properties can be fully established and compared to existing tools.

Future research to refine and populate the framework and evidence map

Overview methods are evolving, and as methods are developed and evaluated, the evidence map can be further refined and populated. There are two related but distinct streams of research here. The first stream relates to the development and application of methods. Substantial work is needed to provide detailed guidance for applying methods that have been advocated for use in overviews, in addition to developing new methods where gaps exist. The development of GRADE guidance for overviews is an important example where both methods development and detailed guidance are required.

The second stream of research involves methods evaluation. In our first paper, we suggested three domains against which the performance of overview methods should be evaluated: the validity and reliability of overview findings, the time and resources required to complete the overview, and the utility of the overview for decision-makers. For example, researchers could compare the statistical performance of different metrics to assess the degree of overlap, or different statistical methods to adjust for overlap in meta-analyses, using numerical simulation studies. A further area of research could include evaluation of different visual presentations of the range of summary results extracted from the constituent SRs. The framework will need to be refined in response to methods development and evaluation. As mentioned in our first paper [10], visual representation of an evidence map of overview methods will be useful when more evidence is available.

Furthermore, our framework and evidence map only focused on overviews of intervention reviews. The framework and evidence map could be extended to include methods for other types of overviews, such as overviews of diagnostic test accuracy reviews or prognostic reviews [42].

Conclusions

A framework of methods for the final steps in conducting, interpreting and reporting overviews was developed, which in combination with our companion paper, provides a framework of overview methods (the MOoR framework) for all steps in the conduct of an overview. Evaluations of methods for overviews were identified and mapped to the framework. Many methods have been described for use in the latter steps in conducting an overview; however, evaluation of these methods and guidance for applying them are sparse. The exception is RoB assessment, for which a multitude of tools exist, several with sufficient evaluation and guidance to recommend their use. Evaluation of other methods is required to provide a comprehensive evidence map.

Further evaluation of methods for overviews will facilitate more informed methods decision-making. Results of this research may be used to identify and prioritise methods research, aid authors in the development of overview protocols and offer a basis for the development of reporting checklists.