Background

Systematic reviews (SRs) are considered the gold standard for evidence used to evaluate the benefits and harms of healthcare interventions. They are powerful tools for assessing treatment effectiveness, which can subsequently improve patient care [1]. SR evidence has become increasingly important in clinical decision-making and for informing clinical guidelines and health policy [2, 3].

The methodology and reporting of SRs are often flawed owing to deficiencies in design, conduct, and reporting. Poorly conducted SRs can lead to inaccurate estimates of treatment effectiveness, misleading conclusions, and reduced applicability, all of which waste limited resources [4]. Moreover, poorly conducted or reported SRs may be associated with bias, limiting their usefulness [5]. When SRs comply with established methodology, report findings transparently, and are free of bias, they provide relevant information for practice guideline developers and other stakeholders such as policy makers [5]. SR methodologists have therefore proposed and developed various methodological and reporting guidelines over the years to help improve the methodological rigor and reporting of SRs.

With the rise of evidence-based medicine, criteria for assessing SR quality began to emerge, such as those proposed by Mulrow [6] and Sacks [7]. In 1991, Oxman and Guyatt developed the Overview Quality Assessment Questionnaire (OQAQ) [8], a validated tool to assess the methodological quality of SRs of intervention studies. Since then, SR methodologists have suggested several other methodological quality (MQ) items, such as potential sources of bias, as important for improving quality of conduct. A Measurement Tool to Assess Systematic Reviews (AMSTAR) [9] was developed in 2007 for SRs of intervention studies to include these additional items. In 2010, a revised tool (R-AMSTAR) was developed to provide a quantitative scoring method for assessing quality [10]. Guidance on the accurate reporting of SR methods and findings emerged in the late 1990s. In 1999, the Quality of Reporting of Meta-analyses (QUOROM) Statement was developed to evaluate the completeness of reporting of meta-analyses of randomized trials [11]. A decade later, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement was developed as an update of QUOROM to address several conceptual and methodological advances in the conduct and reporting of SRs of randomized trials [12]. In 2011, Cochrane developed the Methodological Expectations of Cochrane Intervention Reviews (MECIR) guidelines to specify the methodological and reporting standards for Cochrane intervention protocols and reviews [13, 14]. These guidelines drew criteria from AMSTAR, PRISMA, and guidelines from other organizations such as the US Institute of Medicine [13, 14].

Little was known about how the quality or reporting of SRs was assessed in methodological reports. In a separate manuscript, we mapped the methods used to assess SR quality (e.g., use of quality assessment tools) or reporting (e.g., reporting guidelines) in methodological reports [15]. We found that the criteria used to assess the MQ and reporting quality (RQ) of SRs varied considerably. These findings raised an important question: how well do SR authors adhere to published reporting guidelines and MQ assessment tools?

Although methodological studies assessing the MQ or RQ of SRs have been published, the adherence of SRs to established MQ and RQ assessment tools remains unknown. We address this question by examining existing methodological overviews.

Objectives

The objective of this study was to determine SR adherence to the QUOROM and PRISMA reporting guidelines and the AMSTAR and OQAQ quality assessment tools as evaluated in methodological overviews.

Methods

Definitions and important concepts

SRs and meta-analyses were defined according to the guidance of the Cochrane Collaboration and the PRISMA Statement [12, 16]. We adopted the term overview to mean a summary of evidence from more than one SR at a variety of levels, including the combination of different interventions, outcomes, conditions, problems, or populations, or the provision of a summary of evidence on the adverse events of an intervention [17, 18]. Other terminology used to describe overviews includes “systematic review of systematic reviews,” “review of reviews,” and “umbrella review.” We included publications that are “methodological overviews,” meaning research that assessed the MQ or RQ of a cohort of SRs, and we refer to these publications simply as “reports.”

Methodological quality and completeness of reporting

There is an important distinction between SR quality of methods and quality of reporting. MQ is concerned with how well a SR was designed and conducted (e.g., literature search, selection criteria, pooling of data). RQ refers to how well methodology and findings were described in the SR report(s) [19]. This critical difference should be reflected in the choice of quality assessment tools and reporting guidelines.

Eligibility criteria

Inclusion criteria

This work stems from a parallel investigation where any methodological report published between January 1990 and October 2014 with a primary objective to assess the quality of methodology, reporting, or other quality characteristics of SRs was included [15]. We included only those methodological reports that evaluated SRs addressing the comparative effectiveness of interventions as most quality tools have been developed for intervention reviews. For this paper, however, we include only those reports using the most frequently employed published MQ (AMSTAR and OQAQ) and RQ (PRISMA and QUOROM) tools, as determined from the parallel investigation [15].

Exclusion criteria

We excluded reports of clinical interventions, where the intent was to summarize the evidence for use in healthcare decision-making; reports assessing the quality of diagnostic, screening, etiological, or prognostic studies; and other publication types, such as editorials, narrative reviews, rapid reviews, and network meta-analyses. Reviews that include study designs other than randomized controlled trials were also excluded. Reports in languages other than English were not included. Reports including fewer than 10 SRs, assessing the reliability of an assessment tool, evaluating only one methodological characteristic (e.g., search strategy), or those assessing only SRs with pooled estimates of effect were also excluded.

Search methods

An experienced information specialist developed and conducted an extensive search of the Cochrane Library, EMBASE®, and MEDLINE® to identify methodological reports published between January 1990 and October 16, 2014. Potentially eligible titles and/or abstracts were identified using a combination of subject headings (e.g., “Meta-Analysis as Topic,” “Quality Control,” “Checklist”) and key words (e.g., “umbrella review,” scoring, compliance) (see Additional File 1). The search strategy was peer-reviewed prior to execution [20]. Additional reports eligible for inclusion were identified by members of the research team prior to the start of the project [2, 21, 22]. These articles were used as “seed” articles when developing the electronic search strategy.
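For illustration only, the sketch below shows how subject headings and key words of the kind listed above might be combined into a single date-limited query, here run against PubMed via Biopython’s Entrez utilities. The terms, field tags, and email address are hypothetical stand-ins; the actual peer-reviewed Ovid strategy is provided in Additional File 1.

```python
# Illustrative only: terms, field tags, and the email address below are
# hypothetical stand-ins, not the peer-reviewed strategy in Additional File 1.
from Bio import Entrez  # Biopython

Entrez.email = "searcher@example.org"  # NCBI requires a contact address

# Combine subject headings (MeSH) with free-text key words.
query = (
    '("Meta-Analysis as Topic"[Mesh] OR "Quality Control"[Mesh] OR "Checklist"[Mesh])'
    ' AND ("umbrella review"[tiab] OR scoring[tiab] OR compliance[tiab])'
)

# Limit to the review's publication window and fetch only the record count.
handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",
    mindate="1990/01/01",
    maxdate="2014/10/16",
    retmax=0,
)
record = Entrez.read(handle)
handle.close()
print(f"{record['Count']} records retrieved for screening")
```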

Screening

Titles and abstracts were screened for potentially relevant articles using a liberal accelerated approach (i.e., potentially relevant citations were identified by one reviewer, and a second reviewer verified potential excludes). Full-text screening was completed independently and in duplicate by a team of reviewers experienced in methodological reviews; pilot testing on a 5% sample was conducted at both screening levels. All screening disagreements were discussed between pairs of reviewers, with any outstanding disagreements resolved by an independent third reviewer (DM). Data management software, DistillerSR® [23], was used to manage retrieved records, screen citations and reports, record reasons for exclusion, and store extracted data.
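The liberal accelerated approach can be summarized procedurally: a citation advances if the first reviewer flags it, and citations the first reviewer would exclude advance only if a second reviewer also flags them. The minimal sketch below, with hypothetical reviewer functions, illustrates this logic only; it is not a description of the DistillerSR workflow.

```python
# A minimal sketch of liberal accelerated screening. Reviewer judgments are
# modelled as callables returning True (potentially relevant) or False
# (exclude); all names here are hypothetical.
from typing import Callable, Iterable

def liberal_accelerated_screen(
    citations: Iterable[str],
    reviewer_1: Callable[[str], bool],
    reviewer_2: Callable[[str], bool],
) -> list[str]:
    """Advance citations flagged by reviewer 1; reviewer 2 checks only the
    potential excludes (the `or` short-circuits, so reviewer 2 never sees
    citations reviewer 1 already flagged)."""
    return [c for c in citations if reviewer_1(c) or reviewer_2(c)]

# Example with toy judgments: reviewer 1 keeps titles mentioning "review";
# reviewer 2 double-checks reviewer 1's excludes for "meta-analysis".
titles = ["A review of X", "Trial of Y", "Meta-analysis of Z"]
kept = liberal_accelerated_screen(
    titles,
    reviewer_1=lambda t: "review" in t.lower(),
    reviewer_2=lambda t: "meta-analysis" in t.lower(),
)
print(kept)  # ['A review of X', 'Meta-analysis of Z']
```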

Data extraction

We developed standardized forms to extract items of interest from the included reports. Basic characteristics and findings relating to the reviewed SRs were extracted from each included report by two of four reviewers, and a 10% random sample of reports was assessed for accuracy. A pre-extraction meeting and pilot testing were conducted at all extraction levels to ensure consistency across reviewers. The following basic characteristics of the included overviews were extracted: year of publication, number of included SRs, specified medical area, number of databases searched, language restrictions, SR definition, types of publishing journals, Cochrane or non-Cochrane review, reporting of availability of a study protocol, and source of funding. Additional items pertaining to the evaluated reviews were extracted: the intent of assessment (MQ or RQ), the method(s) used to assess MQ or RQ, and details of SR adherence to the individual items of the OQAQ, AMSTAR, QUOROM, or PRISMA guidelines.

Analyses

Summary statistics are reported as the frequency and percentage of reports (for report characteristics) or the frequency and percentage of compliant SRs. No formal inferential statistical analyses were conducted. Some reports allocated points, or scores, to MQ or RQ items; in these cases, we considered full points or a complete score to indicate adherence, and SRs receiving only partial scores were considered non-adherent. A post hoc decision was made to examine publications by their intent to assess MQ only, RQ only, or both MQ and RQ; this decision was made by the senior investigator (DM) without prior examination of the data. Because of the limited number of Cochrane reviews, the data did not allow the planned comparison of reports including Cochrane versus non-Cochrane reviews. This study was not registered in PROSPERO or elsewhere, as no known repositories accept methodological protocols; however, the study protocol is available upon request.
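As a concrete illustration of this scoring rule, the sketch below pools one item’s adherence across several reports using invented numbers (the ItemResult structure and all values are hypothetical): only full scores count toward the numerator, and frequencies and percentages are computed over the pooled sample, mirroring how the item-level data are collated in the Results.

```python
# A minimal sketch, with made-up numbers, of how adherence was summarized:
# frequencies of compliant SRs are pooled across reports and expressed as a
# percentage. Where reports scored items, only a full score counts as adherent.
from dataclasses import dataclass

@dataclass
class ItemResult:
    """Adherence to one tool item in one report (hypothetical structure)."""
    adherent: int  # SRs with a full score on the item
    partial: int   # SRs with a partial score (treated as non-adherent)
    total: int     # SRs assessed on the item in this report

def pooled_adherence(results: list[ItemResult]) -> tuple[int, int, float]:
    """Pool one item's adherence across reports; partial scores never
    enter the numerator."""
    numerator = sum(r.adherent for r in results)
    denominator = sum(r.total for r in results)
    return numerator, denominator, 100 * numerator / denominator

# Example with invented data: three reports assessing the same item.
reports = [ItemResult(40, 5, 100), ItemResult(25, 10, 60), ItemResult(37, 0, 90)]
n, d, pct = pooled_adherence(reports)
print(f"{n}/{d} SRs adherent ({pct:.0f}%)")  # -> 102/250 SRs adherent (41%)
```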

Results

Of the 20,765 unique records retrieved by the electronic search, 1189 reports were reviewed at full text against a subset of the eligibility criteria; 935 were excluded either because they did not assess a cohort of SRs or because their primary intent was not to assess MQ or RQ. A secondary full-text review of the remaining 254 reports was carried out to determine whether exclusion criteria were met; 178 reports were excluded, leaving 76 potentially eligible reports. Once the parallel investigation [15] had determined which quality tools were used most often (OQAQ, AMSTAR, QUOROM, or PRISMA), 20 of the 76 reports were excluded for not using one of those tools. The tools or criteria used by these 20 reports were reported in a separate manuscript [15]. A total of 56 reports [21–77] evaluating 5371 SRs were included (Fig. 1).

Fig. 1 Flow of study reports

Report characteristics

The report characteristics are listed in Table 1. The majority of reports were conducted with the intent to assess MQ or RQ using an appropriate tool: 61% (34/56) had a primary intent to assess MQ only, 7% (4/56) to assess RQ only, and 27% (15/56) to assess both MQ and RQ. The remaining reports did not use the tools according to their intended purpose: one report used OQAQ for RQ assessment, one used PRISMA for both RQ and MQ assessments, and two used MQ tools to assess both MQ and RQ. Regardless of intent, 27 reports used AMSTAR, 26 used OQAQ, 13 used PRISMA, and seven used QUOROM.

Table 1 Table of characteristics by mechanism for assessing “quality”

Reports spanned an 18-year period; 63% (35/56) were published between 2010 and 2014, indicating a marked increase in recent years. A median of 57 SRs (interquartile range 30 to 109) was assessed per report. Almost all reports (91%) addressed SRs on a topic within a specific medical field. Forty-three percent (24/56) of reports included SRs limited to specific journals, half (28/56) included SRs from a general sample across medical journals, and only 7% (4/56) evaluated a cohort of Cochrane reviews (i.e., from one specific source). Accordingly, the majority of reports provided details of the source of SRs, whether databases or specific journals. Whether language restrictions were applied was reported in 61% (34/56) of reports, and 21% (12/56) did not specify a definition of SR. The majority of reports (88%) did not state whether a protocol was available, and 38% (21/56) did not state the source of funding for their research. Table 1 also details these characteristics according to the tool used.

Adherence to MQ and RQ items in methodological reports

The reports assessed adherence to the items of the most frequently used MQ and RQ tools (i.e., AMSTAR, OQAQ, QUOROM, and PRISMA). These data have been collated across the samples of SRs (Tables 2, 3, 4, and 5). Data pertaining to adherence to quality or reporting criteria by item were obtainable from most methodological reports: 100% (13/13) of those using PRISMA, 71% or more (5–6 of 7, depending on the item) using QUOROM, 81% or more (22–23 of 27, depending on the item) using AMSTAR, and 85% (22/26) using OQAQ.

Table 2 Summary across reports of systematic reviews adhering to PRISMA reporting guidelines (N = 13)
Table 3 Summary across reports of systematic reviews adhering to QUOROM reporting guideline (N = 7)
Table 4 Summary across reports of systematic reviews meeting AMSTAR quality assessment criteria (N = 27)
Table 5 Summary across reports of systematic reviews adhering to OQAQ items (N = 26)

Adherence to reporting guidelines (RQ)

A total of 1741 SRs were included in the 13 reports that used PRISMA (Table 2). Over 85% of SRs fully reported their title, provided a rationale for the review, described all information sources, and provided a general interpretation of the results. However, compliance was poor for several items, with only 38% (657/1741) of SRs specifying methods for assessing risk of bias across studies, 30% (527/1736) presenting results of the risk of bias assessment across studies, and 37% (647/1741) describing sources of funding. Less than 6% (102/1741) provided protocol information in their SR report.

Six reports evaluating 449 SRs used QUOROM (Table 3); one additional report did not provide information by item and was excluded from this analysis. Only 30% (133/449) of SRs identified the report as a systematic review, and 9% (40/449) provided a figure summarizing trial flow. In contrast, included SRs adhered well to several other QUOROM items: over 85% used a structured abstract format, described the main results in the abstract, provided an explicit clinical question and rationale in the introduction/background section, described the study selection criteria, and presented descriptive data for each trial.

Adherence according to methodological quality

A total of 1794 SRs were included in the 23 reports that provided AMSTAR assessments by item (Table 4). Eighty percent (1439/1794) of SRs provided the characteristics of included studies, and just over half (995/1794) assessed publication bias. Thirty-nine percent (685/1779) stated conflicts of interest, and a third (590/1794) reported limitations. In addition, only 30% (534/1794) of SRs used duplicate study selection and data extraction during data collection, and 30% (537/1779) provided a list of included and excluded studies.

Twenty-two reports evaluating 1387 SRs used the OQAQ criteria (Table 5). Only 37% (499/1367) of SRs assessed risk of bias (validity) in the included studies. In contrast, 80% (1112/1387) reported the criteria for study selection, 74% (1027/1387) reported the search methods used to find the evidence, 73% (1005/1387) described the methods used to combine the findings, and the conclusions of 78% (1076/1387) were supported by the data.

Discussion

Previously, we identified QUOROM, PRISMA, AMSTAR, and OQAQ as the most commonly used tools or guidelines for critical appraisal and RQ assessment [15]. In this study, we evaluated SR adherence to these quality assessment tools and reporting guidelines across methodological reports published between 1990 and 2014.

Our results indicate that SR adherence to reporting items was variable. Over 85% of SRs provided a rationale for the review when assessed using PRISMA, yet less than 6% gave protocol information in their SR report. Our study, like others, shows that review protocols are poorly reported [2, 24]. Review protocols are important because they reduce duplication of research, allow researchers to plan and anticipate potential issues, permit assessment of the validity of methods and replication of the review if desired, and prevent arbitrary decision-making [78, 79]. Risk of bias across individual studies within reviews, additional analyses, and funding source were also poorly reported, findings consistent with other research [24]. We note that compliance with some reporting criteria has improved over time: 9% of SRs provided a trial flow diagram when assessed against the QUOROM guidelines, compared with 63% against the PRISMA guidelines. This observed improvement could be partly due to journal endorsement of the reporting guideline, but also to authors’ exposure to the published tools or their growing awareness of reporting issues in health research over time. For the few items that are similar between PRISMA and QUOROM yet show lower compliance with PRISMA, the results are possibly attributable to differences in how the criteria were operationalized, or simply to chance.

Adherence to methodological quality items was also variable. Overall, SRs assessed with OQAQ adhered quite well to all methodological items in the tool. OQAQ is validated and well accepted, but it was developed over two decades ago [8], and its criteria do not cover issues such as a priori design, assessment of publication bias, or conflicts of interest. In this respect, OQAQ differs from AMSTAR, which was published and validated more recently [80, 81]. For the 27 reports using AMSTAR to assess the quality of SRs, the percentage of SRs meeting AMSTAR criteria was mediocre: one third or fewer of SRs used duplicate study selection and data extraction, provided a list of included and excluded studies, or reported limitations. One small study has also shown the need for better adherence to AMSTAR [82]. We expect that future research will include an evaluation of the recently published Risk Of Bias In Systematic reviews (ROBIS) tool [83].

SR evidence is used by decision-makers, policy makers, and other stakeholders, who should be able to expect consistently high standards of reporting and conduct. Guidelines and tools have been developed over the years to improve the RQ and MQ of SRs. Our findings suggest that SR authors comply well with several items in MQ and RQ tools, but other items require major improvement. Other studies have also found that methodological and reporting quality is suboptimal [2, 84, 85]. In addition, evidence is emerging that biases within SRs can influence the results and quality of overviews [86]. Effort should be directed toward improving the quality and reporting of SRs wherever possible.

Journal endorsement and implementation of reporting guidelines and critical appraisal tools during the editorial process is one mechanism to facilitate better quality. To date, the evidence is insufficient for systematic reviews, although some exists for trials. One recent methodological review found insufficient evidence to determine a relationship between endorsement and completeness of reporting: of 101 reporting guidelines, only seven had evaluable data, each from only a few evaluations [87]. One small study found that reporting and methodological quality (adherence to both AMSTAR and PRISMA) increased significantly after journal endorsement of the PRISMA guidelines [25]. Readers may also wonder whether reporting differs before and after publication of the tools themselves; none of the included methodological reviews assessed this. Further, considering publication and subsequent journal endorsement as potential interventions, we agree with previously published work that journal endorsement likely serves as the “stronger” intervention [87].

One unexplored hypothesis is whether the endorsement and use of reporting tools at the protocol phase of a SR paves the way for better reporting and methodological quality in the SR report. Review protocols allow researchers to plan and anticipate potential issues, assess the validity of methods, and prevent arbitrary decision-making [78, 79]. The reporting of protocols can be guided and assessed using the Preferred Reporting Items for Systematic review and Meta-Analysis Protocols 2015 (PRISMA-P 2015) [78, 79]. Further, Moher et al. [2] suggested that granting agencies and journals require full compliance with established reporting and methodological guidelines, for example by requiring that SR protocols accompany the submission of a SR.

Our review was limited exclusively to the SRs included by the authors of the methodological reports. Each overview had its own selection criteria and quality thresholds; we therefore did not retrieve the publications of the individual SRs but relied on the data reported in each overview. The resulting heterogeneity may account for some of the observed variation in MQ and RQ. In addition, we relied on how the authors assessed and reported adherence; variability in how strictly review authors assessed adherence to items in MQ and RQ tools could introduce further heterogeneity. Nevertheless, this report provides insight into adherence to quality assessment and reporting guideline items.

Rigorous development of MQ and RQ tools is important and should involve several steps, with appropriate participation of stakeholders and methodological experts [88]. Despite considerable effort, tools may not be fully fit for purpose if their items do not completely reflect their intent. For example, some MQ items in both AMSTAR and OQAQ are written in language that reflects reporting more than conduct. We encourage developers to consider the wording of items carefully. Further, any tool may require content modifications as the science of health research methodology continues to evolve.

Conclusions

In conclusion, the methodological and reporting quality of SRs varied considerably across items in four well-known tools. Mechanisms to improve adherence to established reporting guidelines and methodological assessment tools are needed to improve the quality of SRs.