Background

Overviews of systematic reviews (henceforth termed overviews) attempt to systematically retrieve, assess, and synthesize the results of multiple systematic reviews (SRs) for a given condition or public health problem [1]. The number of published overviews is rapidly increasing [2, 3].

“Systematic reviewers” has become a term for people conducting SRs. We would expect systematic reviewers to also be involved in the conduct of overviews. Thus, authors of overviews might include SRs that they have (co-)authored themselves. We employ the term dual (co-)authorship (DCA) to describe this scenario [4]. Such an overlap in authorship can be considered a competing interest and raises questions regarding conflicts of interest. In theory, several steps in the conduct of an overview may be biased by DCA, such as formulating inclusion criteria, conducting quality assessments, interpreting data, drawing conclusions, or dealing with competing reviews. Experts in a given field might be more likely to participate in an overview while also being enthusiastic about specific interventions or holding strong views about their effectiveness. Their opinions might also be biased by financial conflicts of interest. For example, a recent analysis found that review sponsorship and authors’ financial conflicts of interest introduced bias in review outcomes that could not be explained by other sources of bias [5].

In a sample of 197 Cochrane reviews, 28 (14%) were affected by DCA, and DCA was mentioned as a potential conflict of interest in 68% (19/28) of these cases [6]. Our previous study found that most (90%) Cochrane overviews were affected by DCA (i.e., at least one of the included reviews was affected by DCA) [4]. In 9/18 (50%) Cochrane overviews with DCA, quality assessment was not conducted independently (i.e., at least one person who (co-)authored the review was involved in its quality assessment). To the best of our knowledge, no such data are available for non-Cochrane overviews. Furthermore, our previous analysis focused only on the prevalence and management of DCA.

In this study, our objectives are to (1) investigate DCA in non-Cochrane overviews and (2) investigate whether there is an association between DCA and quality assessments of SRs in Cochrane and non-Cochrane overviews.

Methods

There was no a priori protocol for the study.

Given that our study had two objectives, the methods and results sections are each divided into two parts. The first deals with the analysis of DCA in non-Cochrane overviews and a comparison with Cochrane overviews. The second describes a comparison of quality assessments of reviews with and without DCA using meta-analytical methods; this second part of the analysis comprises data from both Cochrane and non-Cochrane overviews. The data on Cochrane overviews are taken from our previous study [4].

DCA in non-Cochrane overviews

To allow for comparability of results, the analysis of non-Cochrane overviews followed the same methods as our previous study on Cochrane overviews [4]. In brief, we searched MEDLINE (via PubMed) with a precision-maximizing search strategy (overview[ti] AND reviews[ti]) for overviews published from 2010 to September 2015. Our definition of an overview followed the criteria outlined below [7]:

  1. Overviews should contain a clearly formulated objective designed to answer a specific clinical research question, typically about a healthcare intervention.

  2. Overviews should intend to search for and include only systematic reviews (with or without meta-analyses).

  3. Overviews should use explicit and reproducible methods to identify multiple systematic reviews that meet their inclusion criteria and to assess the methodological quality of these systematic reviews.

  4. Overviews should intend to collect, analyze, and present the descriptive characteristics of their included systematic reviews (and their primary studies) and the quantitative outcome data contained within the systematic reviews.

Protocols were excluded. In cases where updates had been published, we used the most recent version. Overview selection was performed using a liberal accelerated approach (i.e., all titles and abstracts were screened by one reviewer; those deemed not relevant were verified for exclusion by a second person). All data were extracted by one person and verified by a second person. Data were extracted on the same items as in our previous study [4]. Data were analyzed descriptively as frequencies or as medians with interquartile ranges (IQR). To compare Cochrane and non-Cochrane overviews, we used the Mann-Whitney U test and calculated odds ratios with 95% confidence intervals.
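
To make these comparisons concrete, the following sketch (Python) illustrates a Mann-Whitney U test and an odds ratio with a Wald-type 95% confidence interval. The 2 × 2 table is reconstructed from the prevalence figures reported in this paper (18/20 Cochrane and 40/78 non-Cochrane overviews with DCA), while the per-overview review counts are purely hypothetical; this is an illustration of the statistics named above, not the original analysis code.

```python
# Illustrative sketch only; the review counts per overview are hypothetical,
# and the 2x2 table is reconstructed from the prevalence figures in the text.
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical numbers of included reviews per overview
cochrane_counts = [12, 15, 9, 20, 7, 16]
non_cochrane_counts = [14, 8, 25, 30, 10, 6, 18]

u_stat, p_value = mannwhitneyu(cochrane_counts, non_cochrane_counts,
                               alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

# 2x2 table: rows = Cochrane / non-Cochrane, columns = with DCA / without DCA
a, b = 18, 2    # Cochrane overviews with / without DCA (90% of 20)
c, d = 40, 38   # non-Cochrane overviews with / without DCA (40 of 78)
odds_ratio = (a * d) / (b * c)
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low, ci_high = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```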

Comparison of quality assessments of reviews with and without DCA

We used meta-analytical methods to compare the quality assessments of the included SRs with versus without DCA. For this, we extracted data on quality assessments of the SRs from the overviews. The methodological quality of the included SRs was assessed in the overviews using various tools and was reported in different ways.

The Assessment of Multiple Systematic Reviews (AMSTAR) tool [8], R(evised)-AMSTAR [9], and the Overview Quality Assessment Questionnaire (OQAQ) [10] were frequently used to assess the methodological quality of SRs. AMSTAR consists of 11 items, each of which is rated using a standardized set of four possible responses: “yes”, “no”, “cannot answer”, or “not applicable” [8]. The OQAQ was used in the development of AMSTAR. R-AMSTAR was developed to quantify methodological quality by assigning each SR a score ranging from 11 to 44, with higher scores indicating higher quality [9]. The OQAQ consists of 10 items, the first nine of which focus on methodological aspects of the scientific quality of a SR, while the last item provides an overall assessment on an ordinal scale ranging from 1 to 7, with higher scores indicating fewer flaws (i.e., higher quality) [10]. The first nine questions each have three possible responses: “yes”, “no”, or “partial/cannot tell”.

For AMSTAR, a total score can be derived by summing the number of “yes” items; we did this when authors did not present an overall score. Where overall scores were reported, these were extracted along with information on how the score had been calculated, to account for modifications of the original tool. R-AMSTAR and the OQAQ were treated in the same way. In cases where authors applied or reported the results of the quality assessment on an ordinal scale (i.e., high, medium, or low quality), we assigned numerical values (“high” = 3, “medium” = 2, “low” = 1), so that higher values indicate higher methodological quality. All data extractions were performed by one person and checked for accuracy by a second person. Disagreements were resolved by discussion. We did not approach any authors for additional data.
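
As an illustration of the scoring conventions just described, the following sketch (Python, with hypothetical item ratings) derives an AMSTAR total score by counting “yes” responses and maps ordinal quality labels to the numerical values used here:

```python
# Illustrative sketch of the scoring rules described above (hypothetical ratings).

def amstar_total(item_ratings):
    """AMSTAR total score = number of items rated 'yes' (possible range 0-11)."""
    return sum(1 for rating in item_ratings if rating == "yes")

def ordinal_to_score(label):
    """Map an ordinal quality rating to a numerical value (higher = better)."""
    return {"high": 3, "medium": 2, "low": 1}[label.lower()]

# Hypothetical AMSTAR ratings for one review (11 items)
ratings = ["yes", "yes", "no", "yes", "cannot answer", "yes",
           "no", "yes", "yes", "not applicable", "yes"]
print(amstar_total(ratings))       # -> 7
print(ordinal_to_score("medium"))  # -> 2
```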

Overviews had to meet the following criteria to be eligible for inclusion in the meta-analytical analysis:

  • Inclusion of at least two reviews affected by DCA and at least two reviews not affected by DCA (to allow calculation of a standard deviation (SD))

  • An SD greater than 0 (i.e., the quality ratings varied across SRs)

Within each overview, we calculated the difference in mean quality score between SRs with and without DCA. These differences were standardized by the pooled SD. We conducted random effects meta-analyses (MAs) using DerSimonian and Laird’s heterogeneity variance estimator. All analyses were performed with RevMan 5.3. The included SRs served as units of analysis. The standardized mean difference (SMD) was chosen as the principal summary measure in meta-analysis to account for different scales. Mean differences (MD) were calculated when all overviews used the same scale. We used I2 to quantify inconsistency [11].
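
The following sketch (Python) outlines the calculations described above: a standardized mean difference per overview (computed here as Cohen’s d with its approximate variance; RevMan additionally applies Hedges’ small-sample correction) and DerSimonian-Laird random-effects pooling with I². The per-overview summary data are hypothetical, and the sketch is an illustration rather than the RevMan 5.3 implementation used for the actual analysis.

```python
# Illustrative sketch; the per-overview summary data below are hypothetical.
import numpy as np

def smd_and_variance(m1, sd1, n1, m2, sd2, n2):
    """SMD (group 1 = reviews with DCA, group 2 = without) and its approximate variance."""
    sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate with 95% CI and I^2."""
    y, v = np.asarray(effects), np.asarray(variances)
    w = 1 / v                                   # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed)**2)            # Cochran's Q
    df = len(y) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1 / np.sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical per-overview summaries: (mean, SD, n) for reviews with and without DCA
overviews = [((8.0, 1.5, 4), (7.0, 1.8, 10)),
             ((6.5, 2.0, 3), (5.5, 1.6, 12)),
             ((9.0, 1.0, 5), (8.8, 1.2, 9))]
effects, variances = zip(*(smd_and_variance(*g1, *g2) for g1, g2 in overviews))
pooled, ci, i2 = dersimonian_laird(effects, variances)
print(f"SMD = {pooled:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}), I^2 = {i2:.0f}%")
```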

We expected skewed data due to the unbalanced and small numbers of reviews per group. Therefore, we checked the data by subtracting the lowest possible value (e.g., 0 for AMSTAR) from the observed mean and dividing the result by the SD [11]. A ratio less than 2 suggests skew, and a ratio less than 1 provides strong evidence for a skewed distribution [12]. We performed a sensitivity analysis by excluding all overviews in which the ratio was less than 2 in either of the two groups (i.e., reviews with or without DCA). A subgroup analysis was also performed for Cochrane versus non-Cochrane overviews. Further meta-analyses were conducted for overviews using the original quality assessment instruments without any modifications. We were not able to investigate the impact of independent (i.e., performed by authors without DCA) versus non-independent quality assessment of SRs in overviews with overlapping authors, owing to the small number of such overviews in this subsample.
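
A minimal sketch of this skew check, assuming the lowest possible total score is 0 when AMSTAR “yes” items are counted (the example mean and SD are hypothetical):

```python
# Sketch of the skew check described above: (observed mean - lowest possible
# score) / SD, flagged as skewed below 2 and strongly skewed below 1.
# The example values are hypothetical.

def skew_ratio(mean, sd, lowest_possible):
    return (mean - lowest_possible) / sd

def flag(ratio):
    if ratio < 1:
        return "strong evidence of skew"
    if ratio < 2:
        return "suggests skew"
    return "no indication of skew"

# Hypothetical AMSTAR summary (lowest possible total score = 0)
ratio = skew_ratio(mean=3.2, sd=2.1, lowest_possible=0)
print(f"ratio = {ratio:.2f}: {flag(ratio)}")  # -> ratio = 1.52: suggests skew
```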

Results

DCA in non-Cochrane overviews

In total, we included 78 non-Cochrane overviews (see Appendix for a list of included and excluded overviews). They included a median of 14 reviews (interquartile range (IQR), 8.25–24). In 40 of 78 non-Cochrane overviews (51%), at least one of the included reviews was affected by DCA, with a median of 1 affected review (IQR, 0–2) per non-Cochrane overview. In 8 of these 40 overviews (20%), quality assessment was conducted independently. Two non-Cochrane overviews affected by DCA described this as a limitation, and four reported it as a declaration of interest. Safeguards against potential bias arising from DCA were described in two non-Cochrane overviews. Table 1 contrasts these figures with the results of our previous study on Cochrane overviews.

Table 1 Comparison of Cochrane overviews and non-Cochrane overviews

Results from meta-analytical comparison

Of the 20 Cochrane overviews and 78 non-Cochrane overviews included in the descriptive analysis, 14 overviews (6 Cochrane and 8 non-Cochrane) were included in the meta-analysis (see Fig. 1). All Cochrane overviews applied AMSTAR to assess the methodological quality of the included SRs. Four of these applied the original instrument [13,14,15,16], one of which calculated a percentage score to account for “not applicable” responses [16]. Two Cochrane overviews modified AMSTAR, allowing for a maximum score of 10 [17, 18]. Among the non-Cochrane overviews, three applied the original AMSTAR version [19,20,21] and five applied the original OQAQ version [22,23,24,25,26].

Fig. 1 Flow chart

In all but three overviews, mean quality scores were higher for reviews with DCA. The meta-analysis showed an SMD of 0.58 (95% confidence interval (CI) 0.27 to 0.90), indicating higher quality scores for reviews with overlapping authors (see Fig. 2). There was little inconsistency in the observed SMDs (I2 = 19%, p = 0.24). The test for subgroup differences showed no evidence of a difference between Cochrane (SMD 0.44; 95% CI 0.07 to 0.81) and non-Cochrane overviews (SMD 0.62; 95% CI 0.06 to 1.17); the difference in subgroup estimates, calculated by a Z test, was 0.18 (95% CI −0.48 to 0.84; p = 0.60). There was some evidence of inconsistency in the meta-analysis of non-Cochrane overviews (I2 = 45%, p = 0.08), while no inconsistency was observed for Cochrane overviews (I2 = 0%, p = 0.60).

Fig. 2 Mean quality scores

Six overviews were excluded due to skewed data in the sensitivity analysis (see Fig. 3). All six excluded overviews were non-Cochrane overviews; thus, the sensitivity analysis resembles the subgroup analysis for Cochrane overviews. However, the effect decreased to an SMD of 0.34 (95% CI −0.00 to 0.69), with no evidence of inconsistency (I2 = 0%, p = 0.77).

Fig. 3 Mean quality scores (sensitivity analysis)

In total, six overviews used the original AMSTAR version without any modifications. In the meta-analysis, reviews affected by DCA were scored approximately one point higher for methodological quality than reviews not affected by DCA: the MD was 1.06 (95% CI −0.31 to 2.44), with strong evidence of inconsistency (I2 = 72%, p = 0.003) (see Fig. 4). The effect was stronger for the OQAQ: the MD was 1.92 (95% CI 1.19 to 2.65), with no evidence of inconsistency (I2 = 0%, p = 0.53), based on five overviews (see Fig. 5).

Fig. 4 Mean quality scores for AMSTAR

Fig. 5 Mean quality scores for OQAQ

Discussion

In this study, we compared DCA in Cochrane and non-Cochrane overviews and investigated whether there is an association between DCA and the quality assessments of SRs included in these overviews.

The comparison of Cochrane overviews with non-Cochrane overviews revealed significant differences with respect to the prevalence of DCA. While nearly all Cochrane overviews were affected by DCA to some extent, this was the case for only about half of the non-Cochrane overviews (90 vs. 51%). Furthermore, the proportion of included reviews affected by DCA was much larger in Cochrane overviews. Since the Cochrane Collaboration is dedicated to evidence synthesis, we would expect clustering of authors in overviews. The higher proportion of overlap in Cochrane overviews can be explained by the fact that Cochrane overviews usually exclude non-Cochrane reviews [4].

However, authors of Cochrane overviews were also more aware of the problems that might arise from dual (co-)authorship. They more often considered DCA a limitation or reported it in the declaration of interest section. Quality assessments of included reviews with DCA were also more often conducted by authors not involved in those reviews. This may be due to a higher awareness of conflicts of interest among Cochrane reviewers or to Cochrane policies. Both the declaration of interests section and Cochrane’s code of conduct in the Cochrane Handbook emphasize independence, transparency, and acknowledgement of conflicts of interest [27]. Furthermore, minimizing bias by avoiding conflicts of interest is stated as a goal in the fourth principle of the Cochrane Collaboration [28]. Specifically, Cochrane policy stipulates that authors should not extract data from, or assess the quality of, research in which they were involved. Such stringent policies do not seem to exist for authors conducting overviews outside of the Cochrane Collaboration or other similarly spirited organizations. A recent survey of SRs showed that statements on conflicts of interest are included more often in Cochrane reviews than in non-Cochrane reviews (100 vs. 83%) [29]. In another survey, 97% of SRs reported conflict of interest disclosures [30]. In that study, which specifically examined non-financial conflicts of interest, Cochrane authors reported such conflicts more frequently than non-Cochrane authors (19 vs. 5%, p = 0.004) [30].

While the majority of medical journals now require conflict of interest statements, only about half require statements on non-financial conflicts of interest, and hardly any specifically ask about intellectual conflicts of interest [31]; moreover, definitions of conflicts of interest often vary [32]. Intellectual conflicts of interest are defined as “academic activities that create the potential for an attachment to a specific point of view that could unduly affect an individual’s judgment about a specific recommendation” [33]. However, the existence of intellectual conflicts of interest is still debated in the scientific community [34,35,36].

Despite the conflict of interest policies within the Cochrane Collaboration, our study showed that reviews affected by DCA obtained higher quality scores than reviews not affected by DCA in Cochrane overviews. The same finding was observed in non-Cochrane overviews. Reviews with DCA scored one and two points higher in overviews applying the original AMSTAR and OQAQ tools, respectively. When interpreting this, it is important to keep in mind that the range of possible scores is 0–11 for AMSTAR and 1–7 for the OQAQ. Thus, the difference of two points for the OQAQ is also more important in relative terms, as its scale is shorter than that of AMSTAR. A possible explanation for the difference between the two tools observed here is the subjectivity of the OQAQ: there is no guidance on how to arrive at its overall assessment, whereas counting the number of “yes” items in AMSTAR is less subjective. It is also important to keep in mind that deriving an overall score is inherent in the application of the OQAQ, whereas a sum score is not mentioned in the source publication of AMSTAR and has never been validated [8]. It can be questioned whether any of these differences are relevant for interpreting the methodological quality of a SR. Generally, a 1-point difference in AMSTAR should not reflect a large difference in methodological quality between SRs, although this might depend on the item affected by the judgment. For example, applying unjustified statistical methods will usually have a greater impact on the methodological quality of a SR than not providing a list of included and excluded studies. Nevertheless, it should be kept in mind that it is also common to categorize SRs based on their AMSTAR score. For example, the Canadian Agency for Drugs and Technologies in Health (CADTH) defines quality categories as follows: low (score 0 to 3), medium (score 4 to 7), and high (score 8 to 11) [37]. When such a cut-off is used as an inclusion or exclusion criterion for a SR within an overview, a 1-point difference may have an important impact.
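
For illustration, a small sketch (Python) of the CADTH categorization cited above shows how a 1-point difference can move a review across a category boundary (the two example scores are hypothetical):

```python
# Sketch of the CADTH quality categories for AMSTAR scores (0-11); example scores are hypothetical.
def cadth_category(amstar_score):
    if amstar_score <= 3:
        return "low"
    if amstar_score <= 7:
        return "medium"
    return "high"

print(cadth_category(7), cadth_category(8))  # -> medium high: a 1-point difference crosses the cut-off
```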

This is the first study to empirically assess dual (co-)authorship in overviews. Although our results show that reviews affected by DCA obtain higher scores for methodological quality than reviews not affected by DCA, this difference is not necessarily a result of biased quality assessments by the authors. Our analysis was performed at the overview level, and we did not collect any content-specific characteristics of the included reviews. Several other factors might also explain the results. It is well established in the literature that Cochrane reviews have a higher methodological quality than non-Cochrane reviews [9, 38, 39]. Unfortunately, we were not able to include this factor in our analysis due to the low number of reviews included in the overviews. In addition, the methodological quality of SRs has improved over time [40, 41]. This might matter when an overview compares several health care interventions for which the reviews differ in how recent they are. The comparison of health care interventions from different fields (e.g., pharmacology, surgery, complementary and alternative medicine) might also be important, as the quality of SRs is not necessarily equal across disciplines. All these potential explanatory variables might affect the results of our analysis if they are not equally distributed among reviews with and without DCA.

We are not able to draw any definite conclusions from our findings. For example, we were not able to investigate the impact of independent quality assessment of SRs (i.e., quality assessment performed by authors without DCA) in overviews with DCA because of the small number of such overviews. Thus, we were only able to differentiate between reviews with and without DCA. In doing so, we assumed that overview authors who have (co-)authored an included SR pose a potential conflict of interest for the whole overview, irrespective of the tasks they performed. It has been stressed that even when the quality assessment of a review is not performed by any of its own authors, evaluating a review (co-)authored by another member of the group might also introduce bias [19]. Another option would be to ask independent assessors to evaluate the quality of SRs in overviews. However, this approach seems infeasible, as the overview authors will already have formed an impression of the methodological quality after performing study selection and data extraction. Furthermore, authors may find it difficult to draw conclusions when they did not perform the assessments themselves. Authors of overviews and SRs may also be very well aware of the advantages and drawbacks of quality assessment tools, leading them to report whatever is necessary to receive as many points as possible on a quality assessment scale (e.g., AMSTAR or OQAQ). Therefore, our results could also be explained by differences in reporting rather than in methodological quality.

We encourage future authors of SRs and overviews to report who was involved in each step of study selection, data collection, and quality assessment, for example by providing the initials of the persons performing these steps. This would enable further analyses in the future, for example, differentiating between overviews in which authors affected by DCA were or were not involved in quality assessment. In other words, while the unit of analysis in the current study is the SR, it could then be shifted to individual authors.

Future studies are needed to investigate the influence of reviews affected by DCA and to identify how best to deal with it. It would also be prudent to investigate whether reviews not affected by DCA indeed have a lower methodological quality. This could be done by reassessing and comparing the methodological quality of included reviews: assessors should be blinded to the aim of the study, and their assessments would be compared with the original ratings. In the absence of bias (i.e., if reviews affected by DCA in fact have a higher methodological quality), both sets of assessments should, in theory, be comparable. If bias is present, we would expect the scores of reviews affected by DCA to be lower in the reassessment than in the original ratings, while no such effect would be observed for reviews without DCA. Needless to say, the issue of DCA is not specific to overviews but also arises in systematic reviews and primary studies.

Limitations

This study has some limitations which must be pointed out. First, our search strategy for the identification of non-Cochrane overviews followed a precision-maximizing approach; thus, our sample might lack representativeness. Second, we did not perform a sample size calculation before the study, as sample size calculations are known to be difficult in this context [42]. Third, the analyzed data are skewed and unbalanced, which calls into question the use of standard methods for meta-analyses of parametric data; however, we tried to estimate the impact of the skewed data by excluding the affected overviews in a sensitivity analysis. Fourth, our study is based on quality scores of reviews, and we calculated scores in cases where the authors had refrained from doing so. This applies to AMSTAR, for which the overall score has never been validated and might be an inappropriate measure of methodological quality. However, there was no alternative approach we could have opted for to investigate differences in quality scores between reviews with and without DCA. Lastly, four of the included non-Cochrane overviews were published by one group of authors [23,24,25,26]; thus, the results of these overviews cannot be regarded as completely independent of each other.

Conclusions

DCA frequently occurs in overviews, and nearly all Cochrane overviews are affected by it. Reviews with DCA obtain higher methodological quality scores than reviews without DCA. Potential conflicts of interest are one possible explanation for this association, but the reasons need to be investigated further. Authors need guidance on what to do when they intend to include a review they have (co-)authored themselves.