1.0 Background

For a practicing clinician or policy maker, keeping up with current research is a monumental task. Global research output is increasing exponentially [1], as is the number of systematic reviews published each year [2,3,4]. Systematic reviews (SRs) help clinicians navigate complex clinical topics by summarising large numbers of primary studies. Between 2000 and 2019, the number of SRs increased more than 20-fold, with 80 SRs published per day [5].

This growth means that overlapping and redundant reviews increasingly address the same clinical, public health, or policy question. Bolland et al. found 24 SRs on vitamin D supplements for the prevention of bone fractures, many of which reported conflicting results stemming from different methodological choices and different sets of included primary studies [6]. When encountering multiple SRs on the same question, clinicians and policy makers may be confused and unable to formulate a conclusive answer [7].

To surmount this challenge, Jadad et al. published an algorithm in 1997 [8] to help healthcare providers and policy makers select the “best evidence” SR(s) when multiple reviews of randomized controlled trials (RCTs) with meta-analysis address the same or a very similar therapeutic question but report results that “diverge” or conflict. If the reviews do not address the same question, then no further assessment is needed, and the decision maker simply chooses the review “closest to the problem to be solved”. If, however, two or more similar reviews are identified that are discordant, then a hierarchy of steps is followed to choose the best evidence.

Overviews of reviews (otherwise termed umbrella reviews, meta-reviews, etc.) were developed to summarise the results of SRs and can help make sense of potentially conflicting or discordant results [9,10,11,12]. However, a new type of study has emerged, with more focused objectives than overviews of reviews, aiming to assess discordance in results across multiple similar SRs. These more focused studies are often called ‘reviews of overlapping meta-analyses’, ‘reviews of discordant SRs’, or ‘reviews of discordant meta-analyses’. In this paper, we call them “Discordant Reviews” for clarity and to distinguish them from other types of reviews and from overviews of reviews. We define discordance as occurring when SRs with identical or nearly identical clinical, public health, or policy eligibility criteria (as expressed in PICO [population, intervention, comparison, outcome] elements) report different results for the same outcome. Such discordance can arise from the methodological decisions review authors make, or from differing interpretations or judgments about the results [11].

An empirical and systematic mapping study identified formal and informal approaches for dealing with multiple overlapping SRs with discordant results [10, 11]. One approach was to specify methodological criteria to select a single, most representative SR (e.g., the highest quality and most comprehensive) [11]. Other identified approaches were to examine and record discordance, and to use tools (i.e., the Jadad algorithm [8]) or decision rules to aid the selection of one SR [10, 11]. This mapping study identified only one tool to assess discordance, the Jadad algorithm; given that no other options are available to assess discordance across SRs, the algorithm will continue to be used by researchers, and indeed it remains in common use today [13,14,15].

The Jadad algorithm has not been universally adopted and has been inconsistently applied [16,17,18]. We believe our research is unique, as we did not identify any study aiming to replicate the Jadad algorithm. Our study objectives were to: identify Discordant Reviews that used the Jadad algorithm to address discordance amongst SRs with meta-analysis of RCTs; replicate the Jadad assessments done by their authors to determine whether the same SR(s) would be chosen; and explore reasons for reproducibility or lack thereof.

2.0 Methods

2.1 Study design

We followed Cochrane SR guidance for study selection and data extraction [19]. Our protocol is registered as a preprint on the Research Square server [20].

2.2 Search methods

2.2.1 Database

As the basis for our search, we used an existing database of 1218 studies (2000–2020) collated from a bibliometric study [21]. The bibliometric study searched MEDLINE (Ovid), Epistemonikos, and the Cochrane Database of SRs of Interventions (CDSR) between January 1, 2000 and December 30, 2020. All studies included in the database: (a) synthesised the results of SRs, (b) systematically searched for evidence in a minimum of two databases, and (c) conducted their search using a combination of text words and MeSH terms. All included studies also had a full description of methods in the main body of the paper and focused on clinical or public health interventions.

Within this database, we identified Discordant Reviews using the EndNote search function and Boolean logic with the following terms: overlap* [title/abstract] OR discrepan* [title/abstract] OR discord* [title/abstract] OR concord* [title/abstract] OR conflict* [title/abstract] OR Jadad [abstract].
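For illustration, the sketch below reimplements this keyword filter in Python over a hypothetical CSV export of the database; the file name and the "title"/"abstract" column names are our assumptions, not actual EndNote fields.

```python
import csv
import re

# Truncation wildcards (overlap*, discrepan*, ...) become regex prefixes.
TITLE_ABSTRACT_TERMS = re.compile(
    r"\b(overlap|discrepan|discord|concord|conflict)\w*", re.IGNORECASE
)
ABSTRACT_ONLY_TERM = re.compile(r"\bJadad\b", re.IGNORECASE)

def is_candidate(record: dict) -> bool:
    """Flag a record if any stem hits the title/abstract, or 'Jadad' hits the abstract."""
    title_abstract = f"{record.get('title', '')} {record.get('abstract', '')}"
    return bool(
        TITLE_ABSTRACT_TERMS.search(title_abstract)
        or ABSTRACT_ONLY_TERM.search(record.get("abstract", ""))
    )

# Hypothetical export: one row per database record.
with open("endnote_export.csv", newline="", encoding="utf-8") as f:
    candidates = [row for row in csv.DictReader(f) if is_candidate(row)]
print(f"{len(candidates)} candidate Discordant Reviews")
```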

2.2.2 MEDLINE (Ovid) search, January to April 2021

In addition, we completed a more recent search of MEDLINE (Ovid) on April 18, 2021, using the following search string: (“systematic reviews”.tiab or “meta-analyses”.tiab) AND (overlap.tiab or discrepant.tiab or discordant.tiab or difference.tiab or conflicting.tiab or Jadad.ab).

2.3 Screening

2.3.1 Process for screening

Our screening form was piloted by all authors on 20 studies out of a possible 1251 identified by our searches to ensure high levels of agreement and common definitions of eligibility criteria. Articles were screened as full-text publications independently by two authors. Discrepancies were resolved by consensus, and arbitration by a third reviewer when necessary.

2.3.2 Stage 1 screening criteria

We first screened the studies to include all those aiming to assess discordant results across SRs on similar clinical, public health, or policy questions. Studies assessing discordance can assess (a) discordant results, or (b) discordant interpretations of the results and conclusions. Studies examining either (a) or (b) were eligible, using any approach (e.g. [22,23,24,25]). We thus did not restrict eligibility based on a study’s definition of discordance. Studies meeting stage 1 criteria continued to stage 2 screening.

2.3.3 Stage 2 screening criteria

In this stage, we selected studies that met the following inclusion criteria:

  • Included a minimum of two SRs with a meta-analysis of RCTs, but may have included other study types beyond RCTs; and

  • Explicitly used the Jadad algorithm to choose between two or more discordant SRs for the primary outcome.

We included studies in any language; reviewers fluent in other languages used Google Translate to aid in screening non-English studies. Studies were eligible regardless of publication status and publication date.

2.4 Extraction of the primary intervention and outcome

As a systematic approach for assessing discordance, we first identified the primary outcome from each Discordant Review. We initially searched for the primary outcome as explicitly defined in the title, abstract, objectives, introduction, or methods sections [26, 27]. If the primary outcome was not identified in any of these sections, we deferred to the first outcome mentioned in the manuscript [26, 27].

The primary intervention was selected based on its relation to the primary outcome. If multiple interventions were assessed for the primary outcome, we chose the first intervention highlighted in the title or abstract [27]. We then determined which of the included SRs with meta-analysis of RCTs addressed the primary outcome and primary intervention.

The primary intervention and outcome were extracted by two authors independently, and any disagreement was discussed until consensus was reached.

2.5 Blinding of Jadad results in the discordant reviews

All included manuscripts underwent a blinding process in which one reviewer deleted content related to the Jadad results prior to our independent Jadad assessment. This reviewer deleted the pertinent components of the abstract, highlights, Jadad assessment results, and discussion/conclusions sections, using Adobe Acrobat Pro or the freeware PDFCandy (https://pdfcandy.com), and was not involved in the subsequent Jadad assessment. Authors involved in the Jadad assessments were also instructed not to search for and read the included Discordant Reviews prior to or during the assessment.

2.6 Achieving consensus instructions on how to do a Jadad assessment

The Jadad paper provides an algorithm for decision makers to choose across SRs and to identify sources of inconsistency and discordance, including differences in questions, inclusion/exclusion criteria, extracted data, methodological quality assessments, data combining, and statistical analysis methods. However, the Jadad manuscript gives little detailed guidance on how to operationalise the algorithm in practice. As such, our team met virtually to discuss interpretation and application of the algorithm, and to decide upon clear and specific decision rules for each Jadad algorithm step. Feedback was solicited and decision rules were adjusted until consensus was achieved. Our final interpretation of the Jadad algorithm steps and our decision rules are found in Appendix A, as well as in eight instructional videos located at: https://osf.io/2z7a5/.

2.7 Piloting Jadad assessments

The consensus instructions underwent pilot testing, during which further feedback was solicited and adjustments were made. Three Discordant Reviews [6, 26, 27] were used to pilot the Jadad assessments using our instructions. Two reviewers independently piloted the three assessments and compared their results to identify discrepancies, which were resolved through discussion. Any necessary revisions and clarifications identified through this exercise were incorporated into our instructions (Sect. 2.8 and Appendix A).

2.8 Jadad assessment instructions

Briefly, Step A of the Jadad algorithm involves examining whether the included SRs’ questions match the Discordant Review’s question using a PICO framework [8]. If the clinical, public health, or policy questions are not identical, then Step B prompts the user to choose the SR closest to the decision maker’s question, and no further assessment is necessary. If multiple SRs are found with the same PICO as the Discordant Review, then Step C should be investigated. As we were using Discordant Reviews with the same PICO as their included SRs, we started with Step C in the Jadad algorithm (Fig. 1).

Step C asks whether the same RCTs were included across the SRs (Fig. 1). If the SRs contained the same RCTs, the assessor moved to Step D and assessed whether the SRs were of the same methodological quality or risk of bias. AMSTAR (A MeaSurement Tool to Assess systematic Reviews) [22], the updated AMSTAR 2 [23], and the outdated Oxman-Guyatt tool [24] are examples of tools used to assess the methodological quality of SRs, while ROBIS (Risk of Bias Assessment Tool for Systematic Reviews) [25] is used to assess the risk of bias in SRs. In Step D, we either (a) extracted the AMSTAR [22], AMSTAR 2 [23], or ROBIS [25] assessments from the Discordant Reviews, or, if this was not done, (b) conducted our own risk of bias assessment using the ROBIS tool [25]. If the included reviews varied in quality, the review of the highest quality was chosen at Step F.

Fig. 1 Jadad Algorithm (1997)

If the SRs were of the same quality/risk of bias, the next step is Step E: assessing and comparing data extraction, clinical heterogeneity, and data synthesis across the reviews. Details about how we assessed this multi-tiered step are found in Appendix A.

If the SRs did not include the same trials, an assessment of the RCTs’ eligibility criteria as reported by the SRs was made at Step G. We found eligibility criteria information in the main text of the Discordant Reviews’ methods sections, or in a table of characteristics. If the information was unavailable in the Discordant Review, two authors independently extracted the PICO eligibility criteria from the included SRs. Any discrepancies were resolved by discussion and, when necessary, with the involvement of a third reviewer.

If the SRs were determined to have sufficiently similar eligibility criteria, Step H prompts the assessor to compare the search strategies and the application of eligibility criteria across SRs to make a selection. If the eligibility criteria are not the same, Step I explores publication status, quality, language, and the availability of individual patient data across the SRs. We applied multi-tiered hierarchical decision rules for these steps, described in detail in Appendix A.
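To make the sequence of Steps C through I concrete, the sketch below renders our reading of the algorithm as a decision tree. The `SRInfo` fields, the numeric quality rank, and the simple set comparisons are our own illustrative abstractions, not part of the original 1997 paper; the actual multi-tiered decision rules for Steps E, H, and I are given in Appendix A.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SRInfo:
    label: str
    included_rcts: frozenset       # identifiers of the RCTs the SR includes
    eligibility_criteria: str      # normalised PICO eligibility criteria
    quality_rank: int              # from AMSTAR/AMSTAR 2/ROBIS; higher is better

def jadad_decision(srs):
    """Return (final decision step reached, SR(s) still in contention)."""
    # Step C: do the SRs include the same RCTs?
    if len({sr.included_rcts for sr in srs}) == 1:
        # Step D: are the SRs of the same methodological quality / risk of bias?
        best = max(sr.quality_rank for sr in srs)
        top = [sr for sr in srs if sr.quality_rank == best]
        if len(top) < len(srs):
            # Step F: choose the highest-quality SR(s).
            return "F: choose the highest-quality SR(s)", top
        # Step E: compare data extraction, clinical heterogeneity, data synthesis.
        return "E: apply the multi-tiered rules of Appendix A", srs
    # Step G: do the SRs have the same eligibility criteria?
    if len({sr.eligibility_criteria for sr in srs}) == 1:
        # Step H: compare search strategies and application of eligibility criteria.
        return "H: apply the multi-tiered rules of Appendix A", srs
    # Step I: compare publication status, RCT quality, language, and IPD analysis.
    return "I: apply the multi-tiered rules of Appendix A", srs

# Example: two hypothetical SRs with different trial sets but identical criteria.
a = SRInfo("SR-A", frozenset({"trial1", "trial2"}), "PICO-x", quality_rank=2)
b = SRInfo("SR-B", frozenset({"trial1", "trial3"}), "PICO-x", quality_rank=3)
print(jadad_decision([a, b])[0])  # -> "H: apply the multi-tiered rules of Appendix A"
```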

2.9 Study outcomes

2.9.1 Comparing results from our Jadad assessment with the Discordant Review authors’ assessment

We replicated the Jadad assessments and evaluated whether we chose the same SR, whether a Cochrane SR was chosen, and whether we followed the same steps as the Discordant Review authors. We also evaluated the utility, efficiency, comprehensiveness, and reproducibility of the Jadad algorithm, defined as:

  • Utility: Is the Jadad algorithm easy to use? (Sect. 2.9.2)

  • Efficiency: How much time does it take to apply the Jadad algorithm?

  • Comprehensiveness: Is the Jadad algorithm missing methods that might explain discordance (e.g., publication recency)?

  • Reproducibility: What are the possible reasons for reproducibility or lack thereof?

Our timing of the Jadad assessments and our ease-of-use ratings began after Steps C and G were completed. We can therefore report time and utility outcomes only for a partial Jadad assessment (Steps H and I). Three Discordant Reviews [6, 13, 26] were used to pilot the Jadad assessments and were excluded from the timing of Steps H and I. We also compared our interpretation of how to use the Jadad algorithm with the Discordant Review authors’ interpretation.

2.9.2 “Ease of use” outcome measure

Each assessor rated the Jadad algorithm for ease of use. A colour-coded ranking system (green, yellow, red) was applied based on how easy or difficult the assessment was judged to be for the user. The rating was based on the following rubric (a recording sketch follows the list):

  • The step can be accomplished easily by the reviewer, due to low cognitive load or because it is a recognised method (green).

  • The step requires a notable degree of cognitive load by the reviewer but can generally be accomplished with some effort (yellow).

  • The step is difficult for the reviewer, due to significant cognitive load or confusion; some reviewers would likely fail or abandon the task at this point (red).
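As an illustration only, the rubric could be recorded as follows; the enum and the example record are our own sketch, not a published schema.

```python
from enum import Enum

class EaseRating(Enum):
    GREEN = "easy: low cognitive load, or a recognised method"
    YELLOW = "moderate: notable cognitive load, doable with some effort"
    RED = "hard: heavy cognitive load or confusion; risk of abandoning the task"

# One assessor's rating for one final decision step (values hypothetical).
rating = {"review": "Example 2016", "final_step": "H", "ease": EaseRating.GREEN}
print(rating["ease"].name, "-", rating["ease"].value)
```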

2.9.3 ROBIS assessments

We assessed all SRs included in the Discordant Reviews for risk of bias using the ROBIS tool [25]. We chose to do this assessment (which is not part of the Jadad algorithm) to determine whether the Discordant Review authors or we chose the SR at lowest risk of bias.

2.10 Data extraction

Information and data required to complete the Jadad algorithm were first sought directly from the Discordant Reviews and, if not reported there, from the full texts of the included SRs. Outcomes were extracted using the 124 data items outlined in Appendix B. Discordant Review-level and SR-level data were extracted from the full texts by two authors independently; when consensus could not be reached, a third author arbitrated. Two reviewers also performed independent extractions of each Discordant Review’s interpretation of the Jadad algorithm steps. Any challenges or barriers that authors identified to using the Jadad algorithm were also extracted.

2.11 Data analysis

Our analyses were performed (a) descriptively for qualitative data, (b) using frequencies and percentages for categorical data, and (c) using median and interquartile range (IQR) for continuous data.
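For concreteness, a short sketch of these summaries using Python's standard library; the values shown are placeholders, not our data.

```python
from collections import Counter
from statistics import median, quantiles

final_steps = ["I", "I", "H", "I", "H"]      # categorical: final decision step
minutes = [15.0, 32.5, 47.5, 60.0, 90.0]     # continuous: time per assessment

# Frequencies and percentages for categorical data.
for step, n in Counter(final_steps).most_common():
    print(f"Step {step}: {n} ({100 * n / len(final_steps):.0f}%)")

# Median and IQR for continuous data.
q1, _, q3 = quantiles(minutes, n=4)          # quartile cut points
print(f"median = {median(minutes)} min, IQR = {q1:.1f}-{q3:.1f} min")
```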

2.12 Deviations to our protocol

Due to the complexity of the Jadad instructions, we made several deviations to our protocol, which are outlined in Appendix C.

3.0 Results

3.1 Search results from the bibliometric study

We retrieved 16,610 records from the MEDLINE (Ovid), CDSR, and Epistemonikos databases, and 237 records from other sources (Fig. 2). Of the 14,437 records remaining after removal of duplicates, 11,481 were excluded at the title/abstract stage, and 1,738 were excluded at the full-text stage. A total of 1,218 articles met the eligibility criteria and were included.

Fig. 2 Study selection flowchart. The first section of the flowchart (grey) outlines the steps taken to select studies for the bibliometric study, and the second section (blue) outlines our selection of studies for the Jadad replication.

3.2 Search results from the Jadad replication study

We updated our search in April 2021, which yielded 1,251 records, of which 948 were excluded at the title/abstract stage. We screened 303 full-text records; of these, 24 studies included at least two SRs with meta-analysis of RCTs and used the Jadad algorithm. However, after scrutiny, we excluded another three studies [27,28,29] from our analysis, as they did not choose one or more SRs based on the Jadad algorithm and did not follow the Jadad steps. These studies are described separately in Appendix D.

3.3 Characteristics of discordant reviews

The most common nomenclature for this study type was a ‘SR of overlapping meta-analyses’ or a ‘SR of discrepant meta-analyses’. One study was described by its authors as a ‘systematic review of systematic reviews’ in the title, yet its primary aim was to assess discordance across the SRs, not to synthesise their results. Other studies self-identified as SRs in the title but did not collect and analyse primary study data; instead, their authors assessed discordance across SRs. Despite the variety in terminology across our included studies, we call them all Discordant Reviews to distinguish them from their constituent SRs.

The 21 Discordant Reviews using the Jadad algorithm were published between 2014 and 2020 (Table 1), except for Poolman 2007 [30]. More than half of the 21 Discordant Reviews were from China alone; together, China and the USA accounted for most of the included Discordant Reviews. Eighteen of the 21 studies (86%) addressed bone and joint conditions, with the remainder addressing cholecystitis, orthodontically induced white spot lesions, and non-small cell lung cancer.

Within each Discordant Review, the number of included SRs with meta-analysis ranged from 2 to 7, except for Xing 2016, which included 10. The number of authors of the Discordant Reviews ranged from 2 to 10.

Table 1 Characteristics of included Discordant Reviews (n = 21)

3.4 Replication of Jadad assessments and Jadad steps assessed

Across the 21 Jadad assessments we conducted, we never answered yes at Step C; that is, the SRs included in the 21 Discordant Reviews did not contain the same RCTs.

As all SRs across the 21 Discordant Reviews contained different RCTs, we progressed to Step G (i.e., do the SRs have the same selection criteria?). Seventeen Discordant Reviews (81%) were determined to have the same selection criteria across their included SRs and moved on to Step H; four Discordant Reviews (19%) did not and moved on to Step I.

Since the included SRs did not contain the same RCTs, we did not use Steps D, E, and F as final decision steps to select a SR. Within Step I of the algorithm, we found that no SRs performed an individual patient data meta-analysis, so this sub-step was not used in assessing discordance.

3.4.1 Frequency of agreement and disagreement in the selection of the most appropriate systematic review(s) using the Jadad algorithm between ourselves and the Discordant Review authors

Of the 21 Jadad assessments, our choice of SR(s) agreed with the Discordant Review authors’ in eight cases (38%) and disagreed in 13 (62%) (Tables 2 and 3). The author groups of 18 Discordant Reviews chose one SR, two groups chose two SRs, and one group chose three SRs. Across the 21 Jadad assessments, we chose one SR in 16 instances, two SRs on four occasions, and three SRs once. In four cases, we chose the same SR as the Discordant Review authors but also chose one additional SR.

Of the 21 Discordant Reviews, 19 (90%) reported the Jadad step used to make the final SR selection (hereafter called the “final decision step”). Of the 13 Jadad assessments that disagreed, we used the same final decision step as the Discordant Review authors in six instances and a different step in seven. Of the eight Jadad assessments in which our choice agreed with the Discordant Review authors’, six reported the final decision step; of these, we used the same step to make our decision three out of six times.
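For clarity, the sketch below shows how agreement in SR choice can be tallied using exact set equality between the SR(s) each side chose. The choices are hypothetical placeholders (see Tables 2 and 3 for the real assessments), and partially overlapping choices, such as selecting the authors' SR plus one more, count here as disagreement.

```python
# Hypothetical per-review choices; each entry is the set of SR(s) selected.
ours    = [{"SR1"}, {"SR2"}, {"SR1", "SR3"}]
authors = [{"SR1"}, {"SR4"}, {"SR1"}]

agree = sum(o == a for o, a in zip(ours, authors))
print(f"agreed: {agree}/{len(ours)} ({100 * agree / len(ours):.0f}%)")
```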

Table 2 Agreement and disagreement in choice of systematic review(s) in replicated Jadad assessments

3.4.2 Frequency of agreement and disagreement in systematic review findings

Of the 13 Jadad assessments in which we chose a different SR, ten (77%) nonetheless led to agreement in the findings (the direction of effect was the same) (Table 3). The remaining three led to a different direction of the effect estimates.

Overall, 18 of the 21 (86%) independent Jadad assessments agreed in the direction of the findings, despite a different SR having been chosen in 13. We present a case study in Appendix E to illustrate the clinical impact of choosing one SR using the Jadad algorithm.

3.4.3 Cochrane versus non-Cochrane reviews chosen

Across the 21 Jadad assessments, four Cochrane reviews were chosen by us or the Discordant Review authors (Table 3): we chose a Cochrane review on four occasions, and the Discordant Review authors did so twice.

Table 3 Agreement and disagreement in systematic review findings from replicated Jadad assessments

3.4.4 Interpretation of Jadad steps by the discordant review authors

Overall, there were major differences in the interpretation of the Jadad algorithm across Discordant Review author groups. Several Discordant Reviews (n = 12) did not implement the Jadad algorithm in sequential steps but rather used components of the algorithm to assess SRs. In addition, about half of the Discordant Review author groups (n = 10/21) assessed factors outside the Jadad algorithm, including: the databases searched, the GRADE (Grading of Recommendations, Assessment, Development and Evaluations) approach, randomization method, methods used to measure outcomes, measures used to explore heterogeneity, measures used for establishing comparative superiority or inferiority, statistical approaches used in analyses, presence of subgroup analyses, software used to perform the analyses, and funding sources.

Most Discordant Reviews (n = 12) used Step I as the final decision step of the algorithm. We interpreted Step I to consist of (I1) publication status, (I2) quality of RCTs, (I3) language restrictions, and (I4) analysis of individual patient data (IPD). We operationalised publication status as whether SRs included both published and unpublished (grey) literature. In contrast, several Discordant Review authors interpreted this step as considering the date or recency of the review (n = 5) or as accounting for only published literature (n = 3).

The second most common final decision step was Step H (n = 5). We interpreted Step H to consist of (H1) search strategies across SRs and (H2) inclusion criteria and duplicate independent screening of RCTs. Different interpretations of H1 included: whether the SR was published in a medical journal, and which electronic databases were searched. Interpretations of H2 included: whether the SR reported publication status and language. Many Discordant Review authors did not clearly report how they interpreted this step. Some Discordant Review authors reported Step H as the final decision step but ignored the criteria for Step G and selected the SR with the highest number of RCTs (n = 2); the rationale behind this decision was not reported.

3.4.5 Time and ease of use in completing Steps H and I of the Jadad assessments

3.4.5.1 Time to complete Steps H and I of the Jadad algorithm

Of the 18 Jadad algorithm assessments completed for Steps H and I, the average time was 60 min per review (Appendix F and G, Table 1). Ten Jadad assessments took between 15 and 47.5 min with an average of 4.3 SRs to assess (range 3–6). Nine out of the ten were rated as easy to assess, and one was rated as being moderately difficult to assess. These nine easy-rated assessments had evaluated Step H of the Jadad algorithm as the final decision step. The moderately rated assessment had evaluated Step I as the final decision step.

3.4.5.2 Ease of use rating for final decision Steps H and I of the Jadad algorithm

Of the 18 Jadad assessments we completed with a final decision at Step H or Step I, the median ease rating was easy (Appendix F and G, Table 1). Ten of the 18 (56%) assessments were rated easy, six (33%) moderate, one (6%) moderate/hard, and one (6%) hard. All assessments rated easy were based on completing Step H; their average number of SRs to assess was 4.4 (range 3–6). Of the eight moderate-to-hard assessments, three required assessment of Step I and five of Step H, and they averaged 5.5 SRs to assess (range 2–10). By observation, easy Jadad assessments were associated with good reporting by the Discordant Review and SR authors, the particular step assessed, and the Discordant Review interpreting the Jadad algorithm in a similar way to us. Moderate-to-hard assessments were associated with inadequate reporting by the Discordant Review authors, discrepant data reported in the included SRs, having to conduct ROBIS assessments because the Discordant Review did not assess the quality of the SRs, and a greater number of included SRs.

3.4.6 Comprehensiveness of the Jadad algorithm (gaps or completeness)

We identified several missing methods for explaining discordance. The algorithm did not account for the date of last literature search, nor did it account for publication recency. The number of primary studies included in the SRs was also not considered within the Jadad algorithm. Lastly, the certainty of evidence, as measured by the GRADE or other approaches, was not examined by the algorithm.

3.5 ROBIS assessments

We used the ROBIS tool to assess the 98 SRs included in our 21 Discordant Reviews. A total of 41 SRs were at low risk of bias, and 57 were at high risk of bias. From the 21 Discordant Reviews, we chose 19 low-risk and nine high-risk SRs, while the authors chose 17 low-risk and eight high-risk SRs. A more detailed assessment, as well as the full ROBIS assessments for each SR, is found in Appendix H. Our ROBIS judgments of high or low risk of bias for each SR are found in Table 2.

4.0 Discussion

4.1 Summary and interpretation of the most important results

In our investigation, we identified research examining discordance across comparable SRs using the Jadad algorithm [8] and attempted to replicate the findings. In 62% of cases, we were unable to replicate the findings and ultimately chose a different “best evidence” SR. The lack of guidance on how to operationalise the Jadad algorithm likely contributed to the different interpretations, and ultimately to the disagreement between our choice of SR and the Discordant Review authors’. Several Discordant Reviews did not implement the Jadad algorithm in sequential steps, which may also have led us to choose a different review. By observation, whenever a Cochrane SR was included in a Discordant Review, either the authors or we chose the Cochrane review as the best evidence. Overall, the raters assessed the Jadad algorithm as easy to use, with a partial assessment taking an average of 60 min and covering an average of 4.3 SRs. By observation, Jadad assessments took longer when (a) there were more SRs to assess, (b) we had to conduct a quality assessment of the included SRs (because this was missing from the Discordant Review), and (c) reporting by the Discordant Review authors was inadequate.

Due to limited reporting, it was challenging to replicate or obtain a comprehensive understanding of the Discordant Review authors’ use of the Jadad algorithm. Inadequate reporting of how the authors interpreted the Jadad algorithm and operationalised each step made it difficult to extract the required information. Often, Discordant Review authors discussed only the final decision step of the algorithm, not the other steps taken that led to the final decision.

The Jadad algorithm has several limitations in terms of comprehensiveness. The algorithm does not account for the date of the last literature search, publication recency, the number of RCTs included in the SRs, or certainty of evidence assessment (e.g., using the GRADE approach). Moreover, across the 21 assessments, the quality/risk of bias of the SRs was never assessed as part of the algorithm.

The exponential growth of SRs means duplicate and redundant reviews will become a greater problem for clinicians and policy makers. As there is no similar tool to assess discordance, the Jadad algorithm will continue to be used by researchers, and is indeed still being used today [13,14,15]. However, the algorithm is out of date and therefore fails to incorporate advances in the understanding of bias and in methodological and statistical approaches to evidence synthesis [45, 46]. Major methodological advances published after its emergence include the PRISMA reporting standard (2009, updated 2020 [47, 48]), AMSTAR (2007, updated 2017 [22, 23]), ROBIS (2018 [25]), and GRADE (2009 [49]). Statistical advances of current interest include multiple imputation to model missing data, meta-regression and model selection, living systematic reviews, and network meta-analyses [45, 46], to name a few.

Intuitively, SR quality should be an important consideration when comparing reviews, but it was never considered by our team or by the authors of the 21 included Discordant Reviews, because the entire left side of the Jadad algorithm (Steps D, E, and F) was never applied. This should be considered a design flaw in the algorithm. SR quality is only considered when SRs evaluate the same RCTs, which is highly unlikely given the exponential growth in research output [1] and the difficulty in locating trials; in our sample, it never happened. When two or more SRs ask clinical, public health, or policy questions with similar eligibility criteria, it would be logical to evaluate SR quality using validated tools such as AMSTAR 2, or risk of bias tools such as ROBIS, and either choose the highest-quality SR or eliminate lower-quality SRs when there are several to choose from.

4.2 Comparison of our study with other similar studies

No methodological investigations or replications of the Jadad algorithm were identified, so we are not aware of any study to which we could directly compare our results; we are aware only of studies that applied the Jadad algorithm to choose the best SR. This is surprising, as it is well known that in most cases several SRs are available for a given question [3], and general rules have been suggested to help practitioners choose the best review [50]. Others have noted that unexplained discordance can also signal the need to conduct a new review [11, 51]. This underpins the urgent need for further empirical investigations.

4.3 Implications when clinicians encounter multiple discordant systematic reviews on the same question

Evidence-based medicine is commonly defined as the conscientious, explicit, and judicious use of current best evidence in decision-making related to patient care [52]. Medical knowledge grows every day, evidence evolves rapidly, and staying current seems impossible [1]. For example, it has been estimated that a general practitioner would need to read 19 articles every day to keep abreast of the literature, while having an average of only one hour per week to do so [53]. With the additional problem of conflicting results in seemingly identical research, clinicians may struggle to find the time, and the guidance, to do this.

Without the help of an updated algorithm to assess discordant results across SRs, clinicians will have difficulty in identifying and choosing the best evidence and thus engaging in decision-making with their patients and clinical teams. Uncertainty, disagreements, and differences in SR results undermine the ability of a healthcare provider to make an informed clinical decision [29]. As an example of the clinical impact of discordant results of SRs [29], patient reimbursement for hyaluronic acid treatment was stopped because of some negative meta-analyses results [54, 55], despite the fact that other SRs [56, 57] cited beneficial effects, as did RCTs for certain preparations [58, 59].

All Discordant Reviews using the Jadad algorithm in our sample addressed a focused clinical question (e.g., comparing only two interventions for a specific condition and population). Choosing one SR, or a small subset of SRs, brings simplicity when summarising results, but may lose potentially important information through the exclusion of relevant reviews or qualitative data. The trade-off when authors choose one SR among many is this loss of information, which may increase uncertainty about the effects of the intervention, even as it removes the issue of discordance.

Overviews of reviews and guidelines with broad clinical, policy, or public health questions that synthesise the results of multiple SRs may instead choose to weigh all the evidence on a topic rather than select one representative SR. Including all SRs is likely to surface discordance and will raise other challenges when synthesising a large amount of review data (e.g., overlap in primary study data, standardising effect metrics); resolving these challenges is likely to be resource intensive and cumbersome for the reader. When all SRs are included, authors may compare the effect estimates, as we have in Table 3, to determine whether the findings agree (i.e., the direction of effect is the same).

4.4 Strengths and limitations

Our study has several strengths. First, we used consensus-building strategies to develop clear instructions on how to operationalise the Jadad algorithm and to ensure a consistent approach to assumptions and stepwise interpretation. We also adopted a systematic and transparent approach to address the objectives outlined in our protocol, using SR guidance [20]. A comprehensive search strategy, including a search of the grey literature, was employed with no restrictions on language or publication status to minimise publication bias. To minimise error, screening, extractions, and assessments were completed by two independent reviewers and subsequently compared. Any discrepancies were resolved through consensus and, when necessary, with the involvement of a third reviewer. Pilot screening and pilot assessments were completed by reviewers to ensure a consistent understanding of the screening criteria and of the definitions of coded and extracted items.

There are some aspects of our methods that should be considered limitations. Our assessments of discordance using the Jadad algorithm were conducted without clinical expertise on some conditions and interventions. We attempted to minimise the impact of this by including both methodologists and clinicians on our research team, but given the breadth of topics addressed by these studies, judging the similarity and relevance of clinical, public health, or policy questions and eligibility criteria was at times difficult. It is also possible that a broader search of different databases would have identified more studies using the Jadad algorithm. In addition, our search for overviews of reviews (2000–2020) did not focus on identifying Discordant Reviews (i.e., studies aiming to assess discordant results across SRs with similar PICO), so we may have missed relevant studies during this period; the use of the Jadad algorithm may therefore be more prevalent than our study indicates. We recommend that authors trying to identify Discordant Reviews search for synonyms of discordance in titles and abstracts. Findings from this study are not directly generalisable to SRs that include both RCTs and non-RCTs, which would have greater sources of heterogeneity in their results. Furthermore, our sample is mostly representative of orthopedic conditions and may lack generalisability.

4.5 Future research

A tool is needed that has better agreement between decision makers, addresses all pertinent variables that may contribute to discordance, and is easier to implement. Such a tool, whether a framework or a stepwise algorithm, will need to be applicable to a variety of settings (i.e., SRs of primary studies with and without standard meta-analysis or network meta-analysis). The proposed tool would also need to incorporate recent methodological and statistical advances in evidence synthesis. Validation of such a tool could involve expert opinion obtained through consensus-building methods (e.g., Delphi methods) and other methods proposed by Whiting [60] and Moher [61]. Reproducibility will also be important, so inter-rater reliability should be tested. Ideally, such a tool would not only assist the clinician, policy maker, or researcher in choosing the most appropriate SR but also guide the user in identifying the most likely sources of discordance.

To address the gap in investigations of discordance using algorithms or methods other than that of Jadad [8], we are currently analysing approximately 70 studies to identify how researchers assessed discordance. We are using a qualitative framework analysis to map out any stepwise approaches used. This study and the current one will inform the development by our team of a newly proposed automated algorithm to assess discordance across SRs with similar clinical, public health, and policy questions, called WISEST (WhIch Systematic rEview iS besT).

Finally, a new tool to assess discordance should take into consideration the conclusiveness, or the stability of the results, of the SRs in question [62,63,64,65]. For example, the Cochrane logo shows the summary results from a conclusive SR, “Antenatal corticosteroids for accelerating fetal lung maturation for women at risk of preterm birth” [66]. This review is conclusive in that it provides a definitive recommendation for an intervention based on at least one meta-analysis, with overall consensus among clinicians and policy makers worldwide as to its efficacy or effectiveness.

4.6 Conclusion

Our results suggest that the Jadad algorithm is not reliably reproducible between decision makers and is inadequate for several reasons. First, there is no comprehensive, prescriptive guidance on how to apply the algorithm. Second, SR quality was never considered in the assessments, which represents a major design flaw in the algorithm. Third, the tool fails to incorporate recent advances in the understanding of bias and in methodological and statistical approaches to evidence synthesis.

In the absence of a better tool to assess discordance across SRs, we recommend that clinicians, policy makers, patients, and researchers address conflicts between review findings by choosing the SR with meta-analysis of RCTs that most closely resembles their question, is the most recent (latest search date), is the most comprehensive (i.e., largest number of included RCTs), and is at the lowest risk of bias.