Introduction

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach is a system for rating the quality of a body of evidence in systematic reviews (SRs) and other evidence syntheses, such as health technology assessments, and guidelines and grading recommendations in health care [1]. GRADE approach offers a transparent and structured process for developing and presenting evidence summaries and for carrying out the steps involved in developing recommendations. It can be used to develop clinical practice guidelines and other health care recommendations (e.g. in public health, health policy and systems and coverage decisions), and is becoming increasingly popular among guideline developers and systematic reviewers [1]. Up to date, more than 110 organizations from 19 countries around the world have endorsed GRADE approach, including the World Health Organization, the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines, the World Allergy Organization, and the World Society of the Abdominal Compartment Syndrome and the Cochrane Collaboration [1]. In addition, it is also becoming an international standard for judging the evidence in SRs and clinical guidelines [2]. GRADE approach differs from other appraisal tools for three reasons: (i) because it separates quality of evidence and strength of recommendation, (ii) the quality of evidence is assessed for each outcome, and (iii) observational studies can be ‘upgraded’ if they meet certain criteria [3]. Thus, appropriate application of the GRADE approach can provide quality of evidence and strength of recommendations that is explicit, comprehensive, transparent, and pragmatic.

Systematic reviews (SRs) attempt to identify, select, synthesize, and appraise all high-quality research evidence relevant to a well-honed question. There have been many studies based on SRs, such as the GBD database [4], the World Cancer Research Fund/American Institute for Cancer Research report [5], as well as dietary guidelines [6]. As far as we know, untreated urological conditions are the major burden of patients all over the world. So far, many studies have summarized the factors affecting the urinary system and the prognosis of the urinary system disease using SRs.

However, no studies to date have assessed how SRs in urology and nephrology journals have used the GRADE approach to evaluate the certainty (or quality) of evidence. Therefore, the purpose of this study was to identify and describe all relevant SRs that use the GRADE approach to evaluate the outcome-specific certainty of evidence published from 2016–2020 in the top 10 urology and nephrology journals with the highest impact factor according to the 2020 Journal Citation Reports (JCR), and to summarize and present the GRADE-specific information, such as the outcomes rated, the number of primary studies, the exposure category, the use of summary of findings tables, the total number of down- and up-grading domains, while also considering the study design (SRs of randomized-controlled trials RCTs vs. non-randomized studies NRSs).

Methods

Search strategy

SRs published in the top 10 urology and nephrology journals with the highest impact factor according to the 2020 JCR between 1 January 2016 and 31December 2020 were identified by searching in the PubMed database (Supplementary Appendix 1).

Selection of documents

The SRs were included when the following criteria were met: (1) SRs utilizing the GRADE approach to assess the certainty of evidence. We excluded the following SRs: (1) a modified version of the GRADE approach was applied to assess the certainty of evidence; and (2) failure to provide details of the GRADE approach evaluation process and results.

First, two reviewers (SZ, S-X L.) independently screened identified articles by title and abstract. Only specific irrelevant articles were excluded at this stage. Second, we independently obtained and checked all potentially relevant articles for final inclusion by two reviewers (SZ, S-X L.). Selection disagreements were resolved by discussion or consultation with a third author (Q-J W.).

Data extraction

We extracted the following information for all identified SRs: year of publication; journal name; number of primary studies included; type of studies included RCTs vs. NRSs (including, non-randomized intervention, case–control, cohort, and cross-sectional studies) vs. combination RCTs/NRSs; number of participants; description of intervention(s)/exposure(s); number and types of outcome(s) and comparison(s) rated; category of certainty of evidence ratings (high, moderate, low, or very low); meta-analysis conducted (yes vs. no); summary of findings table reported (yes vs. no); number of down- and/or up-grading (count of the respective downgrading/upgrading domain used at the outcome level); and reasons for down- and/or up-grading. The one reviewer (SZ) extracted the study characteristics of the SRs, then one reviewer cross-checked all data (S-X L). We extracted all downgrading factors listed in the SRs of RCTs and NRSs according to the study design. There were five SRs with pooled RCTs and NRSs in which the combined evidence was rated [7,8,9,10,11].

We referred to minimum criteria proposed by GRADE working group when rating the certainty of evidence in this GRADE methodologic survey, which refers to the evidence that was assessed and the methods that were used to identify and appraise that evidence were not be clearly described for SRs that were assessed in our study.

Results

Search results and sample

A flow diagram of the literature search is shown in Fig. 1. The initial search yielded 767 entries, of which 322 were excluded that were not SRs. Of the 445 SRs identified, 60 met the eligibility criteria [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66]. Out of the 60 studies, 11 studies [56,57,58,59,60,61,62,63,64,65,66] (18.33%) did not report on outcome-specific rating, which refers to certainty of evidence of individual studies was assessed or overall certainty of the body of evidence was rated (Supplementary Table S1). Therefore, a total of 49 SRs (81.67%) that rated the outcome-specific certainty of evidence were included in our methodologic survey.

Fig. 1
figure 1

Flow diagram showing study selection process

The distribution of the 49 SRs according to the journal and published year are presented in Table 1. The greatest and least number of SRs published in the top 10 urology and nephrology journals was in 2016 (n = 115) and 2018 (n = 67), respectively. In addition, there was an increase in the proportion of SRs that rated the certainty of evidence using the GRADE approach among all SRs published in 2019 and 2020 (17.7% and 14.4%, respectively) compared to 2016, 2017, and 2018 (6.1%, 10.0%, and 7.5%, respectively). Four journals (Journal of Urology, European Urology, Clinical Journal of the American Society of Nephrology, and American Journal of Kidney Diseases) accounted for 86% of all SRs, whereas 3 journals (Kidney International Supplements, Nature Reviews Urology, Nature Reviews Nephrology) did not publish SRs using the GRADE approach.

Table 1 The distribution of systematic reviews (n = 49) that rated the outcome-specific certainty of evidence with GRADE by year and journal

Characteristics of the included SRs

SRs analyzing evidence from only RCTs (n = 20) [12, 14, 15, 18, 20, 22, 24, 26, 30, 32,33,34,35,36,37, 40, 41, 43, 45, 46], only NRSs (n = 15) [19, 21, 23, 31, 39, 42, 44, 47,48,49,50,51,52,53,54], or RCTs and NRSs (n = 14) [7,8,9,10,11, 13, 16, 17, 25, 27,28,29, 38, 55] were included into this methodological study. The main features of these 49 eligible SRs are summarized in Supplementary Table S2. Among the 14 SRs that combined RCTs and NRSs, 5 rated the outcome-specific certainty of evidence derived from RCTs and NRSs separately (in separate rows within a single Summary of Findings table or in separate Summary of Findings tables) [8, 17, 25, 27, 28], 5 pooled RCTs and NRSs and rated the combined evidence (in the same rows within a single Summary of Findings table) [7,8,9,10,11], and 6 only rated the NRSs [9,10,11, 13, 29, 38].

Forty-six SRs (94%) conducted at least one meta-analysis [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37, 39,40,41,42,43,44,45,46,47,48,49,50,51,52, 55], and 39 SRs (80%) presented the findings in a summary of findings table [8,9,10,11,12,13, 15, 17,18,19,20,21,22,23, 25,26,27,28,29,30,31,32,33,34, 36,37,38, 40,41,42,43, 45,46,47,48,49,50,51, 53]. The median number of primary studies included in the SRs was 19 (range = 5 to 104). The median number of participants included in the SRs based on RCTs was 4253 (range = 484 to 16,990), 14,550 (range = 1394 to 2,791,732) in the SRs based on NRSs, and 2958 (range = 708 to 8163) in the SRs based on RCTs and NRSs. Additionally, we classified the interventions and exposures in the SRs as clinical therapies (n = 22), drugs (n = 14), specific clinical diseases (n = 6), lifestyle factors (n = 3), supplements (n = 2), clinical approaches (n = 1), and clinical care (n = 1) (Table 2).

Table 2 Summary of the systematic reviews characteristics

Certainty of evidence ratings

The median number of outcomes rated in a SR was 8 (range = 1 to 184) (Table 2). There were 811 individual outcome ratings (544 for RCTs and 256 for NRSs) The quality of evidence was assessed in 4 categories, as follows: very low (33%); low (32.1%); moderate (24.5%); and high (10.4%). Outcomes were rated in 23.0%, 34.0%, 27.7%, and 15.3% as a very low, low, moderate, and high certainty of evidence among SRs based on RCTs, respectively. The certainty of evidence was rated as very low (55.0%), low (27.0%), moderate (17.6%), and high (0.4%) among SRs based on NRSs. Among SRs that combined NRSs and RCTs, 11 outcomes were rated; the certainty of evidence was rated as very low (18.2%), low (54.5%), and moderate (27.3%) (Table 3).

Table 3 Details on the certainty of evidence ratings including down- and upgrading’s of the evidence according to study design

Up- and down-grading domains

A total of 1012 instances of downgrading were identified due to a risk of bias (RoB; 52.2%), imprecision (27.5%), inconsistency (14.4%), indirectness (5.4%), and publication bias (0.5%). According to the authors of those SRs, rating down for publication bias was as a result of asymmetry of the funnel plot or it was strongly suspected the study design (patient reports published in toxicology report very severe poisoning either with or without impressive recovery with treatments attempted) [43, 53]. Twenty-one upgrades of the certainty of evidence were due to a large effect (14.3%), dose–response (47.6%), and plausible confounding (38.1%). The authors of those SRs considered that the all plausible residual confounding in the included studies would reduce the demonstrated effect [21, 29, 53]. Downgrading for RoB was more common in SRs of RCTs (53.8%) than SRs of NRSs (49.3%), whereas downgrading for publication bias was more common in SRs of NRSs (1.2%) than SRs of RCTs (0.1%). Additionally, upgrading for large effect, dose–response, and plausible confounding were all in SRs of NRSs (Table 3).

We calculated 1.2 mean downgrades per outcome in SRs of RCTs and 1.3 mean downgrades per outcome in SRs of NRSs. The downgrading frequency (the number of downgrades per number of rated outcomes) among SRs of RCTs for the RoB (65.8%) and imprecision (38.0%) domains was greater than the SRs of NRSs. In contrast, the downgrading frequency among SRs of NRSs for the inconsistency (23.4%), indirectness (14.5%), and publications bias (1.6%) was greater than the SRs of RCTs (Table 3). In addition, 7% of outcomes rated in SRs of NRSs were upgraded; 3.9% of outcomes were upgraded for dose–response, 3.1% for plausible confounding, and 0.4% for a large effect. The reasons for the downgrade or upgrade of the certainty of evidence for outcomes is shown in Supplementary Table S3.

Discussion

Summary of findings

This is the first study to evaluate the application of the GRADE approach in SRs published in the top 10 urology and nephrology journals. In general, there are relatively few urology and nephrology SRs using GRADE approach to rate the certainty of evidence, but an increasing trend in the level of implementation was noted in the past 2 years [9, 11, 17, 19,20,21,22,23,24,25,26,27,28,29, 31, 33, 40,41,42,43, 45,46,47,48,49,50,51, 53, 55]. We identified 49 SRs that rated the outcome-specific certainty of evidence. Overall, a low and very low certainty of evidence accounted for 32.1% and 33% of the individual outcomes, respectively. In addition, the certainty of evidence was downgraded most of for RoB and imprecision among SRs of RCTs and NRSs.

Strengths and limitations

There were several strengths in our study. First, we have searched all published SRs that utilized the GRADE approach among the top 10 urology and nephrology journals from 2016–2020. A wide range of relevant information was extracted, such as the classification of outcomes assessment, and the number and reasons for down- and up-grading domains. Second, this is the first study to summarize and present the GRADE-specific information on SRs published in urology and nephrology journals, which can provide essential information for follow-up research. There were also some limitations in our study. First, our results are limited due to the risk of selection bias. We only included the top 10 urology and nephrology journals with the highest impact factors within 5 years (n = 60). Therefore, we failed to obtain information from medical journals with lower impact factors. Nevertheless, the journals with the highest impact factor are persuasive, which can provide a basis for future clinical research. Second, because there was no clear and definite research protocol to follow, we adopted the PRISMA framework to report our findings. In fact, the PRISMA framework can provide comprehensive content and is often used in SRs and meta-analyses. Third, our study was based on a descriptive examination of the application of the GRADE approach in urology and nephrology SRs rather than to determine if the SR authors correctly followed guidance issued by the GRADE working group to rate the certainty of evidence. In the process of data extraction, we found that some authors did not follow minimum criteria of GRADE approach in the evaluation. Specifically, the certainty of evidence in some SRs was upgraded due to a low RoB, narrow confidence interval, a very low P value, and/or mild statistical heterogeneity rather than undergoing evaluation according to the GRADE approach domains for upgrading. Therefore, future research should have a priority for focusing on the optimal use of the GRADE approach, so that those applying the approach can have awareness brought to the main issues and difficulties faced when applying GRADE approach, to be able to correct them. Finally, this report does not address or assess any potential time trends. It is possible that the adoption of GRADE approach has increased and improved over time.

Findings from other studies

The GRADE approach has been used in the fields of urology and nephrology during the recent 5 years. KDIGO Working Group formulated the scope of guidelines and graded evidence according to the GRADE approach, which now serves as the practice standard for KDIGO [67]. Similar to the findings in our study, the level of evidence was very low in most cases in the study conducted by Zare and colleagues [68]. The most frequent limitation involved indirectness due to the limited number of studies for each pair. In addition, the evidence quality was mainly downgraded due to the RoB and inconsistency. Moreover, presenting comprehensive evidence from RCTs and NRSs, as in our study, was similar to the Cuello-Garcia et al. study [69], which indicated that most respondents presented integrated data separately from RCTs and NRSs, either in a single summary of findings table or a standalone table. Compared to our study, these studies or guidelines merely indicated the use of the GRADE approach when rating the certainty of evidence or the preferences of experts when integrating RCTs and NRSs.

Implications for broader research

Although the GRADE approach has been used for RCTs and NRSs in the fields of urology and nephrology, there are still some challenges in evaluating the certainty of evidence in NRSs with the GRADE approach. On the basis of the GRADE approach, the quality grade for an aggregate of RCTs would begin at the high entry level. A collection of observational studies would begin at the low entry level and evidence from other study designs, such as case–control studies, would begin at the very low level [70]. The GRADE approach, especially with respect to RoB assessment, is challenging and could lead to excessive downgrading. GRADE approach users may inappropriately double calculate the confounding and selection bias risk by downgrading the initial body of evidence to low, followed by further downgrading owing to unknown confounders in observational studies. In our study, in addition to the initial downgrading of two levels according to the study design, 53% of the SRs based on NRSs were further downgraded due to RoB.

The GRADE approach requires separate evaluations of RCTs or NRSs by any validated tool, but the specific method is not recommended because the selection of tools depends on the context [71]. With the emergence of tools that use the concept of target trials as a reference point, such as risk of bias in non-randomized intervention studies (ROBINS-I), the initial certainty of evidence is also considered high for bodies of evidence from NRSs [72, 73]. Several SRs have used the ROBINS-I tool in the field of nephrology thus far [74,75,76]. ROBINS-I can more systematically and accurately assess the RoB in NRSs to avoid extensive downgrading [73]. In general, the purpose of integrating rigorous RoB assessment into the GRADE approach, such as the application of the ROBINS-I tool in the SRs based on NRSs, is to improve the trustworthiness and credibility of the specific evidence [73]. To accurately rate the certainty of evidence, it is recommended to follow the current standards of RoB assessments, such as the ROBINS-I and the RoB tool by Cochrane [73, 77].

Conclusion

The GRADE approach provides a system for rating quality of evidence and strength of recommendations that is explicit, comprehensive, transparent, and pragmatic. The GRADE approach is increasingly adopted by professional organizations worldwide. Our study demonstrated that the GRADE approach is not widely used (only 13.5% (60/445) of SRs reported using GRADE approach), within urology and nephrology SRs in top 10 journals. Thus, future research should focus on the optimal use of the GRADE approach by following the criteria proposed by the GRADE working group.