Introduction

Traditional Chinese medicine (TCM), including acupuncture, massage, herbal formulas, and Chinese patent medicines, is a widely used complementary and alternative therapy, and acupuncture in particular is widely applied [1,2,3,4]. The clinical application of TCM therapies requires clearer and more authoritative guidance. TCM clinical practice guidelines (CPGs) play a major role in providing recommendations on TCM interventions. Several authoritative TCM societies in China have now published a considerable number of CPGs to assist clinicians in decision-making [5,6,7,8,9,10,11].

Reliable, trustworthy CPGs, based on a systematic review of the evidence and comprehensive consideration of the factors influencing recommendations, are important documents for guiding clinical practice [12]. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach is recognized by many organizations worldwide as the international gold standard and has greatly improved the overall quality of guideline development [13,14,15,16,17]. In general, CPGs developed by the GRADE Working Group or its two co-chairs, Professors Gordon H. Guyatt and Holger J. Schünemann, strictly implement the GRADE approach. The GRADE approach considers the multiple factors that affect recommendations (such as desirable anticipated effects, undesirable anticipated effects, and certainty of evidence) through standardized, structured, and transparent methods, which avoids the implementation difficulties and credibility challenges that incomplete consideration can cause [18]. In addition, certainty of evidence, one of the key determinants of a recommendation, is clearly associated with the strength of recommendations [19, 20]. Respecting the relationship between the two helps avoid misleading recommendations [21].

Many studies have assessed the methodological quality of TCM CPGs [22,23,24], but they focused on the overall quality and content of the guidelines. How TCM CPGs move from evidence to recommendations, and which factors influence this process, remain largely under-explored. We therefore conducted a systematic search, critically evaluated how GRADE CPGs and TCM CPGs consider the factors influencing recommendations, and compared the differences and the gap between the two.

Methods

Literature search

Two reviewers searched databases including PubMed, Embase, China National Knowledge Infrastructure, VIP Database for Chinese Technical Periodicals, Chinese Biomedical Literature Database, and Wanfang (the full search strategy is given in Additional file 1: Appendix A).

For the GRADE CPGs, we also searched the https://www.GRADEpro.org/ website and the methodological literature published by the two co-chairs, and screened the guidelines cited in that literature. For the TCM CPGs, we also searched the websites of six authoritative TCM societies: the China Association of Chinese Medicine, the China Association of Traditional Chinese Medicine, the Chinese Association of Integrative Medicine, the China Association for Acupuncture and Moxibustion, the World Federation of Chinese Medicine Societies, and the Doctor Society of Integrative Medicine of the Chinese Medical Doctor Association.

Eligibility criteria of CPGs

This study included GRADE CPGs and TCM CPGs published between 1 January 2018 and 31 December 2022. We consulted several guideline methodology experts and held focus group discussions to determine how GRADE CPGs and TCM CPGs would be selected. GRADE CPGs were defined as guidelines developed by the GRADE Working Group or its two co-chairs, Professors Gordon H. Guyatt and Holger J. Schünemann. For the TCM CPGs, we used simple random sampling to select 15 guidelines from each source (the six authoritative TCM societies and other societies); where fewer than 15 guidelines were available, all were included. We excluded duplicate publications and older versions of CPGs.
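The capped sampling rule can be sketched as follows. This is a minimal illustration, not the tool the authors used: it assumes the cap of 15 applies separately to each source, and the function name, the seed, and the source labels in the usage example are hypothetical, since the article does not report how the random draw was implemented.

```python
import random
from typing import Dict, List


def sample_guidelines(
    by_source: Dict[str, List[str]],
    cap: int = 15,
    seed: int = 2023,  # hypothetical seed, fixed only so the draw is reproducible
) -> Dict[str, List[str]]:
    """Draw up to `cap` guidelines per source by simple random sampling.

    Sources with `cap` or fewer eligible guidelines are included in full.
    """
    rng = random.Random(seed)
    sampled: Dict[str, List[str]] = {}
    for source, guidelines in by_source.items():
        if len(guidelines) <= cap:
            # Fewer than the cap: keep every guideline from this source.
            sampled[source] = list(guidelines)
        else:
            # Sampling without replacement: every subset of size `cap` is equally likely.
            sampled[source] = rng.sample(guidelines, cap)
    return sampled


# Usage with hypothetical sources: 40 eligible guidelines in one, 9 in the other.
pool = {
    "society_A": [f"A{i}" for i in range(40)],
    "society_B": [f"B{i}" for i in range(9)],
}
selection = sample_guidelines(pool)
```

Fixing the seed makes the selection auditable, which matters when a sample rather than the full population of guidelines is appraised.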

Data extraction

Reviewers independently screened all titles, abstracts, and full texts and extracted the data; discrepancies were resolved through discussion or by a third author (FY). Two kinds of information were extracted: (1) basic characteristics of the CPGs, including type, scope, discipline, whether the GRADE approach was adopted, and whether the certainty of evidence and the strength and direction of recommendations were reported; and (2) information on the factors influencing the move from evidence to recommendations, including the factors considered when formulating each recommendation (desirable and undesirable anticipated effects, certainty of the evidence, values and preferences, health equity, and other factors) [15, 17], whether each factor was supported by evidence, whether it was considered as an independent dimension, the association between certainty of evidence and strength of recommendations, and the numbers of discordant and inappropriate discordant recommendations.
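One way to picture the second kind of extracted information is as a per-recommendation record. The sketch below is a hypothetical structure, not the authors' actual extraction form: the field and class names are our own, and the factor list simply collects the evidence-to-decision factors named in this article. Counting considered factors corresponds to the "number of GRADE factors adopted" reported later.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Factors named in this article (hypothetical identifiers).
ETD_FACTORS = (
    "desirable_effects",
    "undesirable_effects",
    "balance_of_effects",
    "certainty_of_evidence",
    "values_and_preferences",
    "problem_priority",
    "resource_requirements",
    "health_equity",
    "acceptability",
    "feasibility",
)


@dataclass
class FactorAssessment:
    considered: bool = False             # was the factor mentioned at all?
    evidence_supported: bool = False     # was supporting evidence cited for it?
    independent_dimension: bool = False  # judged on its own, not merged with others


@dataclass
class RecommendationRecord:
    guideline_id: str
    certainty: Optional[str]  # "high" | "moderate" | "low" | "very low"; None if unreported
    strength: Optional[str]   # "strong" | "conditional"; None if unreported
    direction: Optional[str]  # "for" | "against"; None if unreported
    factors: Dict[str, FactorAssessment] = field(
        default_factory=lambda: {name: FactorAssessment() for name in ETD_FACTORS}
    )

    def n_factors_considered(self) -> int:
        """Number of factors marked as considered for this recommendation."""
        return sum(f.considered for f in self.factors.values())


# Usage: a hypothetical recommendation that considered only four determinants.
rec = RecommendationRecord("hypothetical-CPG-01", "low", "strong", "for")
for name in ETD_FACTORS[:4]:
    rec.factors[name].considered = True
```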

We defined discordant recommendations as strong recommendations based on low or very low certainty evidence [25]. Discordant recommendations may be appropriate or inappropriate. Strong recommendations based on low or very low certainty evidence were considered acceptable in five special situations: (1) low or very low certainty evidence suggests benefit in a life-threatening situation; (2) low or very low certainty evidence suggests benefit, whereas high certainty evidence suggests harm; (3) low or very low certainty evidence suggests equivalent benefits of two alternatives, whereas high certainty evidence suggests that one of them is less harmful; (4) high certainty evidence suggests equivalent benefits of two alternatives, whereas low or very low certainty evidence suggests that one of them is more harmful; and (5) benefits are uncertain but there is a potential for catastrophic harm. Discordant recommendations made under one of these five situations were considered appropriate; all others were considered inappropriate [26].
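The classification rule above reduces to a small decision procedure. In this sketch, whether one of the five special situations applies is taken as an input, because in practice that judgment comes from reviewers reading the guideline. The enum member names are our own shorthand for the five situations, and the label strings are hypothetical.

```python
from enum import Enum
from typing import Optional


class SpecialSituation(Enum):
    """Shorthand labels (our own) for the five situations listed above."""
    LIFE_THREATENING_BENEFIT = 1
    UNCERTAIN_BENEFIT_CERTAIN_HARM = 2
    EQUAL_BENEFIT_ONE_LESS_HARMFUL = 3
    CERTAIN_EQUAL_BENEFIT_UNCERTAIN_HARM = 4
    UNCERTAIN_BENEFIT_CATASTROPHIC_HARM = 5


LOW_CERTAINTY = {"low", "very low"}


def classify(
    strength: str,
    certainty: str,
    situation: Optional[SpecialSituation] = None,
) -> str:
    """Label one recommendation under the discordance definition."""
    # Only strong recommendations resting on low or very low certainty are discordant.
    if strength != "strong" or certainty not in LOW_CERTAINTY:
        return "not discordant"
    # A discordant recommendation is appropriate only if a special situation applies.
    if situation is not None:
        return "appropriate discordant"
    return "inappropriate discordant"


# Usage
print(classify("strong", "very low", SpecialSituation.LIFE_THREATENING_BENEFIT))
```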

Data analysis

We used descriptive statistics and calculated the proportions of relevant items to compare GRADE CPGs and TCM CPGs. For recommendations whose certainty of evidence was rated with a system other than GRADE, we treated the highest level as high certainty, the second-highest as moderate certainty, the third-highest as low certainty, and all remaining levels as very low certainty. For recommendations whose strength was expressed with criteria other than GRADE, we classified a recommendation as strong if it was explicitly described as strong or, in a multi-level grading system, if it was at the highest level (strong for or against); all other recommendations were classified as conditional (Additional file 2: Appendix B).
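The two mapping rules can be sketched as below. The sketch assumes that each non-GRADE level has already been converted to a rank (1 for the system's highest level, 2 for the next, and so on); the function names and that rank encoding are hypothetical and are not taken from Appendix B.

```python
from typing import Optional

GRADE_CERTAINTY = ("high", "moderate", "low", "very low")


def map_certainty(level_rank: int) -> str:
    """Map a non-GRADE evidence level (1 = highest) to a GRADE certainty rating."""
    if level_rank < 1:
        raise ValueError("level_rank starts at 1 for the highest level")
    # Ranks 1-3 map one-to-one; rank 4 and every lower level collapse to "very low".
    return GRADE_CERTAINTY[min(level_rank, len(GRADE_CERTAINTY)) - 1]


def map_strength(level_rank: Optional[int], explicitly_strong: bool = False) -> str:
    """Map a non-GRADE recommendation grade (1 = highest) to strong or conditional."""
    # Strong if the guideline says so explicitly, or if the grade is the top level
    # of a multi-level system; everything else is treated as conditional.
    if explicitly_strong or level_rank == 1:
        return "strong"
    return "conditional"


# Usage: a hypothetical five-level evidence scale A-E.
scale = {letter: map_certainty(rank) for rank, letter in enumerate("ABCDE", start=1)}
# A -> high, B -> moderate, C -> low, D and E -> very low
```

Collapsing every level below the third into "very low" is conservative: it can only lower, never raise, the certainty attached to a recommendation, which in turn makes a strong recommendation more likely to be flagged as discordant.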

Results

Search results

The GRADE CPGs and the TCM CPGs were searched separately. For the GRADE CPGs, 523 records were identified through the database search and 23 from other sources. After 180 duplicates were removed and 302 records were deemed ineligible, 64 records remained. Of these, 19 were excluded (15 were not CPGs and 4 were older versions), and 45 CPGs were finally included. For the TCM CPGs, 13,839 records were identified through the database search and 128 from other sources; 6,962 remained after duplicates were removed, of which 6,697 were deemed ineligible. A further 13 were excluded (10 were not CPGs, 1 was an older version, and 2 were duplicate publications). Finally, we included 88 of the 262 TCM CPGs by simple random sampling (details of the 133 included CPGs are given in Additional file 3: Appendix C, and the screening process is shown in Fig. 1).

Fig. 1

Flow chart of selecting clinical practice guidelines

Characteristics of included CPGs

Forty-five were GRADE CPGs and 262 were TCM CPGs. All GRADE CPGs used the GRADE approach and reported the certainty of evidence and the strength and direction of recommendations. Among the TCM CPGs, 131 (50.0%) did not use the GRADE approach, and 35 (13.4%) did not report the certainty of evidence and the strength and direction of recommendations. Detailed characteristics of the CPGs are summarized in Table 1.

Table 1 Characteristics of included clinical practice guidelines N (%)

Association between number of GRADE factors adopted and strength and direction of recommendations

The 45 included GRADE CPGs contained a total of 912 recommendations. All of these recommendations (912, 100%) considered five determinants in their formation: desirable anticipated effects, undesirable anticipated effects, the balance between desirable and undesirable effects, certainty of evidence, and values and preferences. Almost all of them treated these determinants as independent dimensions.

Other factors were considered in fewer than 60% of the GRADE recommendations. Notably, the CPGs published by the American College of Rheumatology (ACR) rarely considered these other factors (e.g., resource requirements, health equity, acceptability, and feasibility), even though they claim to use the GRADE approach. After the recommendations in the eight ACR CPGs were excluded, all factors in the evidence-to-decision (EtD) framework were considered in more than 70% of the remaining GRADE recommendations. By contrast, TCM recommendations considered essentially only four determinants (desirable anticipated effects, undesirable anticipated effects, the balance between desirable and undesirable effects, and certainty of evidence); other factors were considered in fewer than 20% of them and lacked supporting evidence (Tables 2 and 3).

Table 2 Association between number of GRADE factors adopted and strength and direction of recommendations of the included CPGs N (%)
Table 3 GRADE recommendations published by the American College of Rheumatology versus other organizations N (%)

In addition, the GRADE CPGs presented the factors they considered in more detail. For instance, of the 466 GRADE recommendations that considered health equity, 336 presented it in detail, most often in terms of socioeconomic status (158, 46.2%), followed by social capital (83, 24.2%), age (67, 19.5%), and gender (50, 14.7%). By contrast, the seven TCM recommendations that considered health equity did not present it in detail; all of them considered only age (Fig. 2).

Fig. 2

Presentation of social stratification factors in health equity in clinical practice guidelines

Association between certainty of evidence and recommendations

Of the 912 GRADE recommendations, 864 reported both the certainty of evidence and the strength and direction of the recommendation; of the 2,452 TCM recommendations, 2,094 did so. The GRADE CPGs included many recommendations against interventions (strong or conditional; 191, 22.1%), whereas the TCM CPGs contained only one recommendation against an intervention (1, 0.0%). Interestingly, the TCM CPGs also presented more strong recommendations (for or against) than the GRADE CPGs (Table 4).

Table 4 Association between certainty of evidence and recommendation N (%)

Discordant and inappropriate discordant recommendations

We identified 104 discordant recommendations in the GRADE CPGs, of which 91 (87.5%) were appropriate. Of the 504 discordant recommendations identified in the TCM CPGs, only 19 (3.8%) were appropriate (Table 5).
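The proportions reported here follow directly from per-recommendation labels produced by the discordance rule in the Methods. The sketch below tallies such labels; the label strings and the function name are hypothetical, and the usage simply rebuilds the reported counts (91 of 104 and 19 of 504) to show the arithmetic, adding a few illustrative non-discordant labels that the tally ignores.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_discordance(labels: Iterable[str]) -> Tuple[int, int, float]:
    """Return (discordant count, appropriate count, % appropriate, 1 decimal)."""
    counts = Counter(labels)
    appropriate = counts["appropriate discordant"]
    discordant = appropriate + counts["inappropriate discordant"]
    # Labels such as "not discordant" do not enter the denominator.
    pct = round(100 * appropriate / discordant, 1) if discordant else 0.0
    return discordant, appropriate, pct


# Usage: rebuild the reported counts from label lists.
grade_labels = ["appropriate discordant"] * 91 + ["inappropriate discordant"] * 13 + ["not discordant"] * 5
tcm_labels = ["appropriate discordant"] * 19 + ["inappropriate discordant"] * 485
print(summarize_discordance(grade_labels))  # (104, 91, 87.5)
print(summarize_discordance(tcm_labels))    # (504, 19, 3.8)
```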

Table 5 Appropriateness of recommendations with low or very low certainty of evidence N (%)

Discussion

Summary of findings

This study included 45 GRADE CPGs (912 recommendations) and 88 TCM CPGs (2,452 recommendations). Compared with the GRADE CPGs, half of the TCM CPGs did not apply the GRADE approach, and TCM recommendations gave relatively little consideration to some factors, such as values and preferences, health equity, and resource requirements, and lacked evidence to support them. Notably, TCM CPGs tended to present more strong recommendations and very few recommendations against interventions. Another notable finding is that many strong recommendations in TCM CPGs were based on low or very low certainty evidence, and most of these discordant recommendations provided no reasonable justification.

Strength and limitations

To our knowledge, this is the first methodological study to compare GRADE CPGs and TCM CPGs in terms of the factors influencing recommendations. Our study has several strengths. First, we conducted systematic and comprehensive searching, screening, data extraction, and analysis for both GRADE CPGs and TCM CPGs. Second, we included CPGs published in 2018–2022, which allowed a comprehensive assessment of the recent methodology of GRADE CPGs and TCM CPGs. Third, comparing TCM CPGs with GRADE CPGs helps identify the deficiencies of TCM CPGs and directions for their improvement.

We acknowledge several limitations. First, we did not assess the reliability of the certainty-of-evidence judgments reported in the CPGs. It therefore remains possible that some low or very low certainty evidence was rated as moderate or high certainty, or vice versa, so some recommendations we did not classify as discordant may actually be discordant, and some we classified as discordant may not be. Second, this study analyzed CPGs randomly selected across the whole study period, which could introduce bias. Third, we did not compare our results with CPGs from other countries, such as Korea or Japan, which may limit the generalizability of our findings.

Comparison with prior work

A prior study used the Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument to evaluate guidelines on the symptomatic management of fever in children published in 2011–2016, but it focused on the overall quality of the guidelines [27]. A methodological study of Australian guidelines found inconsistencies between guidelines that claimed to use the GRADE approach and the GRADE approach as actually specified; however, it included guidelines published in 2011–2018 [28]. Colby et al. found that about one-third of United States organizations used GRADE to develop evidence-based guidelines, but adherence to the GRADE domains was suboptimal, consistent with our findings [29]. These three studies paid little attention to the comprehensive consideration of the factors influencing the move from evidence to recommendations, and did not fully explore the link between the certainty of evidence and the strength of recommendations.

Implications

TCM CPGs give limited consideration to the factors influencing the formation of recommendations and lack comprehensive consideration of factors such as resource requirements, health equity, and acceptability, which may be one reason they present a large number of unreasonable recommendations. In addition, some guidelines claim to apply the GRADE approach, but we could not verify that they actually implemented it; this may reflect a misunderstanding of the GRADE approach, or adherence to it without explicit reporting. As the GRADE approach is increasingly applied in guideline development and its advantages become more apparent, it is important to promote its correct application, improve adherence, and provide further methodological training.

Careful consideration is nevertheless needed. TCM and Western medicine differ greatly in diagnosis and treatment, and evaluating the effects of TCM poses difficulties arising from the characteristics of the discipline. When the GRADE approach was adopted in guideline development, some important factors affecting treatment effects, such as the adequacy of individualized treatment, were not considered. In addition, the lack of evidence on resource use, health equity, acceptability, and feasibility reduced the applicability of the GRADE approach to TCM guideline development. However, these are issues for TCM researchers to address rather than shortcomings of the GRADE approach itself. Research in this field is still immature, and further exploration is needed. We believe this study provides additional rationale for future research.

Conclusions

TCM CPGs lack comprehensive consideration of the multiple determinants influencing the move from evidence to recommendations, and many do not apply the GRADE approach. Four key determinants (desirable anticipated effects, undesirable anticipated effects, the balance between desirable and undesirable effects, and certainty of the evidence) were better considered and more often supported by evidence, but other factors, such as values and preferences, resource requirements, feasibility, health equity, and problem priority, received little consideration. This may reflect inappropriate use of the GRADE approach. TCM CPGs present a considerable number of inappropriate discordant recommendations, that is, strong recommendations based on low or very low certainty evidence without reasonable justification. Future TCM guidelines should pay attention to the appropriate use of the GRADE approach.