Randomization and control are essential if clinical trials are to yield valid and credible results. Randomized controlled trials (RCTs) provide the most reliable evidence on health care interventions and form the basis of many medical guidelines. However, RCTs are not always reported in sufficient detail or with sufficient clarity, which can hinder interpretation of the results [1, 2]. To evaluate the conclusions of a published report accurately, readers need complete, clear, and transparent information on its methodology and findings. Unfortunately, attempted assessments frequently fail because the authors of many trial reports omit critical data, so only limited information is available [3–5].

The CONSORT (Consolidated Standards of Reporting Trials) statement was first published in 1996, revised in 2001, and updated in 2010 by the CONSORT Group. It provides authors and editors with a checklist of minimum recommendations for reporting trial design, analysis, and results [6–8]. Many studies have shown that the quality of trial reporting improves when authors follow the CONSORT checklist [9–12]. The Jadad score is considered a valid and reliable tool for assessing the methodological quality of a clinical trial and has been applied throughout the medical literature [13, 14].

Traditional Chinese medicine (TCM), including herbal medicine, is widely used in China to treat a variety of diseases and is increasingly used worldwide to complement conventional medical care. In a nationally representative U.S. survey conducted in 2002, almost 20% of adults and 75%–100% of Asian-Americans had used herbal therapies in the past year [15]. Users believe that TCM combined with conventional medicine provides better healing than conventional medicine alone [16–18]. However, in the era of evidence-based medicine, TCM has been strongly challenged by clinicians because evidence of its efficacy is lacking. Researchers have therefore put considerable effort into TCM clinical studies. Over the past decade, RCTs of TCM have been advocated and a number have been reported [19–23]. Recently, many TCM researchers have evaluated the quality of TCM RCTs against the CONSORT checklist [24–29]. Their studies show that the quality of TCM RCTs is generally low. However, these studies covered only one or a few TCM journals, or only publications on a specific disease, and thus cannot give a comprehensive view of the overall quality of TCM RCTs.

The purpose of the present study was to compare the quality of reporting of TCM RCTs before and after the publication of the 2010 CONSORT statement. We included all TCM RCTs published during this period and indexed in the CNKI database, aiming to evaluate the overall quality of TCM RCTs comprehensively.


Search strategy

The China National Knowledge Infrastructure (CNKI) database, the most comprehensive full-text database of journals published in China, was used in the present study [30]. Of its several subdatabases, we used the academic journals full-text database. We chose manuscripts published in 2005–2009 and 2011–2012, which represent publications before and after the 2010 CONSORT statement, respectively. Our electronic search combined the subject terms “traditional Chinese medicine” and “clinical trial” with the “Fuzzy Search” method so as to capture more potential manuscripts. To evaluate the trend in publication quality, we assessed the published reports on an annual basis. The titles, index terms, and abstracts of the identified manuscripts were read and rated as “potential manuscript” or “not relevant”. We retrieved all potential manuscripts and reviewed their full texts according to the following criteria:

The inclusion criterion was that the manuscript reported a TCM RCT. Exclusion criteria were (1) reviews, literature analyses, experience reports, and case reports; (2) animal experiments; (3) non-randomized clinical trials; (4) duplicate publications; (5) retrospective studies; and (6) others. Three reviewers (J L, Z L, R C) reviewed the full texts of the manuscripts to identify TCM RCTs. Disagreements regarding inclusion were resolved by discussion. Figure 1 shows the process of collecting and analyzing the materials.

Figure 1 The process of collecting materials.

Scoring according to CONSORT

A checklist of 25 items from the updated 2010 CONSORT guidelines was used [31–33]. Of the 25 items, 12 have 2 subitems. Each item or subitem was scored 0 or 1: 0 indicates that the report did not describe the item or subitem, and 1 indicates that it did. We did not include the following subitems in our report because, after analyzing all manuscripts, we found that (1) no report changed the methods after trial commencement (Subitem 3b); (2) no report changed trial outcomes after the trial commenced (Subitem 6b); (3) no report described interim analyses or stopping guidelines (Subitem 7b); (4) no report described a trial that was stopped prematurely (Subitem 14b); and (5) no report described additional analyses (Subitem 18). After these 5 subitems were excluded, the maximum score a paper could obtain was 31 points. Three investigators (J L, Z L and R C) independently assessed each article for every item on the checklist [29]. When their opinions differed, they discussed the item until they reached a consensus; if they could not, L L made the final decision. The total score of each trial was then calculated.
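The scoring rule described above can be sketched in a few lines. This is only an illustration: the function name, the dictionary layout, and the example ratings are hypothetical and are not taken from the authors' data or workflow.

```python
# Subitems the study excluded because no report described them.
EXCLUDED = {"3b", "6b", "7b", "14b", "18"}


def consort_total(ratings: dict[str, int]) -> int:
    """Sum the 0/1 ratings of one report, skipping the excluded subitems.

    `ratings` maps a CONSORT label such as "1a" or "15" to 0 (not described)
    or 1 (described). The layout is an assumption made for this sketch.
    """
    total = 0
    for label, value in ratings.items():
        if label in EXCLUDED:
            continue  # excluded subitems never contribute to the total
        if value not in (0, 1):
            raise ValueError(f"rating for {label} must be 0 or 1, got {value}")
        total += value
    return total


# Hypothetical, partial ratings for a single report.
example = {"1a": 0, "1b": 1, "2a": 1, "3b": 1, "4a": 1}
print(consort_total(example))
```

In this hypothetical example the rating for 3b is ignored, so only 1b, 2a and 4a count toward the total.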

Scoring according to Jadad

The Jadad scale is a 5-point scale for measuring the quality of randomized trials; a score of three points or more indicates high quality [13]. It rates three aspects of a report: how generation of the random sequence is described (0 = no description; 1 = inadequate description; 2 = adequate description); how blinding is carried out (2 = double-blinding with adequate description; 1 = double-blinding with inadequate description; 0 = wrong usage of double-blinding); and whether withdrawals are accounted for (1 = the numbers of and reasons for patient withdrawals and dropouts were reported; 0 = otherwise). As with the CONSORT scoring, three investigators (J L, Z L and R C) scored each report separately. Disagreements were discussed among the three until agreement was reached; if it could not be reached, L L made the final decision.
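A minimal sketch of the Jadad scoring as described in this section follows. The function and parameter names are hypothetical; only the point values and the threshold of three points come from the text.

```python
def jadad_score(randomization: int, blinding: int, withdrawals: int) -> int:
    """Combine the three component ratings into a 0-5 Jadad score.

    randomization: 0 = no description, 1 = inadequate, 2 = adequate
    blinding:      0 = wrong usage, 1 = inadequate description, 2 = adequate
    withdrawals:   1 = numbers and reasons reported, 0 = otherwise
    """
    if randomization not in (0, 1, 2):
        raise ValueError("randomization must be 0, 1 or 2")
    if blinding not in (0, 1, 2):
        raise ValueError("blinding must be 0, 1 or 2")
    if withdrawals not in (0, 1):
        raise ValueError("withdrawals must be 0 or 1")
    return randomization + blinding + withdrawals


def is_high_quality(score: int) -> bool:
    """A score of three points or more indicates high quality."""
    return score >= 3


# Hypothetical report: randomization mentioned but inadequately described,
# no valid blinding, withdrawals not reported.
score = jadad_score(randomization=1, blinding=0, withdrawals=0)
print(score, is_high_quality(score))
```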


The Pearson χ2 test was used to test whether differences between the two periods (2005–2009 and 2011–2012) were statistically significant in terms of mean total score of CONSORT. The Wilcoxon rank sum test was used to test differences in Jadad scores between years. The level of significance for all tests was set at 0.05. Data were analyzed using SPSS version 18.0. The total score of each report and the percentage of reports with each score were calculated.
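For readers unfamiliar with the Wilcoxon rank sum test, the following sketch shows one common form of it: the normal approximation with a correction for ties and no continuity correction. It is not the SPSS implementation used in the study, the function name is hypothetical, and the two score lists are invented purely for illustration.

```python
import math
from collections import Counter


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (rank sum of x, two-sided p-value) under a normal approximation."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(x + y)
    # Midrank of each distinct value: mean of the 1-based positions it occupies.
    midrank = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = position + (t + 1) / 2
        position += t
    w = sum(midrank[v] for v in x)
    mean = n1 * (n + 1) / 2
    # Tie correction shrinks the variance when many scores are equal.
    tie_term = sum(t**3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:  # every observation is tied
        return w, 1.0
    z = (w - mean) / math.sqrt(variance)
    return w, math.erfc(abs(z) / math.sqrt(2))


# Invented Jadad-like scores for two small groups of reports.
w, p = rank_sum_test([0, 1, 1, 2], [1, 2, 2, 3])
print(w, round(p, 3))
```

Because Jadad scores take only a handful of integer values, ties are the norm, which is why the tie correction is included in this sketch.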


Characteristics of selected RCTs

After screening the titles, abstracts and texts, we identified a total of 4133 reports in 2005–2009 and 2861 in 2011–2012 in the CNKI database that met the inclusion and exclusion criteria and were included in this analysis. The annual numbers of reports identified in each screening step are shown in Table 1.

Table 1 Results of screening for randomized clinical trials from 2005 to 2012

The CONSORT results

CONSORT: title, abstract, background and objectives

The proportion of reports with “randomized” in the title (1a) increased significantly (0.56% vs 1.15%, P = 0.006), although it was very low in both periods (Table 2 and Figure 2). In 2005–2009, 84.81% of reports had abstracts (1b) that included objective, methods, results and conclusions, a higher proportion than in 2011–2012 (82.03%). The proportion of reports with a detailed description of the study background (2a) was low in both periods but higher in 2011–2012 (24.71% in 2005–2009 vs 35.20% in 2011–2012, P < 0.001). The proportion of reports stating objectives (2b) was also low (6.36% vs 5.14%, P = 0.032) (Table 2).
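The comparison for item 1a can be illustrated with a Pearson χ2 test on a 2 × 2 table. The sketch below is an assumption-laden reconstruction, not the authors' SPSS analysis: the counts 23 and 33 are back-calculated from the reported percentages (0.56% of 4133 and 1.15% of 2861), the function name is hypothetical, and no continuity correction is applied.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: 2005-2009 and 2011-2012; columns: "randomized" in title, not in title.
# Counts are reconstructed from the reported percentages (assumption).
chi2, p = pearson_chi2_2x2(23, 4133 - 23, 33, 2861 - 33)
print(round(chi2, 2), round(p, 3))
```

Under these reconstructed counts the p-value rounds to the reported P = 0.006.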

Table 2 Comparison of randomized controlled trials indexed in the CNKI database before and after 2010 in terms of CONSORT items

Figure 2 CONSORT results of title, abstract, background and objectives in each year.

CONSORT: materials and methods

Reporting of the following items improved markedly in 2011–2012 compared with 2005–2009: inclusion and exclusion criteria for patients (4a) (65.26% vs 49.79%, P < 0.001) and the place where the materials were collected (4b) (79.48% vs 64.46%, P < 0.001). However, the proportion of reports describing patient distribution (3a) decreased (P = 0.011). Although the proportions of reports describing interventions, outcomes and sample size calculation rose, the differences were not significant (Table 2). As shown in Figure 3, the proportions of reports describing these items fluctuated during 2005–2012.

Figure 3 CONSORT results of materials and methods in each year.

CONSORT: randomization

The proportion of reports describing sequence generation (8a) also increased significantly (13.77% in 2005–2009 vs 19.85% in 2011–2012, P < 0.001). However, the proportion of reports with blinding (11a) decreased in 2011–2012 (4.77% in 2005–2009 vs 2.48% in 2011–2012, P < 0.001). A similar trend was observed for the detailed description of the implementation process (10) (0.75% vs 0.17%, P = 0.001). Few reports described the allocation concealment mechanism (9) (Table 2). As shown in Figure 4, the proportions of reports describing these items fluctuated from 2005 to 2012.

Figure 4 CONSORT results of ‘randomization’ in each year.

CONSORT: results

The proportion of reports with detailed statistical methods (12a) was greater after 2010 (63.00% in 2005–2009 vs 72.77% in 2011–2012, P < 0.001), as was the proportion of reports giving the dates of recruitment and follow-up (14a) (70.14% in 2005–2009 vs 80.36% in 2011–2012, P < 0.001). Although the proportions of papers that reported loss to follow-up (13b) and a flow diagram (13a) increased after 2010, reporting of these items still needs improvement. The proportion of reports describing baseline data (15) increased (P = 0.002) (Table 2). As shown in Figure 5, the proportions of reports describing these items fluctuated from 2005 to 2012, except for recruitment and follow-up (14a).

Figure 5 CONSORT results of ‘results’ in each year.

CONSORT: discussion

There was no difference in the proportions of papers reporting harms (19), limitations (20), generalizability (21) and interpretation (22) before and after 2010 (Table 2). As shown in Figure 6, the proportions of reports describing these items fluctuated from 2005 to 2012.

Figure 6 CONSORT results of ‘discussion’ in each year.

CONSORT: other information

Only one paper reported the registration (23) and only one the protocol (24). The proportion of papers reporting funding (25) decreased markedly in 2011–2012 compared with 2005–2009 (P = 0.007) (Table 2).

CONSORT: total score of each report

Figure 7 shows the distribution of the mean scores of reports before and after 2010. The scores ranged from 1 to 24, with most falling between 4 and 11. The scores were generally low in both periods, and the mean score was slightly higher in 2011–2012 (7.09 in 2005–2009 vs 7.70 in 2011–2012) (Figure 8). Figure 9 shows that the distribution of reports by score was similar from year to year.

Figure 7 The distribution of the mean scores before and after 2010.

Figure 8 The mean score of publication of each year.

Figure 9 The annual distributions of the CONSORT score of reports.

Jadad score

Very few papers had a score above 2. The mean scores of reports were similar in both periods (1.22 for 2005–2009 vs 1.25 for 2011–2012, P = 0.405, Figure 10). The annual mean scores were also similar from 2005 to 2012 (P = 1.000, Figure 11).

Figure 10 The Jadad score before and after 2010. The scores of reports are similar for both periods (2005–2009 vs 2011–2012, P = 0.405).

Figure 11 The annual distributions of the Jadad score of reports. The mean scores are similar from 2005 to 2012 (P = 1.000).


In the present study, we show that the proportions of reports describing CONSORT items 1a, 2a, 4a, 4b, 8a, 12, 14a, 15 and 17b increased after 2010, whereas the proportions describing items 1b, 2b, 3a, 10, 11a, and 25 decreased. For most items, the proportion of reports describing the item fluctuated from 2005 to 2012. These data indicate that publication of the CONSORT statement has had little, if any, influence on most researchers reporting clinical trials in China.

TCM has been practiced in China for thousands of years. TCM doctors use herbal medicine to treat a variety of diseases, and the medicinal herbs may be used singly or in combination. In the past decades, the effects of TCM have been evaluated in various animal models, and the underlying mechanisms have been explored at the cellular, protein and DNA levels. Nevertheless, the efficacy of TCM should be demonstrated in RCTs, which provide the top-level evidence for therapy. For example, Chansu, the skin and parotid venom glands of Bufo bufo gargarizans Cantor, is a well-known TCM widely used to treat a variety of tumors in China [34, 35]. Experimental studies suggested that Chansu and its active compounds exhibit significant anti-tumor activity by inhibiting cell proliferation, inducing apoptosis and cell arrest, and inhibiting angiogenesis [36]. Further studies demonstrated that bufalin, one compound in Chansu, induced apoptosis of gastric cancer cells by inhibiting the AKT signaling pathway [37] and inhibited proliferation of hepatocellular carcinoma cells by inhibiting the AKT/GSK3β/β-catenin/E-cadherin signaling pathway [38].

RCTs of TCM were first published in the 1980s [39], and many have been published since. However, the quality of these reports has been poor [39–44]. For example, Fang et al. reported that only 13 of 338 RCT reports described the method of randomization in detail [40]. In the present study, only 8 of 31 CONSORT items improved significantly from 2005–2009 to 2011–2012. A detailed and informative background helps readers understand the purpose of the study. Detailed inclusion and exclusion criteria help avoid selection bias. A clear and definite description of the intervention is critical if the study is to be repeated. In particular, whether outcome assessment is blinded has considerable implications for the assessment of internal validity [45]. We found that 29.00% of the articles published from 2005 to 2012 described the background. Sequence generation was described in only 16.26% of the publications, blinding in 3.83%, and calculation of sample size in 0.29%. Inadequate description of these items undermines the credibility of a study's results. Another problem was that only 56 of 6994 reports had the term ‘randomize’ in their titles. The title is a very important part of an article, because researchers use titles to screen potential studies for meta-analyses.
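The pooled percentages in this paragraph can be cross-checked against the per-period figures reported in the Results. The sketch below back-calculates report counts from those percentages and recombines them; the function names are hypothetical, and the counts are reconstructed by rounding, not taken from the authors' data set.

```python
# Numbers of included reports per period, as reported in the Results.
N_BEFORE = 4133  # 2005-2009
N_AFTER = 2861   # 2011-2012


def reconstructed_count(percent: float, n: int) -> int:
    """Back-calculate a report count from a reported percentage.

    Assumption: the percentage was rounded from an integer count out of n.
    """
    return round(percent * n / 100)


def pooled_percent(pct_before: float, pct_after: float) -> float:
    """Pool two per-period percentages into one overall percentage."""
    hits = (reconstructed_count(pct_before, N_BEFORE)
            + reconstructed_count(pct_after, N_AFTER))
    return round(100 * hits / (N_BEFORE + N_AFTER), 2)


print(pooled_percent(24.71, 35.20))  # background (2a)
print(pooled_percent(13.77, 19.85))  # sequence generation (8a)
print(pooled_percent(4.77, 2.48))    # blinding (11a)
```

Under these assumptions the three calls reproduce the 29.00%, 16.26% and 3.83% quoted above, and the reconstructed blinding counts sum to 268 reports.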

With regard to methodological items, only 20 of 6994 reports calculated the sample size. A sample that is too large wastes time and money, whereas one that is too small reduces statistical power and may introduce selection bias. Blinding was used in 268 reports, and the proportion of reports describing blinding decreased after 2010. Blinding, especially double-blinding, is challenging for studies in which the intervention is being randomized [46]. Inadequate measures to create and conceal the random allocation, selective attrition, and insufficient double-blinding have been theorized to bias the estimates of treatment effects in RCTs [47].

Reports on adverse events were clearly not detailed enough, which may lead readers to overestimate the safety of TCM. In fact, the information recorded on TCM herbs in most classical books includes toxicities, incompatibilities between herbs, cautions, precautions and contraindications. Thus, contrary to a general misconception, toxicity data on Chinese herbs exist and are documented through clinical experience [48]. For example, cinnabar, which contains mercury sulfide, has been used in TCM for thousands of years, and 40 cinnabar-containing traditional medicines are still used today. Mercury absorbed from cinnabar accumulates mainly in the kidneys, resembling the disposition pattern of inorganic mercury, and renal dysfunction may occur after long-term use of cinnabar [49].

In addition, the reporting of outcomes and ancillary analyses remained poor. For example, intention-to-treat analysis is advocated because it preserves the randomization process and allows for noncompliance and deviations from policy by clinicians [50]. Only 9 of 6994 papers used intention-to-treat analysis.

The discussion is an important part of a report. There the authors can discuss the advantages and generalizability of the treatment, as well as the limitations of the study. We found that few reports had an informative discussion (Table 2). Finally, only one report contained information about registration, and only one other contained information about the protocol.

According to the Jadad scale, 188 reports scored above 2 points. There was no difference between publications before and after 2010: the average Jadad score was 1.25 during 2011–2012, compared with 1.22 during 2005–2009. Thus, the reporting quality of TCM RCTs has improved very slowly.

In the present study we chose the CNKI database to avoid selection bias. CNKI is the most comprehensive database in China. It archives the full-text publications of 1217 Chinese medical journals, including 26 journals for TCM and 18 for integrative TCM and modern Western medicine. In addition, two researchers independently assessed the quality of each report by reading its full text. This is in sharp contrast to previous reports, which evaluated only one or a few TCM journals, or only publications on a specific disease [24–29]. Thus the present study is the most comprehensive one on TCM RCTs.

Interestingly, we found that none of the manuscripts described a change of methods after trial commencement (Subitem 3b), a change of trial outcomes after the trial commenced (Subitem 6b), interim analyses and stopping guidelines (Subitem 7b), premature discontinuation of the trial (Subitem 14b) or additional analyses (Subitem 18). The underlying reason is unknown.


Although some improvements have been made in reporting TCM RCTs, the pace remains slow and there is considerable room for further improvement. The problems include the design of randomization, the use of blinding, the calculation of sample size, the comparability of baseline information, clear and definite inclusion and exclusion criteria, the use of statistical methods, the withdrawal and follow-up of patients, and the recording of adverse events. Doctors practicing TCM should be trained to write high-quality reports, and journals need to implement the CONSORT guidelines actively, so that reports on TCM RCTs become more credible and TCM can be used more widely in the world as an alternative medicine. We also suggest that a bibliographic database of TCM RCTs, similar to AcuTrials(R), be developed to enhance the accessibility and quality of TCM RCTs [51].