Introduction

Pilot and feasibility trials are being published in growing numbers. Pilot trials are important for the design of a future main trial (or definitive trial) because they provide evidence on feasibility issues and help avoid wasted resources [1]. In 2016, Eldridge et al. published two key papers aiming to reduce misunderstanding and improve the reporting quality of pilot trials: the first provided a conceptual framework to define a pilot trial [2], and the second developed a CONSORT (Consolidated Standards of Reporting Trials) extension for pilot trials that included a 26-item checklist [3]. While the two publications may help with the design, implementation, reporting, and dissemination of pilot trials, their impact on the pilot trials published in the literature remains largely unknown. Confusion persists about pilot trials, including their definitions and terms, purpose, sample size determination, and criteria for progression or cessation, to mention a few [4,5,6].

Traditional Chinese medicine (TCM) has attracted considerable interest in the health research community, especially given its role as an alternative and integrative palliative treatment option [7]. Notably, clinical trials for TCM face uncertainties and challenges, mainly difficulty in standardizing procedures, potential heterogeneity in interventions and operators, control selection, and outcome assessment. Pilot trials for TCM offer a platform to identify and address these issues before a main trial. However, current evidence about the conduct and reporting of pilot trials for TCM is limited and sparse. Furthermore, little is known about whether the CONSORT extension for pilot trials can significantly enhance the quality of implementation and reporting of TCM pilot trials. Likewise, further evidence is needed to reveal issues specific to TCM pilot trials that the guidelines do not address [3]. Therefore, in this study, we conducted a literature review to investigate the guideline adherence of pilot trials for TCM, aiming to appraise the issues related to methodology and reporting. We also aimed to assess the impact of the CONSORT extension for pilot trials and to discuss any potential challenges specific to TCM pilot trials.

Methods

Search strategy and study selection

We systematically searched MEDLINE, EMBASE, and CNKI to retrieve TCM pilot trials. Descriptors including synonyms for traditional Chinese medicine or herbal medicine or folk medicine, and pilot trials or feasibility studies, were used in combination for the literature search (Supplemental Table 1 presents the search terms used). Studies were eligible for inclusion if they explicitly identified their TCM research as a randomized pilot or feasibility trial in the titles, abstracts, or introductions. Studies were excluded if they were not identified as a randomized pilot or feasibility trial, or they were not related to TCM, or they did not have information for methodological and reporting appraisal. Two reviewers (GL and XC) independently screened the records and determined study eligibility.

Data extraction

Data extraction was completed by two independent reviewers (GL and XC). We categorized the included TCM pilot trials into two groups: (1) pilot trials that had at least one objective or assessment of feasibility and were conducted in preparation for a future definitive trial (FDT) and (2) trials that did not have feasibility objectives or assessment, termed non-feasibility trials (NFT). This methodology was similar to Horne’s approach [8]. We assessed adherence to the checklist items for Title and Abstract (1a and 1b listed in the checklist), Introduction (2a and 2b), Methods (3a, 4c, 6a, 6c, 7a, and 12a), Results (13a), and Discussion (20, 21, and 22a) [3], separately for the two groups (FDT and NFT).

To document the methodological issues specific to TCM pilot trials, we also extracted the relevant data throughout the text from the included studies, especially in their Discussion sections.

Statistical analyses

We expected that the proportion of FDT in our included studies would be approximately 15%. Therefore, we randomly chose 50 pilot trials from the 285 eligible studies for analyses (Fig. 1 shows the process of identifying eligible studies). To assess the impact of the CONSORT extension for pilot trials on reporting, we restricted the selection to studies published either before or after 2016; i.e., no studies published in 2016 were included in our analyses.
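The random selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not say which tool or procedure was used for the random draw, so the study IDs (1 to 285), the seed value, and the function name `sample_trials` are all hypothetical.

```python
import random


def sample_trials(eligible_ids, n_sample, seed):
    """Draw a simple random sample without replacement.

    A fixed seed makes the draw reproducible; the seed itself is an
    assumption for illustration, not something reported in the study.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(eligible_ids, n_sample))


# Hypothetical identifiers for the 285 eligible studies.
eligible_ids = list(range(1, 286))

# Select 50 of the 285 eligible studies.
selected = sample_trials(eligible_ids, n_sample=50, seed=2016)
print(len(selected), "of", len(eligible_ids), "studies selected")
```

Fixing the seed is a design choice that lets a second reviewer regenerate the same list of selected studies.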

Fig. 1 Flow diagram showing the process of eligible study identification

Guideline adherence was presented using counts and percentages. We performed a chi-square test to compare the guideline adherence levels between the two groups (FDT and NFT). To evaluate the impact of the CONSORT extension for pilot trials, we compared the guideline adherence of the included pilot trials published before and after 2016. When any cell in the contingency table had an expected frequency < 5, we used Fisher’s exact test instead to compare the guideline adherence levels between the groups. All analyses were conducted using Stata version 13 (StataCorp, College Station, TX, USA).
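The rule for choosing between the two tests can be illustrated with a minimal pure-Python sketch. It is not the Stata code used in the study. It assumes a Pearson chi-square test without continuity correction and a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one. The function names and the example counts are hypothetical and are not taken from Table 2.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def chi_square_p(table):
    """Pearson chi-square p-value (1 df, no continuity correction)."""
    exp = expected_counts(table)
    stat = sum(
        (table[i][j] - exp[i][j]) ** 2 / exp[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Survival function of the chi-square distribution with 1 df.
    return erfc(sqrt(stat / 2))


def fisher_exact_p(table):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Hypergeometric probability of each possible top-left cell value.
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


def compare_adherence(table):
    """Use Fisher's exact test if any expected count is < 5."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-square", chi_square_p(table)


# Illustrative counts only: rows are groups, columns are
# (adherent, not adherent).
small_table = [[3, 9], [1, 37]]
large_table = [[20, 30], [30, 20]]
method_small, p_small = compare_adherence(small_table)
method_large, p_large = compare_adherence(large_table)
print(method_small, round(p_small, 3))
print(method_large, round(p_large, 3))
```

In the first illustrative table the smallest expected count is 12 × 4 / 50 = 0.96, so the rule selects Fisher's exact test. In the second every expected count is 25, so the chi-square test is used.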

Results

As shown in Fig. 1, we identified 285 eligible TCM pilot trials, among which 50 were randomly selected for analyses [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58]. The 50 selected trials were published between 1998 and 2019 and had sample sizes ranging from 7 to 160 (Table 1). The TCM interventions assessed in the trials included herbs, acupuncture, Chinese patent medicine, Qigong, massage, and others. There were 12 trials categorized as FDT (24%) and 38 as NFT (76%). Thirty-eight trials (76%) were published before 2016, and 12 trials (24%) after 2016.

Table 1 Characteristics of the 50 included studies

Table 2 presents the detailed guideline adherence levels of the selected trials. Adherence ranged from 4 to 96%, with the lowest adherence found for item 6c (prespecified criteria used to judge progression to a future definitive trial) and the highest for item 12a (qualitative or quantitative methods used to address objectives). Checklist items 2b (specific objectives or research questions), 7a (rationale for sample size), and 21 (generalizability of methods and findings) also had low guideline adherence levels (18%, 8%, and 18%, respectively). Table 2 also shows comparisons between FDT and NFT, and between studies published before and after 2016. Compared with the NFT, the FDT had significantly higher guideline adherence for items 7a (rationale for sample size; 25% vs 3%) and 20 (discussion of study limitations, bias, and uncertainty; 58% vs 34%). When trials published after 2016 were compared with those published before 2016, significantly higher guideline adherence was found only for item 12a (qualitative or quantitative methods used to address objectives; 100% vs 55%).

Table 2 Details for guideline adherence of the included studies

The methodological issues specific to TCM pilot trials that were not addressed by the guidelines are shown in Table 3. Three trials raised the issue of blinding in TCM pilot trials, mainly related to acupuncture, administration forms, smells, and other factors [12, 27, 51]. Other issues included the lack of a standard formula for interventions, difficulty in choosing comparisons for assessing intervention effects, and difficulty in bias control [12, 27, 47, 58] (Table 3). For instance, Choi et al. reported in their pilot trial that it was extremely difficult to evaluate the intervention effect because current evidence-based TCM offered no standard treatment for atopic dermatitis that could be used for comparison [58].

Table 3 Details of identified issues specific to TCM pilot trials

Discussion

In this study, we performed a review to assess the guideline adherence of TCM pilot trials. Guideline adherence varied across the checklist items, and some items required significant improvement. The guidance papers published in 2016 seemed to exert minimal effect on guideline adherence in TCM pilot trials. We also identified several issues specific to TCM pilot trials in this review, including blinding, standards for intervention and comparisons, effect assessment, and bias reduction.

Interestingly, only 24% of the TCM pilot trials had a feasibility objective and were performed in preparation for future definitive trials (FDT). This indicated inappropriate use of the term pilot in many small trials that aimed to test hypotheses of efficacy or safety despite being underpowered to do so [8, 59, 60]. It also corresponded to item 2b (specific objectives or research questions), where surprisingly only 3 trials (25%) in the FDT group clearly stated their objectives related to feasibility. Furthermore, only two items (7a and 20) showed significantly improved guideline adherence in FDT compared with NFT, implying that more effort was required even in those pilot trials with specified feasibility objective(s). Taken together, these findings suggested that further dissemination of the guideline is needed to help clarify the definition of feasibility and pilot trials [2] and to enhance guideline adherence [3].

Likewise, our study indicated that more effort is needed to increase the impact of the CONSORT extension for pilot trials on TCM pilot trials, because improvement was found in only one item (12a) after the guidelines were published (Table 2). The minimal effect of the guidance papers may be because the guidelines either did not reach the relevant research parties or were largely ignored by them [8]. In any case, our review reveals the urgent need for both training and dissemination of research methodology and guideline adherence in TCM pilot trials.

Besides the common practice of inappropriate hypothesis testing and insufficient power for conclusion in pilot trials [59, 61], our study also identified some issues specific to TCM pilot trials including blinding, standards for intervention and comparisons, and bias reduction (Table 3). This calls for more guidance on methodology and reporting specific to TCM pilot trials because the existing guidelines, including the CONSORT extensions for acupuncture [62], herbal interventions [63], and pilot and feasibility studies [3], could not fully cover these issues in TCM pilot trials. The progression criteria (guideline adherence level, 4%), sample size rationale (8%), and generalizability of methods and findings (18%) were also notable issues found in the TCM pilot trials (Table 2). This may be, at least in part, due to insufficient explanation and elaboration in the guideline. For example, even though the CONSORT extension recommended that authors justify the number of participants in pilot trials [3], the guideline did not give sufficient detail on exactly how to provide a sample size rationale. Likewise, how to specify the progression criteria that determine whether a pilot trial can progress to a future main trial, and whether the methods and findings can be generalized to the main trial and other pilot studies, required further detailed investigation and guidance in TCM pilot trials. The TCM field is substantially different from modern medicine, especially in its interventions, controls, and outcome assessment. For example, our review found that the issues specific to TCM pilot trials, including blinding, standards for intervention and comparisons, effect assessment, and bias reduction, were not discussed in the CONSORT extension (Table 3). Thus, our findings call for further methodology and guidance in the research area of pilot and feasibility studies to address the methodological issues and the other notable issues specific to TCM pilot trials.

Our study was the first to explore the current practice of methodology and reporting in TCM pilot trials. Two reviewers independently completed the data acquisition and analyses, thereby enhancing the accuracy of study findings [64]. Our study also has some limitations. Due to the small numbers of included FDT (n = 12) and of studies published after 2016 (n = 12), we only performed unadjusted comparisons, and these univariate analyses may yield biased findings. We could not extract potential solutions from the included TCM studies, which points to an important gap in methodological guidance for TCM pilot trials. Furthermore, only studies in Chinese and English were screened and selected, which may introduce selection bias because studies in other languages such as Japanese and Korean were not considered. Moreover, the impact of the time lag between the publication of a new guideline and its adoption and implementation could not be fully assessed, which may weaken the findings of our study.

To conclude, the methodology and reporting of TCM pilot trials in the literature required substantial improvement. The guideline seemed to have only minimal effect on the methodology and reporting in TCM pilot trials, and some issues specific to TCM pilot studies still warranted further methodology and guidance. Further efforts are needed for training and dissemination to improve guideline adherence, and for the development of more detailed methodology in the field of TCM pilot trials.