Introduction

Randomized Controlled Trial (RCT) is an optimal design to evaluate the effect of treatment [1]. High-quality RCTs can effectively reduce bias and provide evidence for clinical practice, while poor RCTs may adversely affect routine clinical practice [2]. Since high-quality reporting helps readers critically evaluate the quality of trials, it is of great significance to identify major quality predictions for such reporting. The Consolidated Test Reporting Standard (CONSORT) is a set of recommendations based on the least evidence, including checklists and flowcharts for RCT reports, which can help to judge the standardization and repeatability of RCTs [3]. Its purpose is to promote transparent and comprehensive trials reporting and to assist in the critical evaluation and interpretation of trials.

Atrial fibrillation (AF), one of the most common arrhythmias [4], is associated with a high rate of disability and mortality [5], increasing the risks of stroke, embolism, and death [6,7,8]. With the aging of the population, the incidence of AF is also on the increase year by year [9]. Therefore, the early treatment of AF is quite important.

Drug is administered primarily for the prevention of stroke and systemic embolism in the treatment of AF In the guidelines, warfarin has long been recommended as the preferred anticoagulation agent for AF, but it is subject to certain limitations and requires long-term and frequent laboratory tests [10]. As a new type of anticoagulant, the non-vitamin K oral anticoagulant (NOAC) works by preventing thrombin production and inhibiting factor Xa, which eliminates the need for long-term monitoring. Research has shown that NOACs are more effective and safer than warfarin [11,12,13,14], making it a promising replacement for warfarin in the future [15]. Therefore, it is particularly important to determine the accuracy of the RCTs of NOACs for AF.

This study aimed to evaluate the overall reporting quality and identify some essential issues of published RCTs of NOACs for AF based on the CONSORT statement and to analyze possible related causes, so as to provide reliable evidence for subsequent related studies and meta-analyses.

Methods

This systematic review was reported according to the PRISMA statement [16].

Search strategy

A comprehensive literature search was conducted to identify humans’ prospective randomized controlled trials. Through a manual trial search on advanced PubMed, Embase, Web of Science, and Cochrane Library databases all journal articles published in English as of September 2022 were obtained. The survey was accomplished in duplicate. Also, the final search standards were " atrial fibrillation " and “one of the four NOACs–apixaban, dabigatran, edoxaban, and rivaroxaban”. There was no limit to the year of publication, to the sample size or to the type of research (advantages, equivalence, non-disadvantages).

Inclusion and exclusion criteria

Studies meeting the following criteria were included in the study: (1) randomized controlled trials; (2) study subjects involved patients with AF; (3) interventions related to one of the four NOACs–apixaban, dabigatran, edoxaban, and rivaroxaban. The exclusion criteria were as follows: (1) duplicate publications; (2) animal experiments, and subsequently withdrawn publications; (3) abstract or full text not available.

Assessment of the reporting quality

Two reviewers (YYG and HYB) independently followed a standardized evaluation checklist to review and extract data from eligible articles. The 2010 CONSORT criteria were adopted. Each item on the criteria checklist was scored, from satisfied to dissatisfied. Any divergence between these two reviewers about these articles would be resolved by a third reviewer(ZGL). Cohen’s kappa will be calculated to assess consistency between reviewer ratings(0.45 to 0.81). A total of 25 spouse statements were revised in 2010. Among them, 12 items were separated into two parts (37 items in total). 1 point was assigned if item was reported, 0.5 points was assigned if sub-item was reported and 0 point was assigned if these items were not reported. An overall CONSORT score (score range 0 to25) was calculated by totaling the scores of all 37 quality items.

Statistical analysis

By using SPSS Version 24.0, descriptive statistics were generated, covering CONSORT (0–25) based on median and interquartile ranges. Categorical data are presented as a number (n) and percent (%), and the reporting score is recorded as mean and standard deviations (SD). Independence t-test and one-way analysis of variance (ANOVA) are used to compare the differences of general features, as the data are consistent with the homogeneity and normality of variance. Multi-distant linear regression analysis was used to explore the relationship between influencing factors and report quality. Potential predictors were coded as follows: Year of publication: 2001–2010 = 1, 2011–2022 = 2; Region in which trials were: Asia = 1, Europe and North America = 2; Sources of trial funding: Funding not reported = 1, Completely funded by industry = 2, Government and industry = 3, Government/foundation = 4; Journal impact factor:< 4 = 1, 4–10 = 2, ≥ 10 = 3; Sample size;≤200 = 1, 200–400 = 2, ≥ 400 = 3; International collaboration: no = 1, yes = 2. For all analyses, the statistical significance level was set at P < 0.05.

Results

Search results

According to the literature search results, 290 studies (Fig. 1) were found. Among them, 16 studies were removed as they were unrelated to AF, and 201 studies were excluded because they were not RCTs. Also, 11 studies were ruled out since they didn’t include NOACs. In the end, 62 studies were included, which were also parallel controlled studies.

Fig. 1
figure 1

Selection of randomized clinical trials in the systematic review

Characteristics of the included studies

Table 1 showed the Characteristics of RCTs included in this study. Most articles were published after 2010, accounting for 93.54%. 58.06% of the trials were conducted in Europe and North America. 46.77% of the trials were completely funded by the industry. 45.16% of the trials were published in journals with an impact factor greater than 10. 37 (59.86%) trials were compared with warfarin. The sample size of 50% trials is more than 400. Most of the trials (56.45%) did not involve international cooperation. In addition, the differences in consort scores of “Sources of trial funding” (P = 0.040), “Journal impact factor”(P < 0.01), “Sample size”(P < 0.01) and “International collaboration”(P = 0.017) were statistically significant.

Table 1 Trial characteristics

Reporting of all items

Figure 2 showed the research characteristics and descriptive results. Overall, the average reporting rate of all items was 55.45%. Specifically, Table 2 displays that 6 items were fully reported (> 95%) and 5 items were poorly reported (< 15%). The median of overall quality score in 2010 was 14 (range: 8.5–20). Figure 3 shows the frequency bar graphs of all researched OQS.

Fig. 2
figure 2

Overall Quality of Reporting: Rating Based on Items in the 2010 CONSORT Statement (n = 62)

Table 2 Report details for selected items
Fig. 3
figure 3

The frequency bar graphs of all researched OQS (n = 62)

Factors associated with reporting quality

Table 3 shows the results of the linear regression modeling. Six independent variables are input into the multiple linear regression model, among which the Sources of trial funding(P = 0.02), journal impact factor (P = 0.01), and international collaboration (P < 0.01) as noticeable predictors of the overall CONSORT score.

Table 3 Factors associated with key elements of the CONSORT guidelines

Discussion

In order to ensure accuracy and consistency in the reporting of RCTs, it is crucial that they are presented in an open and transparent manner, adhering strictly to the CONSORT guidelines. Failure to do so may result in overestimation of therapeutic efficacy and conflicting conclusions. [17,18,19,20]. Since 1996, the reporting quality of RCTs has improved dramatically when the CONSORT was launched [21], particularly those trials concerning drugs [22, 23]. CONSORT was updated in 2001 and 2010 [24, 25], respectively.

Unfortunately, although most RCTs of NOACs in AF have been published since the revision of 2010 CONSORT statement, the overall reporting quality was still unsatisfactory, with an overall average reporting rate of 55.45%, which is similar to the findings in the fields of lung cancer, and COVID-19. In addition, fewer project reports were added to or redefined in the 2010 revision compared with the 2001 CONSORT statement.

In this study, it was found that some items and measures of clinical characteristics had a high reporting rate of results, such as items 2a, 4a, 12a, 13a, 15, 17a, (95.16%, 98.39%, 96.77%, 96.77%, 96.77%,95.16%), which put emphasis on the theoretical background of the trials, eligibility criteria, statistical methods, participants assigned, baseline demographic table, and results with effect size and precision. However, the reporting rates of Changes to trial after commencement, Changes to outcomes, Interim analyses and stopping guidelines, Allocation, enrollment and assignment personnel, Blinding, Similarity of interventions, and Dates of recruitment (items 3b,6b,7b,10,11a,11b,14b) were only 8.06%, 14.52%, 16.13%, 4.94%, 14.52%, 8.06% and 16.13%, respectively. The results are worrying because the missing reporting items are also an important part of the details of the trial, and they are equally noteworthy. This finding was consistent with earlier research [26,27,28].

The results are worrying because the missing reporting items are also an important part of the details of the trial, and they are equally noteworthy. Inappropriate methods of randomization can lead to selection and confounding bias [29], and most trials that lack details of the randomization procedure provided also have problems with randomization-related bias, as well as influencing the reader’s assessment of the risk of selection bias [30]. In RCTs, blinding is used to avoid subjective bias. However, inappropriate blinding can increase the risk of bias and exaggerated results [30,31,32]. Some of the RCTs of NOACs for AF included in this study were not completely randomized trials per se, but used district group randomization, and patients in different groups may have different clinical characteristics and risk factors, which may introduce some bias and thus affect the reliability of the results [33,34,35,36,37]. Similarly, the lack of reporting of the way in which warfarin blinding was performed (blinded, unblended, masked) in some of the trials that were compared with warfarin may have introduced bias in the trial results [35, 38,39,40,41]. These should be discussed in the report. Reporting of significant events (patients lost to follow-up, treatment adherence, unbinding, and cross-contamination) after trial initiation is important, and these factors may also have an impact on trial results [42]. Especially in trials related to NOACs, the reporting of bleeding events plays an important role in the reliability of trial results. Secondly, the reporting of abnormal events is also essential in clinical trials, as these events may affect the validity and reliability of trial results [43]. For example, in the RE-LY trial, the reporting of myocardial infarction rate is an important issue because it may affect the interpretation and promotion of the trial results [44,45,46]. Therefore, accurate and clear event reporting in clinical trials is important and helps to increase the accuracy and reliability of trial results. The reason for these problems may be that the clinical features of RCTs are considered more important and engaging since many authors are also clinicians.

The reason for these problems may be that the clinical features of RCTs are considered more important and attractive because many authors are also clinicians. Here, it is suggested that the design of experimental methods should be carried out through full cooperation of scientists and clinicians. In addition, it is possibly lied in that most trials had developed detailed plans and remedial measures before they started, and that the trials proceeded smoothly without errors or RCTs with early termination or negative results are often rejected for publication [47]. For the sake of simplicity, the detailed test procedures of some articles cannot be fully reported, thus affecting the quality of the report. Experiments have shown a close relationship between inadequately informed trials and poorly designed or conducted tests [48, 49]. Hence, it is suggested that journals can revise their requirements for word count, especially for RCT, or offer online resources that facilitate authors to make these key omissions.

This study shows that journal impact factors, experimental Sources of trial funding and international collaboration are also important factors in the quality of RCTs reporting. Better reporting quality is associated with higher journal impact factors, international cooperation and high sample size, which is similar to a previous study [50, 51]. This may be because journals with higher impact factors have higher requirements for papers [52], and many countries participate in cooperative experiments with more perfect and detailed experimental designs. We believe that the potential reason for the poor quality of RCT reports is that some journals do not evaluate articles strictly according to the CONSORT statement, so we suggest that journal editors and peer reviewers should strictly judge the completeness and accuracy of the tests according to the CONSORT statement when reviewing the RCT. Meanwhile, the journal should also increase the promotion of CONSORT statement, calling on researchers to refer to CONSORT statement to design RCT experiments and write articles.

Nevertheless, this study is also plagued by several limitations. The release time of NOACs is relatively short; there are few RCTs on it; the number of included trials is limited. Besides, each CONSORT item is assigned with an equal weight, which may overemphasize some less important things. In addition, some of the item requirements in the CONSORT statement are very complex; although it provides explanatory notes, and we have designed and tried data collection templates prior to data collection, including detailed reporting requirements for specific items (such as allocation, hiding, blocking or setting and location), it is still impossible to avoid subjective factors during grading.

Conclusion

According to the data collected herein, there is currently a lack of clinical randomized controlled studies of NOACs in the treatment of atrial fibrillation. The quality of the existing articles needs to be improved. However, for confirming the effect of treatment, RCT is still recommended by this study as the preferred design in an ideal environment. In the future, more comprehensive efforts shall be made to increase the awareness of readers, reviewers, and editors on these issues. At the same time, the quality and accuracy improvement of such research conclusions shall also be focused on to guarantee the credibility of the evidence that they have offered.