Background

Randomized controlled trials (RCTs) provide strong evidence for the efficacy of healthcare interventions [1]. Carefully planned and well-executed RCTs give us the best estimates of treatment effect and can thus guide clinical decision making [2, 3], whereas trials that lack methodological rigor can over- or underestimate treatment effect sizes due to bias [4–6]. Hence, efforts have been undertaken to improve the design and reporting of RCTs [1, 6–11].

While RCTs represent a small proportion of original research published in surgical journals [12, 13], they remain an important component of the literature and a high level of evidence [14]. However, the literature suggests that surgical RCTs lag behind the general medical literature in methodological quality, which refers mainly to the formal aspects of study design, conduct and analysis. For example, one study found that only 33% of RCTs published in surgical journals, but 75% of those published in general medicine journals, were of high quality [15]. RCTs in orthopaedic surgery appear to be no better: in one study, more than half of the trials lacked proper concealment of randomization, blinding of outcome assessors and reporting of reasons for excluding patients [16]. In another study examining the quality of RCTs in pediatric orthopaedics, only 19% of the included articles met the authors' criteria for high quality [12]. In contrast, RCTs published in general internal medicine journals appear to be of generally higher quality. For example, Moher et al. reviewed 211 reports of RCTs from the top four English-language internal medicine journals and found that more than 60% were of high quality [17]. These findings indicate that RCTs in orthopaedic surgery are in need of improvement.

It is important to note the difference between methodological quality and reporting quality. Our study was designed to evaluate the methodological conduct of studies; however, poor reporting inherently makes this task difficult. While it is imperative to distinguish between reporting and methodology, it can be tempting to draw the same conclusions from both. Doing so would hamper a true risk of bias assessment and must be avoided.

To our knowledge, there has been no assessment of the methodological quality, or risk of bias, of RCTs across the top journals in orthopaedics, nor any effort to characterize the proportion of published papers that represent the highest levels of evidence. The purpose of the present study was to assess the risk of bias of all randomized trials published over the last 5 years in the top five journals in orthopaedics.

Methods

Search strategy

We determined the top five journals in orthopaedic surgery by their impact factors from the Thomson ISI citation reports. These journals were the American Journal of Sports Medicine (AJSM), Journal of Orthopaedic Research (JOR), Journal of Bone and Joint Surgery, American (JBJS Am), Spine Journal (SJ) and Osteoarthritis and Cartilage (OC). Each journal was hand searched on its website and assessed for reports for inclusion by one individual (LC). Decisions regarding inclusion of potential studies were based on the following criteria: (1) the study consisted solely of human subjects; (2) subjects were randomly allocated; (3) the experimental design included both treatment and control groups comparing an orthopaedic intervention; and (4) the publication date fell between January 2006 and December 2010 in one of the journals listed above. These criteria were based on the Cochrane Collaboration's widely accepted risk of bias tool as well as the risk of bias assessment recommendations in Modern Epidemiology, 3rd Edition. It is important to note that there was no formal protocol for this assessment.

Data extraction

The investigators separately and independently extracted data from each study using preformatted Excel (Microsoft, Redmond, WA) spreadsheets. Extracted data included journal name, journal impact factor, and publication year. All included studies were assessed on ten criteria related to risk of bias (Table 1). The ten criteria required sufficient reporting regarding randomization method, allocation sequence concealment, participant blinding, outcome assessor blinding, outcome measurement, interventionist training, withdrawals, intent to treat analyses, clustering, and baseline characteristics. For each criterion the RCT was judged as fulfilling it (indicated as a "Yes"), not fulfilling it (indicated as a "No") or providing insufficient information to determine fulfillment ("Not Reported") (see Figure 1). To be considered a "Yes," the paper must have included a complete description of the process and outcome for that criterion. If investigators felt that there was too little information, or that they would be unable to replicate the process based on unclear reporting, the article was designated "Not Reported" for that criterion. A complete lack of reporting or an erroneous method (e.g., randomization by patient number or date of birth) was marked as "No." Disagreements were documented and resolved by discussion between the data collectors along with the primary investigator.
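As an illustrative sketch only (the study used Excel spreadsheets, not code; the criterion names below are paraphrased from Table 1 and are hypothetical identifiers), the three-level judgement scheme described above can be represented and tallied as follows:

```python
from collections import Counter

# Ten risk-of-bias criteria assessed per trial (names paraphrased from
# Table 1; the exact wording in the published table may differ).
CRITERIA = [
    "randomization_method", "allocation_concealment", "participant_blinding",
    "assessor_blinding", "outcome_measurement", "interventionist_training",
    "withdrawals", "intention_to_treat", "clustering", "baseline_characteristics",
]
RATINGS = {"Yes", "No", "Not Reported"}

def summarize(trials):
    """Tally the three-level judgements across trials for each criterion."""
    tallies = {c: Counter() for c in CRITERIA}
    for trial in trials:
        for criterion in CRITERIA:
            rating = trial[criterion]
            if rating not in RATINGS:
                raise ValueError(f"invalid rating {rating!r} for {criterion}")
            tallies[criterion][rating] += 1
    return tallies

# Two hypothetical extraction rows: one trial fulfilling every criterion,
# one mostly under-reported with an erroneous withdrawals method ("No").
trials = [
    {c: "Yes" for c in CRITERIA},
    {**{c: "Not Reported" for c in CRITERIA}, "withdrawals": "No"},
]
tallies = summarize(trials)
```

Each spreadsheet row maps to one dict per trial, and the tallies feed directly into the per-criterion proportions reported in Table 2.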

Table 1 Risk of bias criteria
Figure 1

Trial flow diagram.

Statistical analysis

Statistical analyses included calculating the mean number of criteria that were met ("Yes"), not met ("No"), or of unknown fulfillment ("Not Reported") within and across all journals. First, we assessed the distribution of Yes/No/Not Reported ratings for each article. We then calculated the mean proportion of fulfilled items for all articles from the same journal, stratified by criterion (Table 2), and compared these mean proportions across journals using an analysis of variance (ANOVA) to test for differences in reporting quality. A more favorable distribution is one with a greater proportion of fulfilled items, indicating that the journal's trials met more criteria for methodological quality. Linear regression was also applied, with the total number of fulfilled items per trial as the outcome variable and journal impact factor and year of publication as predictor variables. We also performed a sub-analysis of the proportion of met criteria categorized by geographic location, anatomical region, study size and orthopaedic specialty (Tables 3, 4, 5 and 6). For all statistical tests, significance was set at alpha = 0.05.
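The two analyses above can be sketched in code. This is a minimal illustration on synthetic data, not the study's actual dataset or software; the numbers and journal sample sizes are invented for demonstration:

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only -- not the study's extraction.
rng = np.random.default_rng(42)

# Per-journal proportions of fulfilled items (one value per trial).
journals = {
    name: rng.uniform(0.3, 0.7, size=20)
    for name in ["AJSM", "JOR", "JBJS Am", "SJ", "OC"]
}

# One-way ANOVA: do mean proportions of fulfilled items differ across journals?
f_stat, p_anova = stats.f_oneway(*journals.values())

# Linear regression: total fulfilled items per trial (0-10) predicted by
# journal impact factor and publication year (centered at 2006).
n = 100
impact = rng.uniform(1.5, 4.5, size=n)
year = rng.integers(2006, 2011, size=n).astype(float)
fulfilled = 2.0 + 0.3 * impact + 0.15 * (year - 2006) + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), impact, year - 2006])  # intercept + predictors
coef, *_ = np.linalg.lstsq(X, fulfilled, rcond=None)    # [intercept, b_impact, b_year]
```

A p-value from `f_oneway` below 0.05 would correspond to the significance threshold stated above, and the regression coefficients play the role of the β values reported in the Results.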

Table 2 Ratings for each methodological quality criterion within and across journals
Table 3 Sub-group analysis of item responses by anatomical region (article n = 240)
Table 4 Sub-group analysis of item responses by subject
Table 5 Sub-group analysis of item responses by geographic location
Table 6 Sub-group analysis of item responses by study size (in quartiles)

Results

We identified a total of 261 RCTs, of which 232 met our inclusion criteria. The most common reason for exclusion was the lack of human participants (N = 29). JBJS Am accounted for the largest number of included RCTs (N = 106), followed by AJSM (N = 74), OC (N = 36), SJ (N = 16) and JOR (N = 7). A total of 49% of the criteria were fulfilled across these journals, with 42% of the criteria not being amenable to assessment due to inadequate reporting (Table 7). The RCTs from AJSM had the highest number of fulfilled criteria, or were at the lowest risk of bias, while RCTs from SJ and JBJS Am had the highest number of unfulfilled criteria, and JOR had the largest number of criteria of unknown fulfillment (Table 7). Less than 1% of the included RCTs fulfilled all ten methodological criteria. The ANOVA revealed that the difference in the proportion of items fulfilled ("Yes") between journals was statistically significant (p = 0.034) at the alpha = 0.05 level.

Table 7 Summary of overall methodological ratings by journal

OC had the largest proportion of "Yes" ratings, or adequate fulfillment, for four of the ten criteria (proper analysis, description of withdrawals/compliance, subject blinding, outcome assessor blinding); JBJS Am led for three criteria (randomization process, allocation concealment, accounting for clustering); AJSM led for two criteria (baseline characteristics, intervention administration); and JOR led in one category (blinded outcome assessment). Table 2 contains the complete list of all methodological quality criteria ratings within and across journals.

We also found that the total number of RCTs published increased slightly from 54 in 2006 to 61 in 2008 but fell to 57 and 46 in 2009 and 2010, respectively. However, the proportion of RCTs among all published articles fell from 6% in 2006 to 4% in 2010. Our regression revealed that later year of publication was associated with more fulfilled criteria, although this association only approached statistical significance (β = 0.171; CI = −0.00 to 0.342; p = 0.051); impact factor was not a significant predictor (β = 0.307; CI = −0.221 to 0.836; p = 0.253). Figure 2 contains the ratings across all criteria by year of publication.

Figure 2

Percentage of RCTs Meeting Criteria by Publication Year.

Discussion

We found that only a very small proportion of the analyzed RCTs met all ten methodological quality criteria, indicating that many of these studies are at a serious risk of bias, but that these trials are improving with time (Figure 2). In addition, we found that many RCTs did not report sufficient information to judge if they met many of the included criteria. Overall, it is clear that the methodological and reporting quality in orthopaedic RCTs has significant room for improvement.

The poor methodological quality of orthopaedic RCTs has been shown in previous literature [12]. Dulai et al. [12] reported that despite increasing numbers of RCTs, only 19% of the pediatric orthopaedic trials they evaluated met their standard for methodological acceptability. In particular, they found inadequate rigor and reporting of randomization methods, use of inappropriate or poorly described outcome measures, inadequate description of inclusion and exclusion criteria, and inappropriate statistical analysis. In another study, Bhandari and colleagues [13] assessed 72 RCTs from JBJS Am published from January 1988 to the end of 2000 and found that while the number of RCTs increased over the years, their mean overall score was only 68.1% of the total Detsky quality score. Similar to our study, they found that more than half of the RCTs were limited by lack of concealed allocation, lack of blinding of outcome assessors, and failure to report reasons for excluding patients [14]. Furthermore, Herman and colleagues [14] found that only 35% of the RCTs in eight leading orthopaedic journals used an intention-to-treat analysis, similar to our finding of 41%. Also, Karanicolas and colleagues [15] found that less than 10% of 171 included orthopaedic trauma RCTs had blinded outcome assessors. This is much lower than our finding of nearly 51%, a difference most likely due to the broader scope of the trials we included, which went beyond trauma to any orthopaedic RCT from a select list of orthopaedic journals.

Beyond methodological deficits in these trials, some evidence suggests, similar to our findings, that RCTs in orthopaedic surgery fail to report much important information [16, 18]. That is, to adequately assess the quality of any methodological component of an RCT, sufficient information must be present in the published report to make that assessment, and many orthopaedic RCTs appear to fall short in this regard. For example, the most recent of these investigations of reporting quality [19] applied the Consolidated Standards of Reporting Trials (CONSORT) statement [20] to a sample of RCTs, the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [21] to a sample of case–control, cohort and cross-sectional studies, and a statistical questionnaire to all included studies. They found that for the 100 included studies only 58% of the CONSORT items were met on average across the seven included journals. We found inadequate reporting, on average, for approximately 42% of the items on which the RCTs were assessed. The slight difference in findings between these studies can likely be accounted for by the use of different checklist items and the inclusion of different selections of journals. Either way, research in the area indicates serious inadequacies in the reporting of orthopaedic RCTs. This trend of poor reporting has been seen in other fields as well, including internal medicine [17] and general surgery [22].

Despite the deficits, the RCTs we included did have some common strengths. In general, the intervention and primary outcome was well described in most papers. Also, the proportion of methodological quality items fulfilled increased with increasing publication year, which is consistent with trends in internal medicine journals [17]. This is promising and may suggest that clinical trialists, editors and reviewers are putting more emphasis on proper methodology.

Our study has several strengths. First, we conducted a comprehensive hand search of the tables of contents of the top orthopaedic journals over a recent span of 5 years. Thus, the included RCTs likely represent the most read and cited RCTs in the orthopaedic community and therefore give a good indication of the quality of the trials that may be influencing clinical decision making. If this assumption holds, the most influential trials are at a high risk of having biased estimates of treatment effect. However, due to the limited selection of journals included, it is possible that higher quality and more influential RCTs are being published in other journals. Indeed, we found that RCTs make up only a very small proportion of all articles published in these five journals and therefore may not be influencing decision making to any great degree. Our review was conducted in accordance with the PRISMA statement and meets all of its criteria. Additionally, we included methodological quality criteria that have been empirically shown to bias estimates of treatment effect when not properly implemented [3, 4, 23–34]. All included criteria have empirical evidence that not applying them in RCTs, or not assessing them in systematic reviews, results in biased estimates of treatment effect or in misclassification of trials as high or low quality. However, because of the inadequate reporting of the included studies, we could not directly test the influence of specific methodological inadequacies on effect estimates. Therefore, we cannot be certain that the methodological flaws in these orthopaedic studies necessarily bias the estimates of treatment effect; we can only extrapolate from the extensive literature that has shown this to be true for RCTs in other clinical areas [3, 4, 23–34].

It is important to note that just because a study did not report a certain methodology does not imply that it was not performed. For example, in this study, allocation concealment and cluster analysis had two of the lowest fulfillment proportions. We acknowledge that descriptive reporting of these topics may not have been emphasized despite proper methodology, and that poor reporting is not necessarily a proxy for poor methodology [35]. Thus, our assessment cannot account for such underreporting and may therefore underestimate the true quality of methodology in this literature. In any case, to adequately assess a reported study, the relevant information must be present for readers to judge the potential risk of bias in the estimates of effect and to determine the relevance of the RCT to clinical decision making.

In common with other authors, we can make some recommendations on how to improve this literature. First, we suggest that investigators include on their team an epidemiologist, clinical epidemiologist, clinical trial methodologist or someone with experience in conducting RCTs, together with a statistician or biostatistician, to ensure proper planning and implementation of the trial. There is evidence that including such individuals on the investigative team improves the quality of the resultant RCT [13]. In addition, we suggest that investigators and authors refer to the revised CONSORT statement [20] and the related explanatory paper [36] to guide them on the important information to include when reporting their RCT. The CONSORT statement has been shown to improve the quality of reporting in these studies [37]. In addition to these documents, there are other reporting guidance documents on the EQUATOR Network website that may be of use [11]. Finally, we suggest that journal editors enforce the use of the CONSORT statement so that published reports are complete and have the best chance of being interpreted properly for clinical decision making.

Conclusions

There are obvious flaws in the methodology and reporting of RCTs in the orthopaedic literature, and these flaws may seriously bias the estimates of effect in those studies. We expect that initiatives such as those mentioned above will improve these important types of clinical research, which are integral to strengthening the empirical base for orthopaedic procedures [38]. Finally, it should be remembered that a rating of level I evidence does not mean a study is free of methodological flaws, and such flaws can bias the reported effect estimates [39].