Background

Definitive randomized controlled trials (RCTs) exist to provide unmistakable evidence of a given intervention's benefit to patients [1]. Although they are very impactful for clinical practice, they are typically expensive and time-consuming [2]. Given the resources and time required, investigators often conduct pilot trials that aim to demonstrate the feasibility of the larger-scale definitive trial [3]. Pilot trials can identify possible challenges, predict costs, and fine-tune study design. In addition, by demonstrating feasibility, a successful pilot trial can be used to build momentum and secure definitive trial funding [2].

Effective pilot trials have a well-defined set of objectives to assess feasibility [3]. Feasibility is assessed in terms of whether the intervention of interest, trial design, and protocol can be successfully implemented and completed by the researchers [3]. Feasibility can be determined at the program level, study level, and site or investigator level. Program-level feasibility assessments include determining the prevalence of particular diseases in a particular region and encompass clinical and epidemiological trials [4]. Study-level feasibility assessments are centered on whether a specific clinical trial can be conducted in a country or region [3]. Site- or investigator-level feasibility trials focus on identifying challenges and probable solutions with respect to the investigator and clinical aspects of the trial (drug dosages, actual study population, recruitment and follow-up, usage of assessment tools, etc.) [4].

Despite the benefits of pilot trials, previous literature has demonstrated that they do not always lead to a definitive trial. In 2004, Lancaster et al. reviewed four general medicine journals and three specialist journals and identified 90 pilot studies published from 2000 to 2001, of which 45 reported the intention to carry out further work [1]. However, in 2010, Arain et al. found that only eight out of the 45 were followed by a larger, definitive study [5]. The impact of pilot data on subsequent research remains to be evaluated in the orthopaedic surgical literature.

This systematic review assessed the quality of pilot RCTs and the frequency of ensuing definitive RCTs in the orthopaedic surgical literature. The primary objectives of this review were to: 1) assess feasibility outcomes across pilot trials in the orthopaedic surgery literature; 2) identify the proportion of pilot trials that lead to definitive RCTs and how they inform those trials; and 3) evaluate the quality and frequency of pilot trials over time.

Methods

Identification of RCTs

EMBASE, MEDLINE and PubMed were searched for relevant articles published from database inception until January 25, 2018 (Additional file 1). All search results were imported into the Mendeley Reference Manager software (Elsevier Publishing, 2013) to remove duplicate records.

Once the final pool of included pilot RCTs was determined, an additional search was conducted in the same electronic databases in an attempt to find corresponding definitive trials. The secondary search was conducted using key terms used in the pilot RCT. If a literature search of titles was unsuccessful, other trials conducted by at least one of the authors after the pilot were considered. Additionally, clinicaltrials.gov, an online database of ongoing clinical trials, was reviewed to determine if the previously identified pilot RCTs had a definitive trial in progress. Finally, if no definitive trial was found using these methods, the pilot RCT authors were contacted by email and asked whether a definitive trial was ongoing, published, or submitted for publication.

Eligibility criteria

Trials had to be explicitly defined and reported as pilot trials within the paper itself to be included in this review. Trials reported as pilot RCTs were deemed eligible if they: 1) included an orthopaedic surgical intervention, 2) included a drug that was used intra-operatively at the site of surgery/fracture, or 3) evaluated the difference between two surgical interventions or between surgical and non-surgical orthopaedic interventions. Only clinical trials in humans published in English were included. RCTs were excluded if: 1) they were non-pilot RCT designs (including small trials not reported as pilot trials), 2) the trial interventions were exclusively non-surgical, including physiotherapy, exercise regimens, post-operative rehabilitation, anesthesia, or post-operative pain management interventions, 3) the trial interventions were surgical procedures not related to orthopaedics (e.g. oral, urological, and ocular surgeries), or 4) the trial evaluated drugs or supplements administered orally (drugs administered intravenously during surgery were included).

Screening

Articles were independently screened in duplicate at the title, abstract, and full-text stages, and decisions were independently recorded in a spreadsheet (Microsoft Excel, 2015). To ensure comprehensive screening, an article progressed to the next screening stage if at least one reviewer had noted that it should be included. The screening process is illustrated as a flow diagram and checklist in Fig. 1. All disagreements were resolved by consensus during the full-text screening phase in consultation with a third senior reviewer (AD).
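The progression rule described above can be sketched in a few lines. This is only an illustration of the "advance if at least one reviewer includes" logic; the `Decision` record, the `screen` function, and the article identifiers are hypothetical names and do not come from the review's actual spreadsheet.

```python
from typing import List, NamedTuple, Tuple


class Decision(NamedTuple):
    """One article's include/exclude votes from two independent reviewers (hypothetical layout)."""
    article_id: str
    reviewer_a: bool  # True means "include"
    reviewer_b: bool


def screen(decisions: List[Decision]) -> Tuple[List[str], List[str]]:
    """Return (articles advancing to the next stage, articles the reviewers disagreed on)."""
    # An article advances if at least one reviewer voted to include it.
    advanced = [d.article_id for d in decisions if d.reviewer_a or d.reviewer_b]
    # Disagreements are tracked separately so they can be resolved by consensus later.
    disagreements = [d.article_id for d in decisions if d.reviewer_a != d.reviewer_b]
    return advanced, disagreements
```

Under this rule a single "include" vote is enough to carry an article forward, which favours sensitivity over specificity at the early stages.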

Fig. 1 Flow diagram of the search and screening strategies to define the final pool of trials

Data abstraction

Pilot and definitive trial data, such as the country, number of patients in the RCT, orthopaedic condition being treated, orthopaedic intervention(s), controls, primary and secondary outcomes, percentage of patients that were lost to follow-up, follow-up schedule, and feasibility objectives were abstracted. In addition, for the definitive trials, any changes in interventions, controls, primary and secondary outcomes, or patient sample from the pilot trial were noted. For definitive RCTs, the time elapsed between the date of publication of the pilot and definitive trial and whether or not the sample size was calculated based on event rates from the pilot trial were determined.

Assessment of feasibility

Feasibility trials were defined as trials with a primary purpose of piloting the protocol to inform a definitive trial. To distinguish pilot trials created solely to investigate the efficacy of interventions from feasibility trials, each pilot trial was evaluated for specific reference to feasibility objectives. Feasibility objectives include determining the preliminary efficacy of a surgical intervention as well as the safety of the intervention, accurate event rates for a definitive sample size, the cost of a large-scale clinical trial, patient recruitment rates, trial design, randomization procedure, and ability to maintain blinding.

Assessment of methodological quality

The reviewers (BD, VD, ALS, and SS) independently assessed the quality of each included pilot and definitive trial in duplicate using the Checklist to Evaluate A Report of a Non-Pharmacological Trial (CLEAR NPT). The CLEAR NPT is designed for the critical appraisal of RCTs in nonpharmacological and surgical trials [6]. The original checklist was modified by omitting the question regarding patient adherence, as all included trials evaluated a single, one-time surgical intervention. As the original checklist did not provide a scoring method, the criteria employed by Somford et al. were adopted to provide a modified CLEAR NPT (Additional file 2) [6]. The maximum CLEAR NPT score was 18, whereby a score of 0–6 indicated a low-quality trial, 7–12 indicated a medium-quality trial, and 13–18 indicated a high-quality trial.
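The score bands above translate directly into a small lookup. The following is a minimal sketch of that mapping; the function name `clear_npt_category` is hypothetical, and the sketch assumes integer total scores on the modified 0–18 scale.

```python
def clear_npt_category(score: int) -> str:
    """Map a modified CLEAR NPT total score (0-18) to the quality band used in this review."""
    if not 0 <= score <= 18:
        raise ValueError("modified CLEAR NPT scores range from 0 to 18")
    if score <= 6:
        return "low"      # 0-6
    if score <= 12:
        return "medium"   # 7-12
    return "high"         # 13-18
```

For example, a trial scoring 10 would fall in the medium band, and a trial scoring 18 in the high band.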

Statistical analysis

A kappa (κ) statistic with 95% confidence intervals (CI) was used to determine agreement at all stages of article screening [7]. An intraclass correlation coefficient (ICC) was calculated to evaluate inter-rater reliability for the CLEAR NPT quality assessment. Agreement for both the κ and ICC was categorized as follows: > 0.90 indicated an almost perfect level of agreement, 0.80 to 0.90 strong agreement, 0.60 to 0.79 moderate agreement, 0.40 to 0.59 weak agreement, 0.21 to 0.39 minimal agreement, and 0 to 0.20 no agreement [8].
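As an illustration of how such an agreement statistic can be computed and categorized, the sketch below implements an unweighted Cohen's κ for two raters and applies the bands listed above. The paper does not state which κ variant was used, so Cohen's κ is an assumption here, as are the function names `cohen_kappa` and `agreement_label`. Values that fall between two published bands (for example 0.795) are assigned by lower bound, which is also an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; treated as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_label(value: float) -> str:
    """Categorize a kappa or ICC value using the bands stated in the text."""
    if value > 0.90:
        return "almost perfect"
    if value >= 0.80:
        return "strong"
    if value >= 0.60:
        return "moderate"
    if value >= 0.40:
        return "weak"
    if value >= 0.21:
        return "minimal"
    return "none"
```

With include/exclude decisions coded as 1 and 0, `cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1])` gives an observed agreement of 0.75 against a chance agreement of 0.50, so κ is 0.5 and `agreement_label` places it in the weak band. Confidence intervals are not computed in this sketch.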

A t-test was performed using an online statistical calculator (VassarStats) to compare trial quality between pilot and definitive trials, and a Pearson's r correlation was calculated to determine whether there was a relationship between the number of studies and the quality of pilot RCTs over time. A p-value less than 0.05 was considered significant. Descriptive statistics including means, proportions, standard deviations, and CIs are reported. A meta-analysis was not performed given the broad heterogeneity of the trial designs, interventions, and outcome measures.
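The two test statistics named above can be sketched with the Python standard library. This is not the calculation performed by the online calculator; it is a minimal illustration under stated assumptions. The paper does not specify the t-test variant, so Welch's unequal-variance form is assumed, and the function names `pearson_r` and `welch_t` are hypothetical. Converting either statistic to a p-value requires a t distribution, which is left out of this sketch.

```python
import math
import statistics
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


def welch_t(sample_1: Sequence[float], sample_2: Sequence[float]) -> float:
    """Welch's t statistic for the difference in means of two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least 2 observations")
    # statistics.variance returns the sample (n - 1) variance.
    standard_error = math.sqrt(
        statistics.variance(sample_1) / n1 + statistics.variance(sample_2) / n2
    )
    if standard_error == 0:
        raise ValueError("t statistic is undefined when both samples have zero variance")
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error
```

As a usage example with made-up numbers, `pearson_r([1, 2, 3], [2, 4, 6])` is 1 for perfectly linear data, and `welch_t([1, 2, 3], [2, 3, 4])` is negative because the first sample has the lower mean.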

Results

Screening

The initial screening of online databases yielded 3857 articles after the removal of 2230 duplicates. After title, abstract, and full text screening, 49 pilot RCTs were included (Fig. 1 and Additional file 3). Of these, we identified five definitive trials (one of which is still ongoing) that corresponded to the original published pilot trial. Inter-reviewer agreement was high at all stages of screening (title, κ = 0.886 (95% CI 0.878 to 0.893); abstract, κ = 0.740 (95% CI 0.693 to 0.780); and full text, κ = 0.792 (95% CI 0.737 to 0.835)).

Pilot trial characteristics

Pilot trials were most commonly published from the UK and Canada (22% and 16%, respectively) (Tables 1 and 2). A total of 2117 patients were recruited across all pilot RCTs, and 5.84 ± 10.9% of patients, on average, were lost to follow-up. The greatest proportion of pilot trials (59.2%, 29/49) focused on surgical fracture repair, including long bone, knee, spinal, foot and hip fractures (Fig. 2). As classified by the World Bank, 40 of the pilot trials were conducted in high-income countries, 6 in middle-income countries and 3 in low-income countries [9].

Table 1 Characteristics of the Included Pilot RCTs
Table 2 Characteristics of Included Definitive trials
Fig. 2 Frequency of various types of pilot RCT interventions in each intervention category

Primary and secondary outcomes of the pilot trials were divided into physician-reported and patient-reported outcomes. Radiographic analysis, such as x-rays, MRIs, ultrasounds and CT scans, was used in 65.3% (32/49) of all pilot RCTs. Patient-reported outcomes were recorded through self-reported or interview-style questionnaires. Questionnaires addressed outcomes such as quality of life, pain, function/independence, and emotional health. Of the pilot trials, 67.3% (33/49) made use of patient-reported questionnaires as tools for monitoring trial outcomes.

Overall, 73.5% (36/49) of the pilot RCTs found in the orthopaedic surgery literature were framed as feasibility trials (Table 1). The two most commonly explored feasibility objectives were safety and efficacy of an orthopaedic surgical intervention (Fig. 3). More than one feasibility objective was explored in 26.5% (13/49) of pilot RCTs. The pilot trials' CLEAR NPT ratings ranged from 10 to 18. Only 3 of the 5 definitive RCTs included in this review determined their sample size based on their corresponding pilot trial. None of the definitive RCTs enrolled the pilot patients into the definitive trial. Additionally, 22.4% (11/49) of the pilot trials listed the efficacy/effectiveness of the surgical intervention as a primary outcome. Of these, only one led to a definitive trial.

Fig. 3 Number of pilot RCTs that defined each feasibility objective

Definitive trial characteristics

Of the 49 identified pilot RCTs, five (10.2%) corresponding definitive RCTs were found (Table 2). Definitive trials were published a mean of 4.25 years (range 3–7 years) after the pilot trial. The sample size of the pilot trials was 7.2% of that of the definitive trials. The total number of patients recruited to definitive RCTs was 4016, with one trial still recruiting participants (Table 2). Only one of these definitive trials was ongoing according to clinicaltrials.gov [31]. Authors from 17 pilot trials (34.7%) responded to our email confirming that a definitive trial had not been published. Of these, 8 authors cited the following reasons for not conducting a definitive RCT: a lack of funding (12.5%), an inability to meet recruitment targets (12.5%), a failure to demonstrate preliminary efficacy of the intervention (25.0%), and a belief that the pilot study had yielded reliable results, eliminating the need for further investigation (50.0%).
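The proportions in this paragraph can be checked with simple arithmetic. The sketch below recomputes the reported percentages from their counts and back-calculates how many of the 8 responding authors correspond to each cited reason. The per-reason counts are inferred from the published percentages and are not stated in the source; the helper names `percent` and `count_from_percent` are hypothetical.

```python
def percent(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place as reported in the text."""
    return round(100 * count / total, 1)


def count_from_percent(pct: float, total: int) -> int:
    """Back-calculate the nearest whole count from a reported percentage."""
    return round(pct / 100 * total)


# Reported percentages of the 8 authors who gave a reason (from the text).
reported_reasons = {
    "lack of funding": 12.5,
    "recruitment targets not met": 12.5,
    "preliminary efficacy not demonstrated": 25.0,
    "pilot results considered sufficient": 50.0,
}

# Inferred counts per reason; an assumption derived from the percentages above.
inferred_counts = {
    reason: count_from_percent(pct, 8) for reason, pct in reported_reasons.items()
}
```

Here `percent(5, 49)` and `percent(17, 49)` reproduce the 10.2% and 34.7% figures, and the inferred counts sum to the 8 authors who gave a reason.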

Trial quality

There was no correlation (r = −0.1508, p = 0.5655) between number of studies and quality of pilot RCTs over time (Table 3). The overall quality of the pilot RCTs was relatively high (mean CLEAR NPT score 15.9 ± 1.53). Based on the CLEAR NPT scale, the highest quality pilot RCTs involved the treatment of arthrodesis and repair of knee fractures. All of the definitive RCTs were given a score of 18 and were therefore 2.6 points higher on the CLEAR NPT scale than their corresponding pilot trials (p < 0.01). The agreement among reviewers for the quality assessment was very high (ICC = 0.969 (95% CI 0.948 to 0.982)).

Table 3 Number and average quality of pilot RCTs over time of publication

Discussion

Results from this systematic review demonstrate that the majority of orthopaedic surgical pilot RCTs were framed as feasibility trials, and that the pilot trials mostly evaluated site- or investigator-level feasibility. As expected, the quality of the corresponding definitive RCTs was higher than that of their respective pilot trials. Although the majority (87%) of pilot RCTs were conducted in high-income countries, most of the included pilot trials did not lead to a definitive RCT. In these cases, the reasons cited included a lack of funding, inadequate sample sizes, and research questions having been sufficiently answered in the pilot phase.

Similar to other fields of medicine, the majority of orthopaedic surgical pilot trials were not followed by a definitive trial. Arain et al. reviewed seven medical journals, including four general medicine journals (British Medical Journal, Lancet, the New England Journal of Medicine and the Journal of the American Medical Association) and three specialist journals (British Journal of Surgery, British Journal of Cancer, British Journal of Obstetrics and Gynecology), to identify 54 pilot studies [5]. The authors reported a very low number of follow-up studies, wherein only 14.8% (8/54) of pilot studies yielded published definitive studies. Additionally, a systematic review published in 2017 by Kaur et al. examined the quality of pilot studies within the Clinical Rehabilitation journal over the past 30 years and concluded that only 12% of the pilot studies led to a definitive trial [10].

The limited number of published pilot trials and corresponding definitive trials may be attributed to numerous factors. Firstly, the pilot may have demonstrated that a definitive trial was not feasible based on criteria established a priori (e.g. ability to recruit patients). However, we would expect that in some of these cases, researchers would amend their trial design, interventions, and outcomes to ensure feasibility in the definitive trial. Secondly, if found to be feasible, investigators may refrain from publishing their pilot trial and instead, roll the pilot patients into the definitive RCT to help save on time and costs. Trial methods papers and online registries are often used to first describe these trials. Thirdly, based on author responses in this review, definitive trials may not be feasible due to a lack of funding. In one case, the authors noted that their research question was answered by the pilot trial [11]. However, the published pilot did not provide a sample size calculation, and therefore, we cannot determine if the statistical power threshold was met for the primary outcome [12].

The majority of the orthopaedic surgical pilot trials found in this review posited feasibility objectives and were of relatively high quality. The first orthopaedic surgical pilot trial identified was published in 1996. Since then, the number of pilot RCTs published increased over time, with relatively constant trial quality, until 2013, followed by a decline in publications until 2016. From 2016 to the end of our search in 2018, no orthopaedic surgical pilot RCTs were published. This may be due to a more recent trend among trialists to roll their pilot patients into a definitive trial to save on costs and maximize recruitment. There may also be a lag in pilot publications in the past 3 years.

Strengths and limitations

Strengths of this review include a broad systematic search and high agreement at all stages of screening and quality assessment. The main limitation is the minimal data available regarding the reasons why pilot trials have not led to definitive RCTs. There was a lack of response from authors, limiting further insight into barriers to definitive trials. Within the past 5 years, 13 of the 49 pilot RCTs and 4 of the 5 definitive trials were published. Thus, the inclusion of more recent pilot RCTs may be a limitation, as their corresponding definitive trials may be underway and/or not yet published. This potential source of bias was mitigated by searching the clinical trial registry, clinicaltrials.gov, for any records of ongoing definitive RCTs.

This review used the CLEAR NPT checklist to evaluate each pilot trial. Within the orthopaedic literature specifically, the quality of RCT reporting as assessed using the CLEAR NPT has been described as suboptimal, and there is a need for improved surgical reporting [13]. However, in comparison to the CONSORT statement, the CLEAR NPT is more useful for analysing interventions that require technical skill, which carry unique considerations in both conducting and reporting trials [14]. In this review, to account for methodological considerations, a modified CLEAR NPT scale was used to increase reliability and remove the need to include the Cochrane Risk of Bias Tool. The CLEAR NPT scale was modified, tested and optimized for orthopaedic trials, which were the focus of this paper.

Conclusion

While the majority of pilot RCTs found in the surgical orthopaedic literature are framed as feasibility trials, most did not lead to definitive trials. The reported reasons include minimal funding, the inability to recruit an adequate sample size, and research questions having been sufficiently answered in the pilot phase. Although most pilot RCTs did not result in a definitive trial, this does not diminish the value of the pilot trial in determining feasibility.