Pilot trials are preliminary studies performed to inform the methodologic design of a subsequent, larger confirmatory study.1,2 Most consider a pilot study to be a feasibility test of the methods and procedures,1,2,3 which as such does not address clinical efficacy.4 Accurate, transparent, and consistent reporting indicates scientific integrity and minimizes the potential for bias and distortion of effect estimates associated with incomplete reporting. The Consolidated Standards of Reporting Trials (CONSORT) framework was introduced to improve reporting of randomized controlled trials (RCTs).5 Nevertheless, an assessment of reporting in anesthesia literature, comparing before and after CONSORT implementation, did not show much improvement.6

It is essential that pilot studies are reported, both to avoid redundancy and make appropriate use of resources and funds and to allow their results to be included in an appropriate meta-analysis.7,8 Despite the relative increase in their publication,9 their reporting quality is often poor.7 In response, a framework for reporting pilot studies based on the CONSORT statement was developed and published in 2016.10 This includes separate guidance statements and checklists for the abstract (16 items) and full text of the manuscript (26 items). As suggested, for the purposes of this study, “pilot” and “feasibility” will be used interchangeably and the included studies will be referred to as pilot trials.

To assess standards of reporting, previous studies have looked at adherence to CONSORT RCT guidelines,11 and some have explored predictive factors.12,13,14 Since no studies have evaluated the reporting standards of pilot trials in the anesthesia literature, we decided to look at studies published between 2006 and 2016 (prior to the CONSORT pilot trial extension) and evaluate their reporting in reference to the CONSORT pilot trial extension checklists for abstracts and full texts.
Our objectives were to assess the reporting quality of 1) abstracts and 2) full texts in five leading anesthesia journals and explore potential factors associated with reporting quality.

Methods

This is a cross-sectional study of pilot trials published in leading anesthesia journals. This manuscript adheres to the STROBE statement and checklist for cross-sectional studies.

Study screening and selection

We identified the top five highest-impact journals in the field of anesthesiology and pain according to the 2015 journal citation reports.15 The impact factor (IF) refers to an index number assigned and calculated by Thomson Reuters as the number of citations received in a specific year for articles published in that journal during the two preceding years divided by the total number of articles published in that journal during the two preceding years.16 Our search strategy was designed by an experienced librarian (R.C.) using the OVID platform and was guided by the terms “pilot”, “feasibility”, or “preliminary” studies (Appendix 1). The initial search was performed on 6 October 2016 to look for studies published from 2012 to 2016 (last five years). Since our first search yielded only 38 potentially eligible full-text articles, we expanded it on 26 October 2016 to include articles from the last ten years. We included all pilot trials that were designed to assess the feasibility, safety, or acceptability of an intervention and that included a randomization procedure. We excluded trials on animals and pharmacokinetic studies. No language exclusion criterion was applied. For our review, study selection based on the term “feasibility” was challenging. In general, the term “feasibility” reflects the “ease” or “possibility” of either an intervention or method. For example, we had to separate out studies that were aimed at looking at the ease of ultrasound use for injection or visualization or the time taken with a fibreoptic bronchoscope compared with other methods of intubation. Such studies were excluded as they were not expected to lead to a subsequent larger trial. The study selection process was carried out in duplicate by two independent reviewers (A.K. and H.S.), both trained in health research methods. The title and abstract screening was conducted using the free online tool “rayyan” (https://rayyan.qcri.org/).
The full-text screening was done independently and in duplicate. The agreement on full-text screening was reported using a kappa (k) statistic and interpreted as poor if k ≤ 0.2, fair if 0.21 ≤ k ≤ 0.4, moderate if 0.41 ≤ k ≤ 0.6, substantial if 0.61 ≤ k ≤ 0.8, and good if k > 0.8.17 Disagreement was resolved by consensus or arbitration by a third reviewer (L.M.).
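The agreement statistic and the interpretation bands described above can be illustrated with a minimal sketch. It assumes Cohen's kappa for two raters making include/exclude decisions; the function names and the example decisions are hypothetical and are not the study's screening data.

```python
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def interpret_kappa(k: float) -> str:
    """Map kappa to the bands used in the text (poor ... good)."""
    if k <= 0.2:
        return "poor"
    if k <= 0.4:
        return "fair"
    if k <= 0.6:
        return "moderate"
    if k <= 0.8:
        return "substantial"
    return "good"


# Hypothetical decisions for 20 articles (illustrative only).
reviewer_1 = ["include"] * 11 + ["exclude"] * 9
reviewer_2 = ["include"] * 10 + ["exclude"] * 10
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
```

In this made-up example the reviewers disagree on one of 20 articles, so observed agreement is 0.95 against a chance agreement of 0.50.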

Data abstraction

Data abstraction was carried out by two independent reviewers (A.K. and H.S.) using extraction forms created in Microsoft Excel, 2016 (Microsoft, Redmond, WA, USA). Extracted items included the study characteristics; reason for identifying a study as a pilot trial; study conclusion as feasible, not feasible, or unclear; and CONSORT reporting items separately for the abstract and full text, as guided by the recently published CONSORT pilot trial extension statement.10 Reviewers were provided with clear and explicit rules to judge each CONSORT item as either “reported” or “not reported” (Appendix 3). All items with sub-items were collapsed into a single item allowing for a single decision to be made as “reported” or “not reported” for each item. The extraction forms were piloted to ensure consistency until an agreement score of k > 0.8 was achieved for at least three consecutive studies. The total number of items assessed for abstracts was 15; the item “recruitment” was not considered as it was relevant only to conference abstracts.10 The total number of items assessed for full texts was 24; the items “ancillary analyses (results of any other analyses performed that could be used to inform the future definitive trial)” and “trial protocol (where the pilot trial protocol can be accessed, if available)” were omitted from the total items as they were optional and difficult to judge.

Study outcomes

Our primary outcomes were the reporting quality of 1) abstracts and 2) full texts, assessed using the CONSORT extension for pilot trials. As a secondary outcome, we explored factors associated with reporting quality.

Predictor variables

We included the following four variables, each coded as either present or absent in all studies: 1) industry funding; 2) clinical trial registration as a pilot trial; 3) primary outcome or objective as a feasibility outcome or objective; and 4) trial identified as either “pilot” or “feasibility” in its title and/or abstract. In addition, we considered the journal name as a grouping factor. We hypothesized that not registering the trial as a pilot trial, not reporting the feasibility target as a primary objective or outcome, and not identifying the trial as a pilot trial in its title and/or abstract would be associated with poor reporting quality. Information on industry funding was considered when it was identified in any part of the study report. For registration, we checked whether each trial was registered within ClinicalTrials.gov or the International Clinical Trials Registry Platform (ICTRP)-World Health Organization registry. Further, whenever a trial was registered, we also checked whether it was in fact registered as a pilot or feasibility trial and whether there were any amendments to that effect. We hypothesized that a higher journal impact factor and industry funding would be associated with better reporting quality.

Statistical analysis

Important study characteristics were tabulated along with their reported frequencies and percentages. For continuous outcomes, the mean (SD) or median [interquartile range (IQR)] was reported. Categorical data, such as the number of studies reporting a particular item, were reported as counts and percentages. All included trials were assessed for completeness of CONSORT items reporting (abstract = 15 and full text = 24). Normality of the distribution was tested using the Kolmogorov-Smirnov (KS) test (at a significance of P < 0.05) and visual inspection of Q-Q plots.
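The descriptive summaries described above (items reported per trial, median [IQR], and the percentage of trials reporting each item) can be sketched as follows. This is only an illustration: the trial labels, the four checklist items, and the helper names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption that may differ from the one applied by other statistical packages.

```python
import statistics
from typing import Dict, List, Set, Tuple

# Hypothetical extraction result: for each trial, the checklist items
# judged "reported" (illustrative only, not the study data).
checklist: List[str] = ["trial design", "participants", "randomization", "funding"]
reported: Dict[str, Set[str]] = {
    "trial_A": {"trial design", "participants", "randomization"},
    "trial_B": {"trial design", "participants"},
    "trial_C": {"trial design"},
    "trial_D": {"trial design", "participants", "randomization", "funding"},
}


def items_per_trial(data: Dict[str, Set[str]]) -> List[int]:
    """Total number of checklist items reported by each trial."""
    return [len(items) for items in data.values()]


def percent_reporting(data: Dict[str, Set[str]], items: List[str]) -> Dict[str, float]:
    """Percentage of trials that reported each checklist item."""
    n = len(data)
    return {
        item: 100 * sum(item in reported_items for reported_items in data.values()) / n
        for item in items
    }


def median_iqr(values: List[int]) -> Tuple[float, float, float]:
    """Median with first and third quartiles (inclusive method, an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


totals = items_per_trial(reported)
summary = median_iqr(totals)
percentages = percent_reporting(reported, checklist)
print(f"median [IQR] items reported: {summary[0]} [{summary[1]}-{summary[2]}]")
for item, pct in percentages.items():
    print(f"{item}: {pct:.0f}% of trials")
```

The per-item percentages are the quantities that a bar diagram of reporting adherence would display.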

Individual abstract and full-text items, and their percentage reporting across studies, were represented with bar diagrams. We explored the association of predictive factors with the total number of CONSORT-recommended abstract and full-text items reported using a generalized estimating equations model. Our exploratory variables included four categorical variables and the specific journal (grouping factor). Articles from the same journal can be clustered and correlated because of journal policy, word limitations, and other publication considerations. Nevertheless, since we did not know the structure of this correlation, we specified the working correlation matrix as unstructured and used the robust estimator of the covariance matrix rather than the model-based estimator. Within generalized linear modelling (genlin), we used the Poisson log-linear function after checking the dependent variables for a Poisson distribution, using the KS test (at a significance of P ≤ 0.05) and by comparing their mean and variance. Under the model, we considered the above-specified covariates and looked for main effects. Under parameter estimation, we used a hybrid method with Fisher (1): scale = 1, maximum iterations = 100 and maximum step-halving = 5, and parameter convergence = 1E-006 (absolute). Under statistics, we looked for analysis type III for Wald Chi-square statistics with a confidence interval (CI) specified at 99% and full log quasi-likelihood function. Although we had more than five counts (dependent variable) per covariate, there is potential for false discoveries, so we considered significance at alpha (α) ≤ 0.01 and estimated the incidence rate ratios (IRR), along with their 99% CI. We explored variance inflation factors (VIFs), with a threshold of 10, to assess collinearity between predictors.18 Microsoft Excel 2016 and IBM SPSS 24.0.0.0 (2016; IBM Corporation, Armonk, NY, USA) were used to perform the statistical analysis.
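As an illustration of how an incidence rate ratio and its 99% CI follow from a log-linear Poisson coefficient, the sketch below exponentiates a coefficient and the limits of a Wald-type interval. The Wald form is an assumption, the function name is hypothetical, and the coefficient and standard error are made-up values rather than output from the study's model.

```python
import math
from statistics import NormalDist
from typing import Tuple


def irr_with_ci(beta: float, se: float, level: float = 0.99) -> Tuple[float, float, float]:
    """Return (IRR, lower, upper) for a log-link coefficient and its standard error.

    Assumes a Wald-type interval: exp(beta +/- z * se), where z is the
    standard normal quantile for the requested two-sided confidence level.
    """
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for a 99% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a binary predictor (illustrative values only).
irr, lower, upper = irr_with_ci(beta=-0.386, se=0.093)
print(f"IRR {irr:.2f}; 99% CI, {lower:.2f} to {upper:.2f}")

# With alpha set at 0.01, a 99% CI that excludes 1 corresponds to a
# statistically significant association under this Wald approximation.
significant = upper < 1 or lower > 1
print("significant at alpha = 0.01:", significant)
```

An interval built this way is symmetric on the log scale, which is why reported IRR intervals look asymmetric around the point estimate.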

Sample size

We did not estimate the actual sample size required. To get a fairly unbiased estimate we decided to obtain at least 50 studies as a convenience sample.

Results

The five highest-impact anesthesia journals identified as per the 2015 journal citation reports (ranked from highest to lowest) were British Journal of Anaesthesia (BJA), Pain, Anesthesiology, Anesthesia & Analgesia (AA), and Anaesthesia; IFs ranged from 5.61 to 3.79. Our search identified a total of 364 potential citations, of which 83 were selected for full-text review and 58 were eligible for data extraction (Fig. 1). The reviewers agreed on 79 of the 83 articles, leading to a kappa of 0.91 (95% CI, 0.81 to 1.00). Of the 58 articles, three were published as letters to editors19,20,21 and the rest as original articles. Important study characteristics are highlighted in Table 1. Two of the five journals (BJA and AA) contributed 62% of the included articles. The median [IQR] sample size was 43 [30-76]. In 42 studies (72%) the rationale for reporting the study as pilot or feasibility was unclear or not provided (Table 1).

Fig. 1 Study flow chart

Table 1 Characteristics of included studies (n = 58)

Primary outcomes

CONSORT abstract reporting

We considered a total of 55 articles for the reporting quality of abstracts (excluding the three trials reported as letters). Tests for normality indicated that the number of items reported was not normally distributed. The median [IQR] number of items reported was 5 [4-7]. All CONSORT-recommended items except four (trial design, participants, intervention, and randomization) were reported in less than 40% of studies (Fig. 2). “Trial funding” was observed to be the least commonly reported item.

CONSORT full-text reporting

For CONSORT full-text reporting, we included all 58 articles for assessment. The number of CONSORT items reported was normally distributed (KS test significance 0.200) with a mean (SD) of 13 (5). Among CONSORT items, we observed that “ethical approval” was the most commonly reported item, while “title and abstract” was the least commonly reported (Fig. 3). Reporting of individual studies is provided in Appendix 2.

Fig. 2 Consolidated Standards of Reporting Trials pilot extension abstract items reported in percentages

Fig. 3 Consolidated Standards of Reporting Trials pilot extension full-text items reported in percentages

Secondary outcomes (Table 2)

None of the VIFs were > 10 in our regression models, suggesting no major collinearity among independent factors. The IRR is interpreted similarly to the odds ratio.22 For example, an IRR of 1.0 for “trial registration” would mean there is no difference in reporting, whereas an IRR > 1 would mean increased odds of better reporting with trial registration. For CONSORT abstract reporting, our regression analysis showed that “not registering the trial as a pilot trial” (IRR, 0.68; 99% CI, 0.54 to 0.87; P < 0.001) and “not identifying it as a pilot trial in the title or abstract” (IRR, 0.69; 99% CI, 0.54 to 0.89; P < 0.001) were significantly associated with poor reporting quality. For CONSORT full-text reporting, “not registering the trial as a pilot trial” (IRR, 0.67; 99% CI, 0.54 to 0.82; P < 0.001) and “using clinical hypothesis testing as the primary objective and/or outcome” (IRR, 0.77; 99% CI, 0.63 to 0.93; P = 0.001) were significantly associated with poor reporting quality (Table 2).
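To make the reading of these ratios concrete, the relation between a Poisson log-linear coefficient and the IRR can be written out. This is the standard form for a single binary predictor x with the other covariates held fixed; the symbols Y (number of checklist items reported), \beta_0, and \beta_1 are notation introduced here for illustration and do not appear in the original analysis.

```latex
\log \mathrm{E}[Y \mid x] = \beta_0 + \beta_1 x,
\qquad
\mathrm{IRR} = e^{\beta_1} = \frac{\mathrm{E}[Y \mid x = 1]}{\mathrm{E}[Y \mid x = 0]}
```

Under this form, an IRR of 0.68 corresponds to an expected number of reported items that is (1 - 0.68) × 100% = 32% lower when the predictor is present, and an IRR of 1.0 corresponds to no difference.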

Table 2 Multivariable regression analysis of predictive factors and CONSORT-pilot extension reporting adherence for abstracts and full texts

Discussion

Summary of findings

We found that the reporting quality of pilot trials is poor for both the abstract and full text of reports published in leading anesthesia journals. For abstracts, the median [IQR] number of items reported was 5 [4-7], and for full texts the mean (SD) number of items reported was 13 (5). For abstracts, poor reporting was significantly associated with “not registering as a pilot trial” and “not identifying as a pilot trial in their title or abstract”; for full texts, it was significantly associated with “not registering as a pilot trial” and “reporting clinical hypothesis testing as the primary outcome/objective”.

To measure the quality of reporting of published RCTs, the CONSORT RCT statement and its checklist have been widely used.23 All journals included in this study endorse CONSORT guidelines as per the CONSORT website.24 A recent study looking at the top 11 anesthesia journals and their adherence to all CONSORT items among 319 RCTs reported a median [IQR] compliance rate of 60% [22.9-88.9%]. They also observed a weak correlation between adherence to CONSORT items and the number of citations.25 Similar observations have been made in other studies in general medical journals26 and specialty journals.27 While previous studies have evaluated specific elements of reporting, the present study uses the entire CONSORT pilot extension statement and checklists to assess the reporting quality of pilot trials. Outside of anesthesia, other published studies on pilot trials have only looked at specific items of pilot study reporting. Arain et al. studied pilot trial reporting in seven major medical journals published between 2007 and 2008.4 Out of 54 studies, 26 were studies of interventions, of which 86% incorporated hypothesis testing and 62% included randomization. Shanyinde et al. looked at a sample of 50 pilot trials among 3,652 publications of pilot RCTs identified within the MEDLINE and EMBASE databases between 2000 and 2009. Only 56% of articles discussed adequate methodologic issues and only 18% discussed future trials.9

In our study, items such as “title and abstract”, “background and objectives”, “outcomes”, and “statistical methods” were observed to be appropriately reported in < 20% of studies, as the majority focused on hypothesis testing as their primary outcome (Fig. 3). Duffett et al. studied pilot RCTs published in pediatric critical care literature and found only 32 trials up to July 2014. They also observed that the majority of those trials focused on clinical outcomes and uncommonly reported explicit feasibility outcomes.28 This finding is similar to that of our study, in which we also observed a significant association with poor reporting when a clinical hypothesis was tested as the primary outcome. Possibly as a consequence of misdirected objectives, we also observed that 40/58 trials were rated as “unclear” for their conclusions on the feasibility of a larger study (Table 1).

In our study, poor reporting was associated with “not registering the trial as a pilot”. Registration improves the quality of reporting and makes the trial information publicly available, thereby making the authors accountable, and helps to reduce publication bias and selective reporting.29,30 It has been observed that registered trials tend to have a low risk of bias for random sequence generation, allocation concealment, and selective reporting, but not blinding or incomplete outcome data.31 Nine trials did not identify their study as a pilot in either their title or abstract. As the title and abstract are the items freely accessible to readers, it is important to be as explicit as possible. For example, Bakri et al. noted their trial to be a pilot only in their discussion section.32 A clear rationale to identify as a pilot trial was unavailable in 76% of the trials (Table 2). We also think it is possible that many were identified as such because of their small sample size. It is important to distinguish a small trial, whose estimate of effect can have a wide CI and hence uncertainty, from a pilot study (large or small is relative), which should be powered or conducted to test the study methods and not the clinical hypothesis.

Our study is not without limitations. Selection of studies only from the five journals with the highest IF may not reflect the true picture in anesthesia. Moreover, there are inherent limitations in considering IF as a metric of a journal’s quality. It has been observed that citations can be increased by promoting more editorial letters, by publishing proportionately more review articles compared with original research articles, and by self-citations.16 Nevertheless, our pre-study preparatory search revealed relatively fewer pilot trials in low-impact anesthesia journals.
Although we did not approach the individual journal editors to enquire about their policy on publication of pilot trials, it could be important as observed by Arain et al.4 They note that, while some journal editors did not encourage them, others considered them less rigorous, and some considered them when trials did not include a sample size calculation.4 The item of “sample size” was difficult to judge. Although sample size estimation for a clinical hypothesis has clear elements, the same may not apply for a pilot trial. We based our judgement as “appropriately reported” if the authors provided any clarification about their sample size based on feasibility objectives and not on hypothesis testing.2 Finally, although it is possible that reporting quality could be greater when separately published study protocols are assessed along with their resulting manuscripts,18,33 we have to note that adherence to the CONSORT statement and its checklist is for each individual manuscript and not to be considered as a whole for the protocol plus resulting manuscript.

In conclusion, the reporting of pilot trials in leading anesthesia journals is poor for both abstracts and full texts. By being consistent in their policy towards adherence to the CONSORT extension statement on pilot trials and possibly enforcing it as a mandatory requirement, journal editorial boards can encourage and improve reporting standards. Investigators and authors should be aware of the differences in the conduct and reporting of pilot studies.