Background

We recently found suboptimal quality of reporting of randomized controlled trials (RCTs) in three general endocrinology journals and identified important deficiencies in the reporting of key methodological items [1]. Poor quality of reporting of RCTs is not limited to the endocrine literature. Similar findings have been reported for RCTs published in leading general medical journals and subspecialty journals [2-6].

Reporting deficiencies can reduce confidence in RCT results and hinder their application in developing clinical practice guidelines and conducting unbiased meta-analyses. The Consolidated Standards of Reporting Trials (CONSORT) group has been working to improve the transparency and quality of RCT reports since 1996. The group has published reporting guidelines and updates, which are available at http://www.consort-statement.org [7-9]. Journals that have adopted the CONSORT statement have shown some improvement in the quality of reporting of their manuscripts, although the magnitude of this improvement has been variable [10].

Finding predictors or determinants of quality of reporting of RCTs could guide the triage of articles that are worthwhile for busy clinicians who act as peer-reviewers. Most importantly, it could also help to improve the quality of reporting of RCTs. In our previous study, we found that journal of publication, type of funding and sample size were significantly associated with overall quality of reporting, where the assessment of quality of reporting was based on 15 out of the 22 CONSORT items [1]. We could not find any variable significantly associated with quality of reporting of key methodological items - namely, allocation concealment, blinding, and analysis by intention to treat principle [1].

A clear research question (RQ) is the starting point for any research project because it guides the development of the research design and protocol. Expert methodologists have proposed the use of a structured research question to guide this process [11]. A structured RQ about therapy should contain the following five elements: population, intervention, comparator, outcome, and time-frame. These elements are commonly referred to by the acronym PICOT [11]. A well-structured research question has many benefits, including increasing the likelihood of finding a solution to the problem [12], enhancing the clarity of the thought process in developing the protocol, informing the design, guiding analysis decisions, and ensuring publication [13-15]. Whether the use of a structured RQ is associated with better quality of reporting is unknown. The aims of this study were to assess how the PICOT format is used to frame research questions, objectives or hypotheses in reports of RCTs published in general endocrinology journals, and to evaluate the association between the presence of a structured RQ using the PICOT format and the quality of reporting of RCT reports.

Methods

Study design and setting

This is an analysis based on a systematic review of 89 RCTs published between January 2005 and December 2006 in three general endocrinology journals. We selected the three general endocrinology journals with the highest impact factor (IF) in 2006 as our source of RCT reports. These are the Journal of Clinical Endocrinology and Metabolism (IF = 5.8), Clinical Endocrinology (IF = 3.4) and the European Journal of Endocrinology (IF = 3.1). Details on how we selected the studies are described elsewhere [1]. Briefly, we included all parallel design RCTs that addressed a question of treatment or prevention. We excluded cross-over trials and trials evaluating pathophysiological mechanisms, pharmacokinetics or drug tolerability as well as economic studies and trial reports that had published their methods in a separate publication. The selection process was carried out in duplicate by two independent investigators in two screening phases: title and abstract and full text. Investigators resolved discrepancies by consensus.

Rating the reporting quality

Full details on how we measured the reporting quality are available in our previous publication [1]. In our previous study, we chose 15 items from the revised CONSORT statement [7] to construct a 15-point overall quality score (OQS). We chose the CONSORT criteria because they are the accepted standards for reporting RCTs and they have been endorsed by many medical journals and leading editorial organizations. We defined quality of reporting as the extent to which the rationale, method, conduct and results of the trial are reported. Therefore, we adopted 15 CONSORT items pertaining to the Introduction, Methods and Results sections for our appraisal (Table 1). We chose these 15 items because lack of their reporting has been associated with a higher level of bias [7]. We excluded the CONSORT discussion section items because we considered them too subjective to evaluate. We also excluded three key methodological items for a separate assessment. For this study, we additionally excluded the CONSORT item on the description of the objective or research question (item 6 in Table 1) as this is represented by PICOT, our explanatory variable. We scored each item 1 if it was reported and 0 if it was not clearly stated or definitely not stated. Thus, the OQS had a possible value between 0 and 14. We note here that, as a study quality score, the OQS is a measure of the completeness of reporting of 14 CONSORT items.

Table 1 Overall Reporting Quality items

We also constructed a 3-point key score (KS) based on three items that are highly important in avoiding bias: allocation concealment, blinding and analysis according to the intention to treat (ITT) principle [16, 17]. We scored each item 1 if it was present and 0 if it was absent according to the definitions below. Thus, the KS had a possible value between 0 and 3.

Allocation concealment was considered to be present if one of the following allocation methods was reported: a) centralized randomization, b) numbered coded vehicles, or c) opaque, sealed and sequentially numbered envelopes. Blinding was considered to be present if at least two groups were explicitly reported as blinded. The groups considered for blinding included patients, caregivers, data collectors, outcome assessors, data analysts and manuscript writers. For studies in which the reviewer considered blinding of patients and caregivers not feasible, blinding was considered to be present if at least one specific group was explicitly reported as blinded.

As the term ITT is not used consistently by researchers [18, 19], we intended to capture how investigators actually conducted the analysis instead of just checking whether they stated performing an ITT analysis. For this purpose, we examined the numbers presented in the text, tables and figures of each article. We defined ITT analysis as one where all patients were analyzed as part of the group to which they were originally assigned, regardless of whether they actually satisfied the entry criteria, the treatment actually received, and subsequent protocol deviations, participant compliance or withdrawal [18].
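The three KS rules above can be sketched in code. This is only an illustrative sketch, not the abstraction form used in the study: the class name, field names and method labels are hypothetical, and the ITT field simply records the reviewer's judgment after inspecting the reported numbers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Allocation methods counted as concealed (the string labels are hypothetical keys).
CONCEALED_METHODS = frozenset({
    "centralized_randomization",
    "numbered_coded_vehicles",
    "opaque_sealed_sequentially_numbered_envelopes",
})

# Groups considered for blinding in the text (labels are hypothetical keys).
BLINDING_GROUPS = frozenset({
    "patients", "caregivers", "data_collectors",
    "outcome_assessors", "data_analysts", "manuscript_writers",
})


@dataclass(frozen=True)
class TrialReport:
    allocation_method: Optional[str]           # None if no method was reported
    blinded_groups: FrozenSet[str]             # groups explicitly reported as blinded
    patient_caregiver_blinding_feasible: bool  # reviewer's judgment
    analyzed_as_randomized: bool               # reviewer's judgment from the reported numbers


def blinding_present(report: TrialReport) -> bool:
    """At least two blinded groups, or one if patient/caregiver blinding is not feasible."""
    required = 2 if report.patient_caregiver_blinding_feasible else 1
    recognized = report.blinded_groups & BLINDING_GROUPS
    return len(recognized) >= required


def key_score(report: TrialReport) -> int:
    """Sum of the three binary key items, giving a value between 0 and 3."""
    concealed = report.allocation_method in CONCEALED_METHODS
    return int(concealed) + int(blinding_present(report)) + int(report.analyzed_as_randomized)
```

A report with centralized randomization, blinded patients and caregivers, and an analysis of all patients as randomized would score 3 under this sketch.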

Rating the framing of the research question

We chose one paragraph from the introduction or methods section that best described the primary research question, hypothesis or objective. In that paragraph, we evaluated the framing of the RQ, regardless of whether it was formulated as a research question, hypothesis or objective. We examined whether the five elements of a structured RQ were present in that paragraph. The five elements were the type of patients or population relevant to the question (P), the intervention (I), the comparative intervention (C), the outcome of interest (O), and the time horizon for measuring the outcome (T). We scored each element 1 if it was present and 0 if it was absent. Thus, we created a PICOT score with a possible score between 0 and 5. The score represents a measure of the completeness of the description of the primary research question. The concept of a structured RQ was originally described involving four elements (PICO) [13] and this was probably the concept better known by investigators at the time of publication of the RCT reports under evaluation. Therefore, we decided to qualify a report as providing a structured RQ every time it described the four elements (Complete PICO) in the context of the description of the primary research question, study objective or research hypothesis. Reports that did not describe these 4 elements (Incomplete PICO) did not qualify as providing a structured RQ.
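The scoring rule described above can be expressed as a short sketch. The function names and the dictionary representation of a rated paragraph are assumptions made for illustration only.

```python
PICOT_ELEMENTS = ("P", "I", "C", "O", "T")
PICO_ELEMENTS = ("P", "I", "C", "O")


def picot_score(present):
    """Count how many of the five PICOT elements were rated as present (0 to 5).

    `present` maps an element letter to True/False; missing letters count as absent.
    """
    return sum(1 for element in PICOT_ELEMENTS if present.get(element, False))


def has_structured_rq(present):
    """Complete PICO: all four of P, I, C and O are present; T is not required."""
    return all(present.get(element, False) for element in PICO_ELEMENTS)
```

For example, a paragraph that states the population, intervention and outcome but neither the comparator nor the time frame has a PICOT score of 3 and is classified as Incomplete PICO.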

Data abstraction

We used a standardized data abstraction form to extract data from each article. Two reviewers (LR, CY), blinded to each other's ratings, abstracted data independently. In rating the framing of the RQ, the reviewers were blinded to the OQS and KS for each article and they resolved any disagreement through consensus. We used kappa statistics to measure inter-rater agreement for each of the five elements of the RQ. Kappa statistics for the KS items have been reported in our previous study [1] and varied from 0.55 to 0.65.

Statistical Analysis

We calculated the percentage of trials that clearly stated each PICOT element and the associated 95% confidence interval (95% CI). We used Cohen's Kappa (κ) statistic to calculate chance-adjusted inter-rater agreements. We judged agreement as poor if κ ≤ 0.2; fair if 0.21 ≤ κ ≤ 0.4; moderate if 0.41 ≤ κ ≤ 0.6; substantial if 0.61 ≤ κ ≤ 0.8; and good if κ > 0.8 [20].
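For readers unfamiliar with the statistic, the following sketch computes Cohen's κ for two raters from paired ratings and maps a value to the agreement bands given above. It is not the code used in the study; the function names are hypothetical, and the bands are treated as contiguous thresholds at 0.2, 0.4, 0.6 and 0.8, which is an assumption about values falling between the stated limits.

```python
def cohen_kappa(rater1, rater2):
    """Chance-adjusted agreement between two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("ratings must be non-empty and of equal length")
    categories = set(rater1) | set(rater2)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    p_expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_label(kappa):
    """Map kappa to the bands in the text, using contiguous cut-offs (assumption)."""
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "good"
```

Under these bands, a κ of 0.54 would be labelled moderate and a κ of 0.20 poor.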

We reported descriptive statistics on categorical data as numbers (percentages) and scores (i.e., PICOT score, OQS, and KS) as median (interquartile range [IQR]).

We evaluated whether a higher PICOT score was associated with better reporting quality by conducting univariate and multivariable regression analyses with OQS and KS as the outcome variables. In our previous study on this dataset, we found three variables, i.e., publication in the JCEM, complete industrial funding and sample size, which were significantly associated with better OQS. Therefore, we included these variables in the multivariable models for OQS and KS. We used generalized estimating equations (GEE) [21] to account for the plausible correlation in the reporting quality within the same journal. We modelled within-journal correlation using an exchangeable working correlation matrix. We assumed the Poisson distribution for outcomes in GEE, as rating scores are non-negative counts. The results were reported as exponents of the coefficient estimates of the GEE analysis, which represent the incidence rate ratios (IRR) over the period of interest - January 2005 to December 2006. Using the same statistical approach as above, we also conducted univariate and multivariable regression analyses to determine whether the use of a structured RQ (complete PICO) was associated with a better OQS and KS. Variables were considered to be statistically significant at alpha = 0.05. We conducted all analyses using SAS 9.0 (Cary, NC).
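One way to write down the multivariable model for the PICOT score described above is sketched below. The notation is ours and is not taken from the study's analysis code: i indexes journals, j and k index trials within a journal, Y is the outcome score (OQS or KS), and the way the covariates are coded (for example, indicator variables for JCEM and complete industry funding) is an assumption.

```latex
% Poisson GEE with a log link; journals are the clusters
\log \mathbb{E}[Y_{ij}] = \beta_0
    + \beta_1\,\mathrm{PICOT}_{ij}
    + \beta_2\,\mathrm{JCEM}_{ij}
    + \beta_3\,\mathrm{IndustryFunding}_{ij}
    + \beta_4\,\mathrm{SampleSize}_{ij}

% Exchangeable working correlation within a journal
\mathrm{Corr}(Y_{ij}, Y_{ik}) = \alpha \quad \text{for } j \neq k

% Reported effect measure for a one-point increase in PICOT score
\mathrm{IRR}_{\mathrm{PICOT}} = e^{\beta_1}
```

Under this parameterization, an IRR above 1 corresponds to a higher expected score per one-point increase in the PICOT score, with the other covariates held fixed.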

Results

Framing of the research question

For the rating of the individual components of the RQ, the κ inter-rater agreement estimate was 0.54 (95% CI: 0.32, 0.77) for patients, 0.52 (95% CI: 0.15, 0.88) for intervention, 0.87 (95% CI: 0.59, 0.87) for comparator, 0.20 (95% CI: 0.10, 0.30) for outcome, and 0.60 (95% CI: 0.41, 0.78) for time-frame. The median PICOT score was 3 (IQR = 1). The percentage of articles that reported each element of the primary RQ is provided in Table 2. Patients, intervention and outcome were often adequately described. However, half of the reports did not describe the comparison intervention and only a minority described the study time frame. A complete description of an ideal PICOT RQ was present in 16 of the 89 reports (18.0%). A structured RQ (Complete PICO) was present in 30 reports (33.7%).
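The proportions above can be reproduced, together with a 95% confidence interval, with a short sketch. The text does not state which interval method was used, so the Wilson score interval below is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return p, center - half_width, center + half_width


# Counts taken from the text: complete PICOT in 16 of 89 reports,
# complete PICO in 30 of 89 reports.
for label, count in (("Complete PICOT", 16), ("Complete PICO", 30)):
    p, low, high = wilson_ci(count, 89)
    print(f"{label}: {100 * p:.1f}% (95% CI: {100 * low:.1f}, {100 * high:.1f})")
```

Other interval methods, such as the exact binomial interval, would give slightly different limits for the same counts.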

Table 2 Frequency of description of each PICOT element

Association between framing of the research question and reporting quality

Tables 3 and 4 display the results of the univariate and multivariable analyses of factors associated with OQS and KS, respectively. A higher PICOT score was significantly associated with a higher OQS (Table 3) and KS (Table 4) in both univariate and multivariable analyses. After adjusting for journal of publication, sample size and funding source, each point increase in PICOT score was significantly associated with a 2.1% increase in the OQS and a 14.2% increase in the KS.
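The percentage increases quoted above follow from the incidence rate ratios by simple arithmetic, as the sketch below shows. The IRR values 1.021 and 1.142 are back-calculated here from the percentages in the text rather than copied from Tables 3 and 4, and the function names are hypothetical.

```python
import math


def irr_from_coefficient(beta):
    """Exponentiate a log-link regression coefficient to obtain the IRR."""
    return math.exp(beta)


def percent_change(irr, points=1.0):
    """Percentage change in the expected score for a `points` increase in the predictor.

    On the log scale effects add, so on the original scale they multiply:
    a k-point increase corresponds to IRR ** k.
    """
    return (irr ** points - 1.0) * 100.0


# Implied IRRs per one-point increase in PICOT score (assumed from the text).
print(round(percent_change(1.021), 1))   # OQS
print(round(percent_change(1.142), 1))   # KS
# A three-point difference in PICOT score compounds multiplicatively.
print(round(percent_change(1.021, points=3), 1))
```

Because the model is multiplicative, a three-point difference in PICOT score corresponds to slightly more than three times the one-point percentage increase.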

Table 3 Association between PICOT score and overall reporting quality
Table 4 Association between PICOT score and reporting of key quality elements

As shown in Tables 5 and 6, the use of a structured RQ (complete PICO) was also significantly associated with better overall reporting quality (Table 5) and better reporting of key quality elements (Table 6). After adjusting for confounding variables, the presence of a structured RQ was associated with a 9.5% increase in the OQS and a 53.0% increase in the KS. The association between the reporting of individual PICOT elements and OQS is available as an additional file (additional file 1).

Table 5 Association between a structured research question and overall reporting quality
Table 6 Association between a structured research question and reporting of key quality elements

Discussion

We evaluated the prevalence of the use of the PICOT format in framing the RQ in a sample of articles published in three general endocrinology journals in 2005 and 2006. The framing of the RQ was usually incomplete and unclear, with only one-third of the reports using a structured approach based on the PICOT format. These observations are consistent with a recent survey of four anesthesia journals, which found that 96% of the studies did not fully apply the PICOT approach in reporting the research question [13].

To the best of our knowledge, this is the first study assessing the association between the framing of the RQ and RCT reporting quality. Our results consistently indicate a significant association between the completeness of the RQ description and quality of reporting. We found that the presence of a structured RQ is significantly associated with a 9.5% increase in the OQS and a 53.0% increase in KS.

A larger sample size, complete industry funding and publication in the Journal of Clinical Endocrinology and Metabolism were also significantly associated with overall reporting quality but not with the reporting of key methodological items. It is plausible that part of the variation in the quality of reporting between RCT reports can be explained by other variables such as awareness of the CONSORT statement by authors, adoption of CONSORT by journals and availability of advice from a methodological expert when planning an RCT. However, testing these hypotheses was outside the scope of our study.

The use of a structured RQ has been proposed as a systematic way to construct the study question to aid the search for valid answers [22]. In general, a structured RQ can guide the literature search, protocol development and the conduct of a study. The explicit statement of the five PICOT elements prompts the investigator to think about the design to use and to consider the balance between the research question and the feasibility to answer it [22]. This also forms the basis for the recommendation by experts in clinical epidemiology to use a structured approach when formulating research questions [11]. The Cochrane collaboration also advocates the same approach in formulating research questions for their systematic reviews [23].

There are several limitations to our study. First, we did not measure RCT methodological quality directly, as we did not verify the information from the authors or their protocols. Therefore, the quality of reporting should be taken only as a surrogate of true methodological quality since important methodological detail may be omitted from published reports [24, 25]. In addition, some of the items of the OQS explicitly assessed the completeness of reporting, as a measure of reporting quality, rather than the adequacy of the methods. On the other hand, the KS can be considered a better indicator of methodological quality since the three elements directly relate to the adequacy of the methods used in each trial. The presence of a stronger association with KS as compared to OQS suggests that the use of a structured RQ could be associated with better methodological quality.

Second, our reporting quality scores are not validated. There are more than 25 quality assessment scales, but most of them have not been rigorously tested for validity and reliability [26]. Our OQS score is mainly a measure of the completeness of reporting. We based our score on the CONSORT criteria because they are the most accepted standards for reporting RCTs and have been widely endorsed by many clinical journals and editorial organizations.

Third, our analyses rely on quality scores, which can be problematic [27-31]. Problems with scales relate to both the choice of elements to include in a score and how these elements should be weighted [28, 29]. Different methods to create the scores may lead to different results when the scores are used in a particular analysis. Several studies have shown a lack of agreement between scores or scales in separating studies into low and high quality and no scale has been found to be the best at validly measuring quality [27, 29, 31]. This suggests that different scales are probably measuring different constructs and it can be difficult to assign a meaning to a particular quality score. Therefore, evaluating the quality of RCTs for systematic reviews by analyzing quality items individually is often considered preferable to combining the information in a single numerical value. This approach may allow assigning different levels of importance to individual quality items depending on the context of the particular trial [28]. To avoid the limitations of using quality scores, we could have conducted an analysis of the association between the use of a structured RQ and the reporting of each individual key element. However, our study lacked the statistical power for such an analysis, as key methodological items were infrequently reported. Therefore, we opted for using the OQS and the KS. Caution should therefore be taken in assigning a specific meaning to each of our scores. The OQS is merely a measure of completeness of reporting. The KS combines the information on the use of allocation concealment, blinding and intention to treat analysis. These items have been shown to be associated with bias [16, 17].

Fourth, the framing of PICOT is itself an aspect of reporting quality. To avoid this problem, at least in part, we excluded the item related to description of the objectives or RQ from our OQS. Finally, the inclusion of only general endocrinology journals may affect the generalizability of our results. Our findings would need to be confirmed by a similar and larger study applied to a broader sample of RCT reports in other specialties and also in leading general medical journals. In spite of these limitations, we think our results have good internal validity. We used a standardized evaluation instrument, two reviewers independently performed the selection and abstraction processes, and disagreements were always resolved by consensus.

Conclusions

Our findings show that the use of the PICOT format to structure the RQ in RCT reports published in general endocrinology journals over 2005-2006 was low. We also found a small association between the use of a structured RQ based on the PICOT format and better overall reporting quality of RCTs. The association with the reporting of key methodological items was more pronounced. It is important to recognize that while poor reporting does not necessarily mean poor design or conduct of a study, the quality of reporting is routinely used by researchers as a proxy for study quality in systematic reviews. An examination of a broader sample of studies, including other areas of medicine, would be necessary to confirm our results. The main implication of this study is that researchers should pay attention to proper framing of the research question. They should consider using a structured approach such as the PICOT format to frame it, as this is likely to determine how the study is designed, conducted and ultimately reported.

Authors' information

LT is a clinical trials mentor for the Canadian Institutes of Health Research.