Background

Scientific research commonly falls short of the ideal in both design and reporting, sometimes as a consequence of unavoidable practical issues. Ideally, unavoidable methodological shortcomings should be disclosed in the paper, and additional design and reporting flaws should be discussed in the correspondence after publication, during the process of post-publication peer review. There is extensive anecdotal evidence that this ideal is not met: that methodological shortcomings go undisclosed, that correspondence raising legitimate criticisms is rejected by journal editors, and that engagement by scientists after concerns are raised is not always constructive.

Correct outcome reporting is an important methodological and reporting issue because discrepancies between pre-specified and reported outcomes are a widespread source of bias in clinical trials [1]. Where outcome misreporting is permitted, it increases the likelihood that reported differences have arisen through chance or are exaggerated. Clinical trial registers were established to address selective reporting [2] and require that all pre-specified outcomes be entered at the outset of the trial in a time-stamped and publicly accessible location. Registering clinical trials and pre-specifying their outcomes are mandated by legislation in the US [3] with strong support from the World Health Organization [4], the International Committee of Medical Journal Editors (ICMJE) [2], and an extensive range of professional bodies, funders, ethics committees, publishers, universities and legislatures. The importance of reporting all pre-specified outcomes and documenting changes is also emphasised in the International Conference on Harmonisation of Good Clinical Practice (ICH-GCP) [5] and the CONSORT (Consolidated Standards of Reporting Trials) guidelines [6], which are endorsed by 585 academic journals [7]. However, despite near universal recognition of the importance of this issue and extensive public commitments to address the problem, trial reports in academic journals routinely fail to report pre-specified outcomes, and add in non-pre-specified outcomes, without disclosing that this has occurred. A 2015 systematic review [1] found 27 studies comparing pre-specified outcomes against those reported, in cohorts of between 1 and 198 trials (median n = 65 trials). The median proportion of trials with a discrepancy on primary outcomes was 31% (interquartile range 17–45%). Eight studies also assessed the impact of outcome switching on the statistical significance of the published outcome and found that outcome switching favoured the reporting of significant outcomes in half the trials.
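The comparison between pre-specified and reported outcomes can be made concrete with a short sketch. The Python below is illustrative only and is not COMPare's actual procedure: the names `OutcomeComparison` and `compare_outcomes`, the example outcome labels, and the use of exact string matching are all assumptions. In practice, deciding whether two outcome descriptions refer to the same outcome requires human judgement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeComparison:
    """Result of comparing registry outcomes with those in the trial report."""
    reported_as_prespecified: frozenset
    missing_prespecified: frozenset      # pre-specified, silently unreported
    added_non_prespecified: frozenset    # reported, silently added

    @property
    def has_discrepancy(self) -> bool:
        return bool(self.missing_prespecified or self.added_non_prespecified)


def compare_outcomes(prespecified, reported, declared_changes=()):
    """Compare two sets of outcome labels (hypothetical helper).

    Outcomes listed in `declared_changes` are treated as changes that the
    trial report openly discloses, so they are not counted as discrepancies.
    """
    pre = frozenset(prespecified)
    rep = frozenset(reported)
    declared = frozenset(declared_changes)
    return OutcomeComparison(
        reported_as_prespecified=pre & rep,
        missing_prespecified=(pre - rep) - declared,
        added_non_prespecified=(rep - pre) - declared,
    )


# Hypothetical trial: one pre-specified outcome dropped, one new outcome added.
comparison = compare_outcomes(
    prespecified={"mortality at 12 months", "hospital readmission"},
    reported={"mortality at 12 months", "quality of life score"},
)
print(comparison.has_discrepancy)
print(sorted(comparison.missing_prespecified))
print(sorted(comparison.added_non_prespecified))
```

Under these assumptions, passing both changed outcomes as `declared_changes` removes the discrepancy, which mirrors the principle that changes are acceptable when they are disclosed in the trial report.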

In the Centre for Evidence-Based Medicine Outcome Monitoring Project (COMPare), we aimed to explore whether it was possible to publish correction letters on all trials with misreported outcomes in real time, as they were published, in order to ensure that the academic record was more CONSORT-compliant, as per journals’ public commitments. We also aimed to monitor responses from editors and trialists to this standardised set of correction letters, to better understand why outcome misreporting persists despite public commitments to address it; to test the ability of academic journals to self-correct when breaches of their public commitments are reported; and to establish how researchers respond when legitimate objective criticisms of their work are raised.

Here, we analyse the complete set of trialists’ public responses to all published correction letters from COMPare, using thematic analysis to explore inaccurate or problematic statements and misunderstandings around correct outcome reporting. We aimed to answer two related questions. First, what can trialists’ responses to corrections on their own misreported outcomes tell us about trialists’ knowledge of correct outcome reporting? Second, what can a cohort of responses to a standardised correction letter tell us about the techniques that researchers use, explicitly or implicitly, when responding to critical post-publication peer review?

Methods

Detailed methods of the COMPare trials project are reported in our accompanying paper on the prevalence of misreporting and responses from journals to correction letters [8]. In brief, we monitored for outcome misreporting in five high-impact journals: New England Journal of Medicine (NEJM), Journal of the American Medical Association (JAMA), Annals of Internal Medicine, British Medical Journal (BMJ) and The Lancet. We found a high prevalence of outcome misreporting, consistent with previous work, and therefore submitted 58 correction letters for publication. Twenty-three letters were published: NEJM and JAMA rejected all letters; BMJ accepted all letters as online comments only; Annals of Internal Medicine accepted all letters online and two for print; and The Lancet accepted the majority of letters in print but with long delays (mean 150 days).

All correspondence published by journals from researchers in response to a correction letter from COMPare was filed, indexed and reviewed by the COMPare team in order to write a reply. Themes from the trialists’ letters were analysed and extracted by the COMPare team and a researcher with expertise in qualitative research methods (CM). Key elements from responses were first extracted systematically to a two-way table with quotes, summaries and further notes explaining the significance of the trialists’ comments as appropriate. These were then organised into the overarching groups, themes and sub-themes presented here.
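The extraction and grouping steps described above can be sketched as a simple data structure. This is a minimal illustration, not the tooling used by the COMPare team: the `Extract` fields and the `organise` helper are hypothetical names, and the single example row uses a quote reported in the Results together with an assumed one-line summary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Extract:
    """One row of the extraction table (field names are assumptions)."""
    trial: str
    journal: str
    quote: str
    summary: str
    group: str   # overarching group
    theme: str   # theme within the group


def organise(extracts):
    """Nest extracts as group -> theme -> list of extracts."""
    tree = defaultdict(lambda: defaultdict(list))
    for extract in extracts:
        tree[extract.group][extract.theme].append(extract)
    # Convert to plain dicts so that missing keys raise KeyError.
    return {group: dict(themes) for group, themes in tree.items()}


extracts = [
    Extract(
        trial="Trial 70",
        journal="BMJ",
        quote=("Although we commend the efforts of COMPare, we find it "
               "difficult to appreciate their focus on the negative aspects "
               "of published studies"),
        summary="Applauds the overall goal, followed by a caveat",
        group="Researchers' response styles",
        theme="Challenging legitimacy of discussion",
    ),
]

themes = organise(extracts)
for group, by_theme in themes.items():
    for theme, rows in by_theme.items():
        print(f"{group} / {theme}: {len(rows)} extract(s)")
```

Keeping the verbatim quote alongside the summary in each row preserves the link between an assigned theme and the text that supports it.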

Results

The level of engagement with published correction letters was high. Overall, 20 teams of trialists replied to the 23 published letters: two in Annals, two in the BMJ, and 16 in The Lancet. Nearly all responses contained inaccurate or problematic statements.

Researchers’ inaccurate statements about outcome reporting

Trialists frequently expressed views that contradicted the CONSORT guidelines or made inaccurate statements around key issues of correct outcome reporting. We grouped these comments and views into five themes:

  i) Incorrect statements about when outcomes should be pre-specified.

  ii) Failure to recognise the need to report changes to pre-specified outcomes in the trial report.

  iii) Incorrect statements about the role and operation of trial registries.

  iv) Stating or implying that it was acceptable to have multiple discrepant sets of pre-specified outcomes contemporaneously, by making reference to inaccessible protocols which were claimed to contain different outcomes, discrepant with those in the contemporaneous registry entry.

  v) Incorrect statements around how to pre-specify and report when measuring the same outcome at multiple time points.

Since a thorough explanation of the inaccuracies in these statements requires some detail about the individual trial or the wider context of correct outcome reporting, specific illustrative examples are given in Table 1, alongside explanations of the error made. These examples are taken from Additional file 1—the full table of trialists’ inaccurate statements, with rich information on the individual claims and errors made—which we encourage interested readers to examine.

Table 1 Researchers’ inaccurate statements about outcome reporting

Researchers’ response styles

The second group of themes relates to the more general techniques and approaches used by researchers when responding to criticism of their work, whether consciously or unconsciously. We identify this group of themes broadly as “rhetoric”, although we do not suggest that this was always the explicit intention of the researchers: we explore this further in the Discussion section. We identified five core themes in this group, as set out in Table 2.

Table 2 Themes and sub-themes in “researchers’ response styles”

Under “diversion”, we included responses that we regarded as distractions from a focused discussion on correct outcome reporting. These included statements that trials are hard work to conduct, stating that other issues are more important, and responses based on issues that were not raised by COMPare. We also include ad hominem comments.

Under “challenging legitimacy of discussion”, we included responses that we regarded as challenging whether an open conversation about CONSORT-compliant outcome reporting should happen at all. These included the following: expressing a preference for conventional peer review over open post-publication critical appraisal; disagreement with the general approach of COMPare/CONSORT; asserting that there should be the opportunity to post comments on COMPare’s own raw data sheets online; and stating that they applaud the overall goal of COMPare, followed by a caveat. Examples here include author responses criticising COMPare for focusing on “the negative”, such as “Although we commend the efforts of COMPare, we find it difficult to appreciate their focus on the negative aspects of published studies” (Trial 70, BMJ, 04/02/16).

Under “Trust the trialist”, we included responses asserting that trialists should be relied upon to make unbiased judgements about whether to report a finding. This included statements that discrepancies were not motivated by desire to manipulate findings and statements that outcome misreporting does not matter if the main results of the study are unlikely to be affected.

Under “Incorrect statements about outcome reporting in their own paper”, we included individual responses denying that specific misreported outcomes were indeed misreported and general denial of COMPare’s findings. We do not suggest that any or all of these examples are deliberate misrepresentations: however, we categorised them in the “response styles” group because these were researchers making incorrect statements about the factual content of their own specific publication, rather than technical misunderstandings of what constitutes correct outcome reporting in general.

Lastly, some themes within the “response styles” group appealed to specific technical issues but overall appeared to us to be used in a way that was more strategic or rhetorical than those within the first group of statements, which were clearly factually inaccurate. We characterised these as “technical - rhetorical”. These included appealing to the existence of a novel category of outcomes whose results need not be correctly reported, stating that space constraints prevent all pre-specified outcomes being reported, stating that it is not necessary to pre-specify some outcomes as they are “necessarily implied” by other outcomes, and making inaccurate statements about COMPare’s methods. The full table of examples grouped by theme is in Additional file 2; a selection is presented in Table 3.

Table 3 Researchers’ response styles

Researchers’ correction of errors

Only eight teams of trialists clearly and publicly acknowledged a specific discrepancy between their pre-specified and reported outcomes, out of 58 misreported trials. Of these, all but two acknowledgements of error were given in the context of caveats or further problematic statements about correct outcome reporting. Only one misreported trial was updated with a correction. In addition, three teams of trialists incorrectly stated that they should have retrospectively updated the pre-specified outcomes in the registry to ensure that reported outcomes were consistent with those pre-specified and presented this as an acknowledgement of an error; in other words, while they acknowledged that an error was made, they seemed to continue to misunderstand the nature of the error. Examples are given in Table 4, taken from the full sample in Additional file 3.
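The proportions implied by these counts can be worked out directly. The short Python sketch below uses only the figures stated in this section; the variable names and the `pct` helper are illustrative assumptions.

```python
# Counts taken from the text above.
MISREPORTED_TRIALS = 58     # trials with misreported outcomes
ACKNOWLEDGED = 8            # teams clearly acknowledging a discrepancy
ACK_WITHOUT_CAVEAT = 2      # acknowledgements given without caveats
CORRECTED = 1               # trials updated with a correction


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


ack_with_caveat = ACKNOWLEDGED - ACK_WITHOUT_CAVEAT

print(f"Acknowledged: {ACKNOWLEDGED}/{MISREPORTED_TRIALS} "
      f"({pct(ACKNOWLEDGED, MISREPORTED_TRIALS)}%)")      # 13.8%
print(f"Acknowledged with caveats: {ack_with_caveat}/{ACKNOWLEDGED}")
print(f"Corrected: {CORRECTED}/{MISREPORTED_TRIALS} "
      f"({pct(CORRECTED, MISREPORTED_TRIALS)}%)")         # 1.7%
```

On these figures, roughly one in seven misreported trials drew a clear public acknowledgement, and fewer than one in fifty was corrected.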

Table 4 Researchers’ correction of errors

Correcting inaccurate statements in researchers’ responses

COMPare submitted letters for publication setting out corrections and clarifications to all errors and inaccurate statements noted in Tables 1 and 3. To date, only two of these letters have been published in The Lancet, and none has been published in print by either the BMJ or Annals, although both accepted comments online (Annals only if brief). As previously reported, NEJM and JAMA rejected all initial correspondence notifying readers and researchers of outcome misreporting; no interaction with these trialists was therefore possible.

Discussion

Summary

We found that trialists engage at length with published correspondence identifying misreporting of pre-specified outcomes. However, inaccurate statements and misunderstandings about what constitutes correct outcome reporting were common, even among trialists publishing in high-impact journals. In addition, response styles such as ad hominem criticism, distraction and denial were commonly used.

Strengths and weaknesses

A larger sample of trials and trialists would have been preferable. Our study included the full correspondence with 20 teams of researchers; it could have included all 58 trials with misreported outcomes identified during COMPare. However, our ability to engage with trialists was hindered in two ways: by journal editors rejecting the majority of initial correction letters identifying misreporting of outcomes, despite clear evidence that these trial reports had all breached the CONSORT guidelines on correct outcome reporting; and by journals rejecting the majority of COMPare follow-up letters engaging with errors in trialists’ responses, as discussed below.

Context of other research

There have been extensive previous anecdotal reports in the grey and academic literature of researchers’ failures to engage constructively with post-publication peer review that is critical of study methods and results. COMPare is the first study to approach and document this problem systematically with a standardised set of correction letters and on an objective issue of accurate study reporting in line with standard best practice guidelines. COMPare is also the first study to systematically solicit and analyse detailed technical responses from a representative sample of trialists and engage them in a practical real-world detailed discussion of outcome reporting using examples of misreporting from their own work to identify knowledge gaps. There has been extensive previous research establishing the high prevalence of outcome misreporting [1] and other reporting flaws [9] and some questionnaire data on the limitations of trialists’ knowledge around correct outcome reporting. One previous survey on the prevalence of outcome misreporting also engaged trialists in semi-structured telephone interviews to explore their reasons for not reporting specific outcomes: this study design yielded less detail in terms of specific misunderstandings or inaccurate statements than ours; however, consistent with our findings, they did report that trialists “seemed generally unaware of the implications for the evidence base of not reporting all outcomes and protocol changes” and that some regarded non-significant results as “uninteresting” [10]. Another series of semi-structured telephone interviews with 59 trialists similarly yielded the finding that non-significant findings are sometimes regarded as uninteresting, and space constraints may hinder complete outcome reporting [11].

Interpretation

It is challenging to reach a fair interpretation of what drives trialists’ incorrect statements about correct outcome reporting. To retain neutrality, we have labelled all of these statements as “inaccurate” rather than either “misunderstandings” or “misleading comments” because it is not possible to know the level of knowledge for all researchers assessed. Some, none, or all of the inaccurate statements documented may have represented genuine misunderstandings or a lack of knowledge. To expand on this, it is possible that these trialists do not know what correct outcome reporting consistent with CONSORT looks like and are making genuine unintended errors; it is also possible that they do not care about CONSORT and are speaking implicitly or explicitly to a more vague alternative set of unstated principles around correct outcome reporting which they regard as superior.

Equally, some, none, or all of the inaccurate statements may have been used deliberately in an attempt to deflect criticism and publicly defend what the researchers knew to be misreporting. This would imply that researchers were not primarily concerned with what constitutes correct outcome reporting but rather with defending their reputation. At face value, it seems likely that anyone with good knowledge of correct outcome reporting, and concerned to defend their reputation, would be equally concerned by the negative reputational consequences of formally publishing a letter that contained clear misunderstandings around what constitutes correct outcome reporting. For this to be a rational position therefore, researchers would also have to believe that the public discussion is likely to be brief, poorly understood by onlookers (or ignored), and unlikely to lead to a resolution establishing who was right or wrong on matters of fact.

To an extent, this view is vindicated by the initial findings of COMPare, where journal editors mostly rejected letters reporting outcome misreporting, and often defended such misreporting, despite the journal’s being publicly listed as endorsing CONSORT. Researchers may also feel bolstered by the fact that a journal has published their paper after peer review and is therefore likely to feel some commitment to supporting it; by the fact that a paper with misreported outcomes is unlikely to be retracted, or even corrected, so this is just a matter for correspondence; and by the fact that letters in journals have lower visibility than original research. Related to the issue of managing the visibility of correspondence, it is notable that some research teams suggested that the discussion on their misreported outcomes should take place as annotations to our raw data archive rather than in the journal where their research was published.

There is also a third option combining both of the previous two: that these were “motivated misunderstandings”, where researchers do not have a full clear working understanding of correct outcome reporting, but are not inclined to develop one, and merely seek to survive a single round of public criticism in the reasonable expectation that any potentially inaccurate statements will not be exposed in the full cycle of post-publication peer review. Under any of these three models, two core problems obtain. First, the failure of journals to curate post-publication peer review such that errors on matters of fact are resolved has resulted in a sub-optimal approach from scientists to the accurate reporting of their own work; second, a widespread lack of knowledge around correct outcome reporting has contributed to both misreporting and poor discourse around that misreporting.

Separately to this, we found many examples of obfuscation, ad hominem criticisms, and other techniques that can fairly be described as “rhetorical”. Although these do not directly relate to the specific issues of outcome reporting and may not be reasonably regarded as unacceptable per se, they are part of a broader set of processes restricting adequate scrutiny of correct reporting. It is also worth noting that we may not have had access to the full breadth of ad hominem comments, because we do not have access to the text of the letters submitted, only those published. Letters published in The Lancet (the majority in our cohort) go through an extensive process of editorial control, proof-reading, and some re-drafting; we note that the tone of BMJ “rapid responses”, which are posted online within hours of submission and usually unchanged, was often much more raw than the formal letters published after a delay in The Lancet. On the issue of self-censorship, it is also possible that the constitution of the COMPare team reduced the quantity of ad hominem criticism. Because such criticism is based on denigrating the recipient rather than ideas, it is likely to be mediated by perceived relative social status, which in turn is mediated by factors such as class, gender and race. It is therefore possible that we received less such criticism than a different team might have done, since those submitting correction letters were all academics at Oxford, recently listed as the leading medical research institute in the world; we have a professor and other senior staff on our team; and the COMPare correspondents named on correction letters were all male and mostly identifiable as White British.

A related issue of power relations concerns the question of who should decide whether an outcome requires reporting. CONSORT is clear that all pre-specified outcomes should be reported or discrepancies flagged. As per our section “Trust the trialist”, many trialists stated that outcome switching is irrelevant if it does not affect the outcomes of the study. Ultimately, in our view, this reflects scientists asserting that they should be trusted to faithfully report summary results without oversight and asserting authority over the data as if it were owned by the trialist rather than participants or the wider community. This is inconsistent with the wider societal shift towards greater transparency and accountability in science.

Implications

We identify various implications of our study for editors, funders, trial registries, ethics committees and regulators; for initiatives seeking to improve research methods and reporting; and for researchers whether they are publishing work, responding to published work, or consuming published work. We have found that trialists publishing in high-impact journals routinely misreport their pre-specified outcomes and, when challenged, regularly make incorrect statements on the topic of correct outcome reporting. This may reflect a lack of knowledge: where this is the case, we suggest that better education and training on research methods may improve matters. However, trialists are also deprived by journal editors of important feedback that would likely help to raise standards. Journals could improve standards by policing correct outcome reporting, giving feedback to trialists where they have submitted papers that fail to comply with CONSORT standards on outcome reporting, and encouraging trialists to engage positively with feedback on methodological and reporting flaws, as already recommended in ICMJE guidance. In some cases, the incorrect statements made by trialists may reflect deliberate or unconscious use of superficially plausible but incorrect arguments as a rhetorical device to defend misreported studies. Where this is the case, research integrity training may improve standards, alongside support for ongoing efforts to foster a culture of positive and reciprocal critical appraisal in scientific discourse.

Trial registries should emphasise that information on registries is important, give additional guidance on the specific elements required, and give feedback to trialists when registry entries fall short on required information. Registry managers and ethics committees could remind trialists that pre-specified outcomes in protocols and registry entries should match. Ethics committees and funders could take responsibility for “closing the loop” with a report at the end of a project, confirming that all results have been appropriately published, deviations from the ethically approved protocol accounted for, and post-publication peer review engaged with constructively. Organisations such as the EQUATOR (Enhancing the Quality and Transparency of Health Research) network, running the CONSORT guidelines, should disambiguate any areas in their recommendations that are perceived by researchers as unclear, and could offer a service for trialists or journals to check that trials have been correctly reported across a range of methodological issues. Consumers of the research literature should also be aware that the peer-reviewed academic literature contains a high prevalence of misreported research and that efforts to correct this are routinely resisted by journal editors. The majority of initial letters from COMPare were rejected, and the overwhelming majority of responses to authors’ responses were also rejected. Therefore, the extensive errors documented in Table 1, in Additional file 1, and in the longer COMPare correspondence archive currently stand unaddressed and without a published response in the scientific literature, other than in this article.

Lastly, we believe that the rhetorical approaches demonstrated by many respondents in our cohort—such as diversion, hostility, and challenging the legitimacy of having a discussion—will be recognised by academics more broadly. We hope that this will be useful for those writing letters criticising the content of a scientific paper or anxious about a response they have received from an author. Although clarity and professionalism are important, the wide variation in responses we received to our large set of identical correction letters strongly suggests that hostile or obfuscatory responses are, at least in part, a function of the responding authors rather than the letter that stimulated the response.

Future research

The academic literature already contains a very large number of studies which retrospectively document the overall prevalence of methodological flaws or reporting discrepancies in clinical trials. These studies are expensive, requiring skilled labour from experienced researchers to identify a large number of flaws in published research. In our view, by publishing these findings as only a single anonymised prevalence figure, these teams are failing to maximise the value and impact of their work. We suggest that wherever research is done documenting the prevalence of flaws in individual studies, researchers should also submit letters for publication on each individual paper where a shortcoming has been identified, in order to alert other consumers of the academic literature to the presence of specific flaws in specific studies, to generate informative or corrective discussion with the researchers concerned, to raise awareness among individual researchers about flaws in their own research, and to generate dialogue allowing methodologists to better understand the misunderstandings or structural challenges driving methodological and reporting flaws, and so devise interventions to improve standards.

Conclusions

Outcome misreporting is common in clinical trials. Journal editors and trialists do not engage constructively when misreporting is reported. It is unlikely that these problems are limited to the specific issue of outcome reporting in the specific field of clinical trials research. The findings here, and in our accompanying article on COMPare, provide strong evidence from a large cohort of studies that the institutions of research in practice commonly fall short of the scientific ideal.