Key Points

The language used to describe results could affect perceptions of the efficacy or safety of interventions.

There are differences in the adjectives used when study findings are described in industry-authored reports compared with non-industry-authored reports.

Authors should avoid overusing adjectives that could be inaccurate or result in misperceptions.

Editors and peer reviewers should be attentive to the use of adjectives and assess whether the usage is context appropriate.

1 Background

Accurate understanding of the efficacy and safety of health interventions is crucial for public health. Major impediments to such understanding include selective reporting and inadequate reporting of trial results. Publication of only those studies that show benefit, known as publication bias, leads to overestimation of the efficacy of interventions, while inadequate reporting limits the reader's ability to assess the validity of trial findings [1, 2]. The CONSORT initiative [3] has led to improvements in the quality of reporting of trial results [4, 5]. In addition, mandatory registration of clinical trials and mandatory publication of trial results are strategies implemented to diminish the impact of publication bias [6].

How trial results are described in publications may influence the reader's perception of the efficacy and safety of interventions. For example, an intervention can be portrayed as beneficial in the publication despite having failed to differentiate statistically from placebo; in this type of bias, called spin bias, the reader is distracted from the non-significant results [7]. The language used to describe trial results could likewise affect perceptions of the efficacy or safety of health interventions, as well as of the quality of the study. We studied the vocabulary used to report trial results and compared it between two authorship groups (industry versus non-industry).

2 Objective

The objective of this study was to compare the adjectives used to report results of clinical trials between industry and non-industry (academia and government). We focused on adjectives because their use adds “color” (potentially biasing interpretation) to the description of study findings.

3 Methods

3.1 Inclusion Criteria

We included studies indexed in PubMed that were randomized, controlled trials; involved humans; had an abstract; and were published in English. The search was conducted on October 7, 2013, without any time limit (i.e., covering all articles present in PubMed up to that date). The PubMed query used to identify the studies was “English[lang] AND Randomized Controlled Trial[ptyp] AND humans[MeSH Terms] AND has abstract[text]”.
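For illustration, the search can be reproduced programmatically. The following is a minimal sketch using Biopython's Entrez module; it was not part of our pipeline, and the e-mail address is a placeholder (NCBI requires a contact address):

```python
# Hypothetical reproduction of the PubMed search via NCBI E-utilities.
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder contact required by NCBI

query = ("English[lang] AND Randomized Controlled Trial[ptyp] "
         "AND humans[MeSH Terms] AND has abstract[text]")

# retmax=0 retrieves only the citation count, not the record IDs.
handle = Entrez.esearch(db="pubmed", term=query, retmax=0)
record = Entrez.read(handle)
handle.close()

print("Matching citations:", record["Count"])
```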

3.2 Classification of Abstracts

Studies were classified as industry-authored or non-industry-authored (academia and government), depending on the affiliations of the authors, using an automated algorithm. To determine an author's affiliation, the affiliation field in PubMed was scanned for word patterns indicating an industry affiliation (e.g., “janssen”, “johnson & johnson”), an academic affiliation (e.g., “university”, “school”), or a government affiliation (e.g., “centers? for disease control”, “u\.?s\.? agency”). Because the PubMed affiliation field contains the affiliation of only one of the authors and therefore could not be used as conclusive evidence for papers written by multiple authors, we supplemented the search for the authors’ affiliations using PubMed Central®. PubMed Central is a free full-text archive of biomedical journals and therefore lists the affiliations of every author of a manuscript. Appendix 1 contains the complete list of patterns used for the abstract classification.
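As a concrete illustration of the pattern-based classification, the Python sketch below matches a small illustrative subset of the Appendix 1 patterns against an affiliation string; the actual implementation and the full pattern lists differ:

```python
import re
from typing import Optional

# Illustrative subsets of the Appendix 1 patterns (not the full lists).
INDUSTRY_PATTERNS = [r"janssen", r"johnson & johnson"]
ACADEMIA_PATTERNS = [r"university", r"school"]
GOVERNMENT_PATTERNS = [r"centers? for disease control", r"u\.?s\.? agency"]

def classify_affiliation(affiliation: str) -> Optional[str]:
    """Return 'industry', 'academia', 'government', or None if nothing matches."""
    text = affiliation.lower()
    for label, patterns in (("industry", INDUSTRY_PATTERNS),
                            ("academia", ACADEMIA_PATTERNS),
                            ("government", GOVERNMENT_PATTERNS)):
        if any(re.search(p, text) for p in patterns):
            return label
    return None

print(classify_affiliation("Janssen Research & Development, Titusville, NJ"))
# -> 'industry'
```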

For the abstracts not included in PubMed Central, we developed an algorithm to predict the affiliation of the authors. We assumed that if an author had a particular affiliation in one manuscript, the same affiliation applied to any other manuscript written by that author in the same year. Because there are no unique identifiers for authors in PubMed, we used an author name disambiguation algorithm similar to Authority [8], which models the probability that two articles sharing the same author name were written by the same individual. The probability was estimated with a random forest [9] classifier using the following features as input: length of the author name, frequency of the author name in Medline, similarity in MeSH terms, similarity in title and abstract words, whether the papers appeared in the same journal, overlap of co-authors, and time between publications in years. The classifier was trained on a set in which positive cases were identified using author e-mail addresses (available for only very few authors) and negative control cases were identified based on a mismatch in the author's first name. The probabilities were subsequently used in a greedy clustering algorithm to group all papers by an author.
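A schematic sketch of the pairwise matching step follows, using scikit-learn's random forest. The feature names, placeholder data, and training-set construction are simplified stand-ins for the procedure described above; the feature extraction and greedy clustering steps are omitted:

```python
# Sketch of the pairwise author-matching classifier, assuming feature
# vectors have already been computed for each pair of articles sharing
# an author name. All data below are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["name_length", "name_frequency", "mesh_similarity",
            "title_word_similarity", "abstract_word_similarity",
            "same_journal", "coauthor_overlap", "years_between"]

# X_train: one row per article pair; y_train: 1 if the same individual
# (e.g., matched via e-mail address), 0 otherwise (first-name mismatch).
rng = np.random.default_rng(0)
X_train = rng.random((200, len(FEATURES)))   # placeholder feature matrix
y_train = rng.integers(0, 2, 200)            # placeholder labels

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Probability that a new article pair was written by the same person;
# these probabilities would then feed the greedy clustering step.
p_same = model.predict_proba(rng.random((1, len(FEATURES))))[:, 1]
print(p_same)
```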

An abstract was classified as non-industry-authored when all authors of the publication had academic or government affiliations. An abstract was classified as industry-authored when any of the authors in the publication had an industry affiliation. Publications in which the algorithm found none of the patterns to classify an author, or found an author with affiliations to both industry and academia or government, were excluded from the analysis.

To assess the accuracy of the algorithm that predicted author affiliation, we selected a random sample of 250 abstracts, manually checked the affiliation of each author in the full manuscript, and compared these results with the algorithm's classification.

3.3 Adjective Selection

To compare the use of language between industry and non-industry authors, we downloaded the Medline database. The abstracts that met the inclusion criteria were run through the part-of-speech tagger of OpenNLP [10], which allowed us to identify the adjectives. OpenNLP is an open-source, machine learning-based toolkit for processing natural language text, made available by the Apache Foundation; it has an overall accuracy of around 97 % [12]. It uses the Penn Treebank tagset, and we considered all tokens with the tags JJ (adjective), JJR (adjective, comparative), and JJS (adjective, superlative) [11]. We focused on abstracts because more people read the abstract than the whole article, and because only abstracts are available in Medline.
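OpenNLP is a Java toolkit; purely as an illustration of this tagging step, the following Python sketch extracts the same three Penn Treebank adjective tags using NLTK, whose default tagger emits the same tagset (NLTK was not part of this study's pipeline):

```python
import nltk

# One-time resource downloads are required:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # adjective, comparative, superlative

def extract_adjectives(abstract):
    """Return all lower-cased tokens tagged as adjectives."""
    tokens = nltk.word_tokenize(abstract)
    return [word.lower() for word, tag in nltk.pos_tag(tokens)
            if tag in ADJECTIVE_TAGS]

print(extract_adjectives("The drug was safe, well tolerated, and showed "
                         "a meaningful benefit over placebo."))
```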

After extracting all adjectives from the abstracts, we selected the set of adjectives we considered relevant to coloring the results of a trial. This selection was performed independently by two authors (MSC and MS), after which discrepancies were resolved by discussion. All subsequent analyses were limited to this set to reduce the risk of false positives. Examples of excluded adjectives are “viscous” and “intellectual”. The lists of included and excluded adjectives are shown in Appendix 2.

3.4 Location in the Abstract

To locate where in the abstract the differences in adjective use occurred, we looked separately at the title and the conclusion. The title is clearly identified in PubMed records. For unstructured abstracts, the conclusion was considered to be the last two sentences of the abstract. Sentences were detected using the OpenNLP toolkit [10].
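As an illustration of this rule, the sketch below takes the last two sentences of an unstructured abstract, using NLTK's sentence tokenizer in place of OpenNLP's sentence detector (again, NLTK was not used in the study):

```python
import nltk  # requires the 'punkt' sentence tokenizer resource

def extract_conclusion(abstract):
    """Return the assumed conclusion of an unstructured abstract:
    its last two sentences."""
    sentences = nltk.sent_tokenize(abstract)
    return " ".join(sentences[-2:])

text = ("Background sentence. Methods sentence. Results sentence. "
        "The drug was effective. It was well tolerated.")
print(extract_conclusion(text))  # -> the last two sentences
```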

3.5 Analysis

To determine whether an adjective was used more often by industry or non-industry authors, we used an exact test for contingency tables [13], stratifying by journal to adjust for differences in language between journals. This test is similar to the well-known Mantel–Haenszel test in that it tests for an overall difference between groups through differences within strata, but it uses an exact method, making it more robust for small numbers within each stratum. We restricted the analysis to adjectives that were present in at least 100 papers in our final data set. Because of the large number of tests, we corrected for multiple testing using Holm's technique [14].
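For illustration, the sketch below applies the analogous (asymptotic) Mantel–Haenszel test from statsmodels to per-journal 2 × 2 tables and then Holm's correction across adjectives. The study itself used an exact stratified test, all counts below are placeholders, and the pooled odds ratio printed here is only analogous in spirit to the relative use estimate described next:

```python
# Approximate illustration of the stratified analysis. For each adjective,
# we form one 2x2 table per journal (rows: industry vs. non-industry;
# columns: abstracts with vs. without the adjective). statsmodels'
# asymptotic Mantel-Haenszel test stands in for the exact test.
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable
from statsmodels.stats.multitest import multipletests

def stratified_test(tables):
    """tables: list of 2x2 count arrays, one per journal."""
    st = StratifiedTable([np.asarray(t) for t in tables])
    return st.oddsratio_pooled, st.test_null_odds().pvalue

# Placeholder data: two adjectives, each observed in two journals.
tables_by_adjective = {
    "well tolerated": [[[30, 70], [10, 90]], [[25, 75], [12, 88]]],
    "feasible":       [[[8, 92], [20, 80]], [[6, 94], [18, 82]]],
}

results = {adj: stratified_test(tabs)
           for adj, tabs in tables_by_adjective.items()}

# Holm's correction across the family of adjective tests.
pvals = [p for _, p in results.values()]
reject, p_holm, _, _ = multipletests(pvals, method="holm")

for (adj, (pooled_or, _)), p in zip(results.items(), p_holm):
    print(f"{adj}: pooled OR = {pooled_or:.2f}, Holm-adjusted p = {p:.3f}")
```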

We also calculated a relative estimate of use. Values >1 mean that industry used the adjective more often; values <1 mean that the adjective was used more often by academia and government; a value of exactly 1.0 would indicate equal use by both groups of authors. We report 95 % confidence intervals (CIs), but these intervals are not adjusted for multiple testing.

We further computed the average number of “colored” adjectives used in the title and abstract, counting an adjective as many times as it appeared in an abstract. For this analysis, we used all adjectives we considered relevant, including those that appeared in fewer than 100 articles.

3.6 Source of Funding

Using the authors' affiliations is one way to classify studies as industry or non-industry; the source of funding is another. PubMed identifies financial support of the research, but it only distinguishes US-government from non-US-government funding. Sources of financial support are often listed in the full manuscript. For the subset of abstracts with full-text articles in PubMed Central, we identified the source of funding and compared that classification with our affiliation-based classification to estimate the degree of potential discordance. For example, a trial conducted by an academic institute may be authored by academicians only but funded by a pharmaceutical company; under the affiliation classification, the research would be considered non-industry, while under the sponsorship classification, it would be considered industry. Because this information was available for only a limited number of abstracts, we could not conduct sensitivity analyses, but we report the findings.

4 Results

A total of 306,007 publications met the inclusion criteria. We were able to classify 16,789 abstracts; 9,085 were classified as industry and 7,704 as non-industry. The algorithm correctly identified 235 of the 250 manuscripts sampled for accuracy (15 were incorrectly assessed as non-industry-authored), indicating an accuracy of 94 % with a kappa value of 0.88 (Table 1).

Table 1 Assessment of the accuracy of the automated algorithm to classify abstracts as “industry-authored” and “non-industry-authored”, compared with a manual classification, in a subsample of abstracts

The abstracts were published from 1981 to 2013, and 92.5 % were published in 2000 or later. The abstracts appeared in 1,788 journals, and 50 % were published in 98 journals. Appendix 3 provides the list of journals, with the number of abstracts by journal.

The 16,789 abstracts contained a total of 4,690 adjectives: 298 were considered relevant by both authors (see Appendix 4), and 72 of these were present in at least 100 papers in our final data set and were analyzed (Table 2). With few exceptions, these were positive adjectives.

Table 2 Adjectives in the final analysis, with the numbers of abstracts in which each adjective appears

The use of adjectives differed between industry and non-industry (Table 3). Ten adjectives located in the title or conclusion, and 15 adjectives located anywhere in the abstract, had relative use values >1, indicating preferential use by industry. Most notably, adjectives such as “well tolerated” and “meaningful” were more commonly used by industry-authored reports in the title or conclusion of the abstracts [relative use 5.20 (CI 2.73–10.03) and 3.08 (CI 1.73–5.44), respectively], whereas adjectives such as “feasible” were more commonly used in abstracts classified as non-industry-authored [relative use 0.34 (CI 0.18–0.60)]. Adjectives such as “successful” and “usual” were also more commonly used by non-industry, when considering the abstract overall [relative use 0.46 (CI 0.31–0.68) and 0.40 (CI 0.30–0.53), respectively] (Table 3).

Table 3 Adjectives favored by industry or non-industry (academia and government) by location in the abstract

Examples of the contexts in which the adjectives were used in the title or conclusion of the abstract are presented in Table 4.

Table 4 Selected examples of the context of adjective use in the abstract

On average, there were 2.6 “colored” adjectives in each abstract, and this number was the same for both industry-authored and non-industry-authored research.

4.1 Source of Funding

When we estimated the degree of potential discordance between classification by author affiliation and classification by source of funding, we found that, of the 16,789 abstracts we could classify as industry-authored or non-industry-authored research, only 189 (1.1 %) had the full text available in PubMed Central and disclosed either partial or total funding by industry; 16 % of these studies were classified as non-industry based on the authors' affiliations.

5 Discussion

There are differences in the adjectives used when study findings are described in industry-authored compared with non-industry-authored reports. Certain adjectives are five times more commonly used by industry, although, on average, both groups of authors use the same number of “colored” adjectives.

The differences in adjective use noted in the present study support anecdotal evidence about the way results of clinical trials are reported by industry. The Medical Publishing Insights and Practices (MPIP) initiative in 2012 recommended avoiding broad statements such as “generally safe and well tolerated” [15] when reporting trial results, precisely the type of adjectives we found were more commonly used by industry. The MPIP initiative was founded by members of the pharmaceutical industry and the International Society for Medical Publication Professionals to elevate trust, transparency, and integrity in publishing industry-sponsored studies. Describing an intervention as “well tolerated” (the adjective with the largest relative use in industry-authored manuscripts compared with academic or government-authored manuscripts) may be accurate in certain circumstances, considering the nature of the trials conducted by industry, but it might not be generalizable to the broader population when the trial is small, when relatively “healthy” or “stable” participants are recruited (compared with the broader population with the target indication), or when the follow-up is short.

The more common presence of adjectives such as “acceptable”, “meaningful”, “potent”, or “safe” in industry-authored reports than in non-industry-authored reports could suggest that industry-authored reports tend to focus on the positive aspects of the health intervention being evaluated. However, the differences in adjective use could also reflect variations in the types of trials conducted by industry and by academic or governmental institutions. Industry studies tend to focus on drugs or devices, while non-industry work is likely to be more inclusive of other types of health interventions. By stratifying by journal, we adjusted partially for potential differences in the studies.

We used the affiliations of the authors to classify publications as industry-authored or non-industry-authored. Even if only one author out of many was from industry, the paper was classified as industry-authored. Although this approach could seem extreme, it has also been recommended by others [16]. It might seem preferable to look at the source of funding instead of the author affiliation; however, this approach too has its shortcomings: funding mechanisms are complex, some journals do not report funding sources, reporting of all sources of financial support is often incomplete, and there are different levels of support, ranging from unrestricted educational grants to arrangements that include input of the manufacturers into trial design, conduct of the analysis, and publication [16, 17]. We assessed how many of the abstracts whose full manuscript reported support from industry were classified as non-industry in origin and found that this was the case for 16 % of the papers with industry support. It is difficult to predict the direction of bias related to any potential misclassification because of the shortcomings listed above, but in the worst-case scenario it would lead to an underestimation of the relative measure and consequently to a smaller set of adjectives because of loss of power.

The results of this study are based on assessment of the abstract rather than the full paper. Many readers read only the abstract of a published article and may be influenced by it, so we argue that the abstract is an important place to look for differences in reporting style. A study that assessed the impact of “spin” on the interpretation of cancer trials showed that clinicians' interpretation was affected by reading the abstracts of the trials [18]. In addition, the present study focused on randomized, controlled trials; thus, the findings may not apply to other types of study designs.

The present study was limited to counting adjectives and assessing the differences in those counts between industry-authored and non-industry-authored reports. Although we have provided some examples to illustrate how the adjectives were used, the study ignored the context of their usage. A thematic analysis could allow detection of patterns, and a critical review of the full text of the paper could determine whether the use of the adjectives was indeed appropriate in view of the data or of what is known in the field.

6 Conclusion

We assessed a very large number of abstracts and found differences in how study findings are described by industry-affiliated authors compared with non-industry-affiliated authors. The language used to describe trial results could affect perceptions of the efficacy or safety of health interventions. Authors should avoid overusing adjectives that could be inaccurate or lead to misperceptions. Editors and peer reviewers should be attentive to the use of adjectives in the abstract (and in the manuscript in its entirety) and assess whether their use is appropriate.