Evaluation by Citation: Trends in Publication Behavior, Evaluation Criteria, and the Strive for High Impact Publications

van Wesel, Maarten

doi:10.1007/s11948-015-9638-0

Original Paper
Open access
Published: 06 March 2015

Volume 22, pages 199–225, (2016)

Science and Engineering Ethics

Maarten van Wesel^1,2

Abstract

Criteria for the evaluation of most scholars’ work have recently received wider attention due to high-profile cases of scientific misconduct which are perceived to be linked to these criteria. However, in the competition for career advancement and funding opportunities almost all scholars are subjected to the same criteria. Therefore these evaluation criteria act as ‘switchmen’, determining the tracks along which scholarly work is pushed by the dynamic interplay of interests of both scholars and their institutions. Currently one of the most important criteria is the impact of publications. In this research, the extent to which publish or perish, a long standing evaluation criterion, led to scientific misconduct is examined briefly. After this the strive for high impact publications will be examined, firstly by identifying the period in which this became an important evaluation criterion, secondly by looking at variables contributing to the impact of scholarly papers by means of a non-structured literature study, and lastly by combining these data into a quantitative analysis.

Introduction

Not ideas, but material and ideal interests, directly govern men’s conduct. Yet very frequently the ‘world images’ that have been created by ‘ideas’ have, like switchmen, determined the tracks along which action has been pushed by the dynamics of interest. ‘From what’ and ‘for what’ one wished to be redeemed and, let us not forget, ‘could be’ redeemed, depended upon one’s image of the world. (Weber 1970b)

Scientific misconduct has been increasing; until recently, however, awareness of these practices appears to have been limited (Regmi 2011).^{Footnote 1} Recent cases of scientific misconduct, such as the Stapel affair (fabrication of data, see Levelt Committee et al. 2012) and the Karl-Theodor zu Guttenberg plagiarism affair, have created some awareness amongst both scholars and the general public. This sparked some national or field-specific movements such as Science in Transition in the Netherlands (Dijstelbloem et al. 2013) and the American Society for Cell Biology’s Declaration on Research Assessment (Moustafa 2014). These movements perceive a link between the system of evaluation of science and cases of scientific misconduct, such as some of the extreme cases mentioned above.

Reflecting on the switchmen metaphor put forth by Weber in his study of religion and its influence on economic systems, we can begin to see that what is deemed as important in science steers the behavior of scholars in a certain direction. Two crucial ideas about what is important in science are currently at work: the idea that publishing more is better and the idea that a journal, paper, scholar and institution should have a high Impact Factor.

From what and for what does a scholar wish to be redeemed, and can a scholar be redeemed? In most western countries a publication track record is necessary for obtaining tenure and it is hard to secure funds for research without one (Regmi 2011). Peer-reviewed publications are especially important for the career of scholars at all stages of employment (Elliott 2013; Moustafa 2014). Someone who only publishes work with a low impact factor will have difficulty obtaining tenure and funds. Thus both the number of publications and their impact are crucial for the career of a scholar, redeeming him/her from joblessness, or at least careerlessness, if their publication record is better than that of their peers. This matters because, when there is stiff competition for positions, funding, and other academic rewards, those with slightly greater achievements will reap a disproportionately larger share of the rewards (Anderson et al. 2007).

The ideas about what is important in science are important themselves. Scholars are constantly reminded that these ideas are important; thus these ideas determine the tracks along which action is pushed by the interests of scholars. This is coupled with a rise of careerism among scientists, in which, sometimes, the shortest routes to success are taken, including fraudulent behavior (Kumar 2008) or cutting a few corners (Anderson et al. 2007). Competition between scientists increases the chance of scientific misconduct (Anderson et al. 2007).

First, the literature on publish or perish will be examined, as it has already been established how this idea has shaped scholarly publishing behavior and contributed to unethical behavior. Secondly, the idea of publishing for high impact will be studied, as this is less well researched. Here the focus will not be on unethical behavior; rather, trends in publishing behavior will be examined to establish the link between a paradigmatic shift in what is important and publishing behavior. The period in which high impact publications became a criterion for evaluation will have to be identified, after which variables contributing to the impact of papers will be identified by means of a small literature study. These variables will be used to conduct a quantitative analysis of how these factors have changed over the years.

Publish or Perish

Scholarly publication rates, Errami and Garner (2008) claim, are at an all-time high. This is not caused by an increase in productivity, but rather by changes in the way scholars publish (Broad 1981), which is linked to the pressure to publish (Errami and Garner 2008). A 1981 commentary in Science reported that co-authorship and multiple publication of the same data were on the rise, whilst the length of papers was decreasing (Broad 1981). The increase in co-authorship is attributable to interdisciplinary papers, multi-institutional clinical trials, but also to gratuitous listing of co-authors (Broad 1981). Gift authorship (Matías-Guiu and García-Ramos 2010), for instance including the head of a department or lab, is a common practice in some disciplines as is adding other researchers out of courtesy (Broad 1981), or an expectation of reciprocity (Webster et al. 2009).

In addition to a rise in co-authorship, a decrease in paper length was already noted in the early 1980s (Broad 1981). Scholars prefer to publish four short papers instead of one long paper (Broad 1981). They slice their data as if it were a salami, hence the term salami-slicing is sometimes used. The terms Least Publishable Unit (LPU) (Broad 1981), or Smallest Publishable Unit (SPU) (Elliott 2013) are used to describe these papers that contain the minimum amount of information needed to get published.

Another trend, closely related to salami slicing, is that of duplicate, or multiple, publication: publishing articles that overlap substantially (Andreescu 2013; Kumar 2008; von Elm et al. 2004). This could be a simple copy (with the same authors, same data, basically same content, maybe a different title), a salami-sliced article without cross-references to other articles based on the same data, a meat extender, also called data augmentation (which is an expansion of an existing article with more data, sometimes without cross-reference), salami-sliced articles published by different authors (most common in multicenter trials), or a textual copy of an article with a different dataset and, possibly, different results and/or conclusion (Kumar 2008; von Elm et al. 2004).

Using text comparison software followed by manual verification, Errami and Garner (2008) uncovered a growing trend of duplicate publications in the biomedical literature: from just below 2 per 1000 in 1975 to just below 10 per 1000 in 2005. Whilst this quintupled number still only represents 1% of the papers, it is worrying since duplication represents just one possible mode of scientific misconduct.

The importance of having many publications is now declining in favor of the impact of publications (Franco 2013), although there are still scientists evaluated solely on the number of publications (Anderson et al. 2007). Impact is the focus of the next section.

High Impact Publications

The idea that the importance of a publication can be judged from the number of references it receives is not recent. Even before Garfield (2006) first published about a Citation Index for Science, in 1955, the idea already existed, as he himself readily acknowledges. Early in the twentieth century, Gross and Gross (1927) postulated that the number of references a journal receives from a set of representative journals suggests something about its importance to the field, aiding librarians in choosing journals to add to their collections.

“The impact factor” states Moustafa, “became a major detrimental factor of quality, creating huge pressures on authors, editors, stakeholders and funders” (2014). But when did the impact of a single scholar, as measured by the citations (s)he receives, become important? This seems hard to pinpoint. In 1990, Tsafrir and Reis stated that “administrators are turning more to the citation performance of individuals” (1990), suggesting an increase in importance in or shortly prior to 1990, at least for Medicine. But it seems to have started earlier: in 1975 Wade provided cases where scholars’ citation counts were used for tenure and funding decisions, although citation counting was by no means commonly used as an assessment tool at that time (Wade 1975). If indeed the idea about the importance of being cited influences scholars, consciously or unconsciously, we would expect this to be reflected in their work, starting between 1975 and 1990, in at least some scientific fields.

Recent research, discussed below, has examined the characteristics of papers, such as their writing style, which have an influence on subsequent citations. Whilst we should look at factors influencing subsequent citations in papers published in the period that we are interested in to truly understand what was relevant then, the factors identified in current research offer some insights. These factors are expected to differ between the period before 1975 and the period after 1990, as by then the transition had already started.

The number of references a paper contains has been found to be positively correlated with the number of times a paper is cited, and this holds for all fields researched (Vieira and Gomes 2010; Webster et al. 2009; Wesel et al. 2014). Having many references can be useful to defend a paper against attacks (Latour 1987). Whilst references should be relevant to the paper, their numbers could be inflated by simply copying references from other papers (Ramos et al. 2012) or via a process of I cite you, you cite me in a form of reciprocal altruism (Webster et al. 2009).

The number of authors contributing to a paper is also a stable positive influencer across fields (Frenken et al. 2005; Glänzel and Thijs 2004; Levitt and Thelwall 2009; Vieira and Gomes 2010; Webster et al. 2009; Wesel et al. 2014). The rise of multi-authored papers, already observed by de Solla Price (1963) in the early 1960s, is often thought of as resulting from a rise in multi-disciplinary research. However other explanations for this rise are gratuitous listing of co-authors and gift-authorship, already mentioned above in the context of publish or perish. In the context of high-impact publication the naming of extra authors not only helps these authors gather extra publications, but could also help the paper to become highly cited, by extending the network to which the paper can easily be introduced (Frenken et al. 2005). Especially when eminent co-authors are named this has an even greater effect on the number of times a paper is cited (Haslam et al. 2008).

Another factor which, in most fields, correlates positively with the times an article is cited is its total length (Haslam et al. 2008; Hudson 2007; Vieira and Gomes 2010; Wang et al. 2012; Wesel et al. 2014), although this does not seem to hold in Applied Physics (Wesel et al. 2014). Note that this seems to contrast with a trend observed for publish or perish, which stimulates short, sliced papers. Some suggest that lengthening is done to meet a presumed standard (Andreescu 2013).

Other interesting factors include the presence of a colon in the title and the length of the title (Haslam et al. 2008; Jacques and Sebire 2010). The direction of the effect of title length seems to differ across fields. In Sociology, Applied Physics, and a sub-set of PLoS journals a shorter title is associated with more citations (Jamali and Nikzad 2011; Wesel et al. 2014), whilst in General and Internal Medicine the effect is reversed (Wesel et al. 2014).

The readability of abstracts also influences the number of citations an article receives, at least in Applied Physics and General and Internal Medicine (Wesel et al. 2014). A less readable than average abstract, as measured by the Flesch Reading Ease Score (Flesch 1948), has a positive effect on the number of citations an article receives. Having more sentences in the abstract is also related to more frequent citation in Sociology, Applied Physics, and General and Internal Medicine (Wesel et al. 2014).

The mechanism by which these factors are understood to influence the number of incoming citations is not relevant for this work (for exploration see, for instance, Wesel et al. 2014). What does matter is whether the utilization of tricks that increase the number of received citations is increasing. These tricks do not necessarily represent scientific misconduct, although artificially inflating the author count, adding unnecessary references, and purposely making the abstract hard to read can clearly be considered misconduct. Depending on the circumstances the same could be said of lengthening a paper: if the lengthening occurs without adding new, relevant information, it could be seen as misconduct.

Historically some of these, or related, factors have been shown to be stable whilst others are known to have changed. According to Gross et al. (2002) the number of citations per 100 words has risen from 0.3 in the period 1901–1925 to 1.8 in the period 1976–1995. This rise has been quite steep: in the period 1926–1950 there were 0.8 citations per 100 words, and 1.5 in the period 1951–1975. The number of references quoted in articles was quite stable over a long period. In 1955 Garfield calculated an average of ten (Garfield 2006), and in the early ‘60s de Solla Price (1963) gave just under ten as the norm, stating it had been stable for many years.

Reproducing of Practices

Scholars who have traits enabling them to produce more and higher cited papers than another scholar in the same field are more likely to secure resources, e.g. career, funding, PhD candidates and the like (Anderson et al. 2007). Since the relationship between a professor and a Ph.D. candidate is a socialization process, many Ph.D. candidates are influenced by the publishing style of their professors. Thus they pick up on traits about what constitutes good scholarly conduct and what constitutes misconduct. Furthermore, productive scholars will be read more, and are thus more likely to influence their readers with their style and approach to citation. Scholars, at all moments in their career but especially if they are new to the field, are further socialized by what they read, what they see, and what they hear from their peers and especially from those who are seen as successful.

Thus scholarly (mis)conduct is reproduced via a form of sociocultural evolution. The selection mechanism (Nolan and Lenski 2006) is evident, as described in the above paragraph. In other words, “selection theory takes the following from: when interactors interact, replicators create lineages by a process of selection” (Gross et al. 2002).

As such conduct becomes more widespread, scholars come to see these practices as the norm, and as the accepted way to conduct science. As Elliott suggested when discussing salami-slicing, “there is no intentional deceit taking place, just an assumption that this practice is perfectly acceptable” (2013).

Expected Results

Following the discussion above one would expect to observe the following:

A decrease in the length of the paper title, in most fields
A rise in the number of authors contributing to a paper
An increase in paper length
An increase in the number of sentences in the abstract
Most likely a decrease in readability of the abstract until it reaches an optimum
A rise in the number of references a paper contains
And an increase in paper titles with a (semi-)colon

Given the generational effect described above we would expect these changes to accelerate, at least until reaching an optimum or plateau level.

Methodology

To select representative journals, the 50 journals with the highest Impact Factor for the years 1997 and 2012 in the Thomson Reuters Journal Citation Reports (JCR) Science and Social Science editions were compared to identify journals which have been influential for many years.^{Footnote 2} There was an overlap of 18 journals in the JCR Science Edition and 20 journals in the JCR Social Science Edition. For these journals the availability of data in Thomson Reuters Web of Knowledge was checked, as data were required from 1960 to 2004 in order to create three 15-year periods (1960–1974, 1975–1989, and 1990–2004), of which the first and third can be compared. Eight journals in the JCR Science Edition and four journals in the JCR Social Science Edition met this criterion.
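The selection and periodization steps can be illustrated with a minimal Python sketch. It is not the procedure actually used for this study: the function names are hypothetical, and the journal lists are assumed to be available as plain sets of journal names. It only shows the overlap of two top-50 lists and the assignment of a publication year to one of the three 15-year periods.

```python
from typing import Optional, Set

# Period boundaries as given in the text (inclusive years).
PERIODS = [(1960, 1974), (1975, 1989), (1990, 2004)]


def period_label(year: int) -> Optional[str]:
    """Return the 15-year period a publication year falls into.

    Years outside 1960-2004 return None, since they are not analyzed.
    """
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def long_running_top_journals(top_1997: Set[str], top_2012: Set[str]) -> Set[str]:
    """Journals present in both top-50 lists (hypothetical helper)."""
    return top_1997 & top_2012
```

Only the first and third periods are compared, so the middle period (1975–1989) serves as the presumed transition window.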

Information about the papers which appeared in these journals was downloaded from the Web of Knowledge (WoK). WoK data provided information on the publication year, the title, the authors, and the DOI. From the title, the length in number of words and the presence of a (semi-)colon were recorded. The number of authors was determined from the list of authors by counting the separating semi-colons and adding 1; for papers with an anonymous author the author count field was left blank. Data on the number of references contained in the paper were also extracted from WoK; however, these data were unavailable for papers published before 1988,^{Footnote 3} and thus this variable was not analyzed. Using CrossRef^{Footnote 4} the DOI was translated to the URL of the paper at the publisher’s website. When the DOI was missing, the article name, journal, and year were used to query CrossRef for the DOI, which was only accepted if the first author was listed and the match had a 100% score. From the publisher’s website the abstract and type of paper were acquired, as well as the start and end page, as there was incongruity between publisher and WoK data. For Chemical Reviews and Pharmacological Reviews it proved impossible to obtain information about the paper type; thus these journals were removed from the sample.
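The title and author variables described above can be sketched in Python as follows. This is an illustration under assumptions, not the script used for this study: the function names are hypothetical, title words are assumed to be separated by whitespace, and the marker for an anonymous author is assumed to be an empty field or a string such as "[Anonymous]".

```python
from typing import Optional

# Assumed markers for an anonymous author field; the actual export may differ.
ANONYMOUS_MARKERS = {"", "[anonymous]", "anonymous"}


def title_word_count(title: str) -> int:
    """Title length in words, splitting on whitespace (an assumption)."""
    return len(title.split())


def has_colon_or_semicolon(title: str) -> bool:
    """True when the title contains a colon or a semi-colon."""
    return ":" in title or ";" in title


def author_count(author_field: str) -> Optional[int]:
    """Number of authors: separating semi-colons plus 1.

    Returns None (a blank count) for papers with an anonymous author.
    """
    field = author_field.strip()
    if field.lower() in ANONYMOUS_MARKERS:
        return None
    return field.count(";") + 1
```

For example, `author_count("Smith, J; Jones, A; Lee, K")` counts two separators and returns 3, whereas an anonymous paper yields `None` so that it drops out of any average of author counts.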

HTML codes^{Footnote 5} were removed from the abstract when necessary. Using the built-in readability function in Microsoft Word 2010 the Flesch Reading Ease was calculated. The formula used by Word (Microsoft 2007) for this is as follows:

$$\begin{aligned} \text{Flesch Reading Ease Score} &= 206.835 - \left( 1.015 \times \frac{\text{Total Words}}{\text{Total Sentences}} \right) \\ &\quad - \left( 84.6 \times \frac{\text{Total Syllables}}{\text{Total Words}} \right) \end{aligned}$$

The Flesch Reading Ease Score (FRES) is a readability scale in which a higher score indicates easier readability. For all practical considerations the scale can be thought of as ranging from 0 to 100, where a score from 0 to 30 indicates very difficult and a score from 90 to 100 very easy.
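As a worked illustration of the formula, the following Python sketch computes the score from word, sentence, and syllable counts. The counts are assumed to be supplied by the caller; the way Word itself counts sentences and syllables is not reproduced here, and the function names and the "intermediate" label are hypothetical.

```python
def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Flesch Reading Ease Score from raw counts, using the formula above."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def readability_band(score: float) -> str:
    """Map a score to the two bands named in the text.

    Only 'very difficult' (up to 30) and 'very easy' (90 and above) come
    from the text; 'intermediate' is a placeholder for everything between.
    """
    if score <= 30:
        return "very difficult"
    if score >= 90:
        return "very easy"
    return "intermediate"
```

For an abstract of 100 words in 5 sentences with 150 syllables, the score is 206.835 − 1.015 × 20 − 84.6 × 1.5 = 59.635. Raising the syllable count to 200 for the same text lowers the score to 17.335, which falls in the very difficult band.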

Three rough categories of paper types were deemed suitable for analysis: Articles (review and original), Letters, and short scientific communications. This leads to the fifteen combinations of journal and paper type shown in Table 1. Differences in naming had to be resolved; for instance, Correspondence and Letters to the Editor in Lancet and in Nature were combined for their respective journals.

Table 1 Number of papers in the dataset
