Evaluation by Citation: Trends in Publication Behavior, Evaluation Criteria, and the Strive for High Impact Publications

Criteria for the evaluation of most scholars’ work have recently received wider attention due to high-profile cases of scientific misconduct which are perceived to be linked to these criteria. However, in the competition for career advancement and funding opportunities almost all scholars are subjected to the same criteria. Therefore these evaluation criteria act as ‘switchmen’, determining the tracks along which scholarly work is pushed by the dynamic interplay of interests of both scholars and their institutions. Currently one of the most important criteria is the impact of publications. In this research, the extent to which publish or perish, a long standing evaluation criterion, led to scientific misconduct is examined briefly. After this the strive for high impact publications will be examined, firstly by identifying the period in which this became an important evaluation criterion, secondly by looking at variables contributing to the impact of scholarly papers by means of a non-structured literature study, and lastly by combining these data into a quantitative analysis.


Introduction
Not ideas, but material and ideal interests, directly govern men's conduct. Yet very frequently the 'world images' that have been created by 'ideas' have, like switchmen, determined the tracks along which action has been pushed by the dynamics of interest. 'From what' and 'for what' one wished to be redeemed and, let us not forget, 'could be' redeemed, depended upon one's image of the world. (Weber 1970b) Scientific misconduct has been increasing, however until recently awareness of these practices appears to be limited (Regmi 2011). 1 Recent cases of scientific misconduct, such as the Stapel affaire (fabrication of data, see Levelt Committee et al. 2012), and the Karl-Theodor zu Guttenberg plagiarism affair, have created some awareness amongst both scholars and the general public. This sparked some national or field-specific movements such as Science in Transition in the Netherlands (Dijstelbloem et al. 2013) and the American Society of Cell Biology's Declaration on Research Assessment (Moustafa 2014). These movements perceive a link between the system of evaluation of science and cases of scientific misconduct, such as some of the extreme cases mentioned above.
Reflecting on the switchmen metaphor put forth by Weber in his study of religion and its influence on economic systems, we can begin to see that what is deemed as important in science steers the behavior of scholars in a certain direction. Two crucial ideas about what is important in science are currently at work: the idea that publishing more is better and the idea that a journal, paper, scholar and institution should have a high Impact Factor.
From what and for what does a scholar wish to be redeemed, and can a scholar be redeemed? In most western countries a publication track record is necessary for obtaining tenure and it is hard to secure funds for research without one (Regmi 2011). Especially peer-reviewed publications are important for the career of scholars at all stages of employment (Elliott 2013;Moustafa 2014). Someone who only publishes work with a low impact factor will have difficulty obtaining tenure and funds. Thus both the number of publications as well as their impact are crucial for the career of a scholar, redeeming him/her from joblessness, or at least careerlessness, if their publication record is better than that of their peers. Because when there is stiff competition for positions, funding, and other academic rewards, those with slightly greater achievements will reap in a disproportionately larger share of the rewards (Anderson et al. 2007).
The ideas about what is important in science are important themselves. Scholars are constantly being reminded that these ideas are important, thus these ideas determine the tracks via which action has been pushed by the interest of scholars. This is coupled with a rise of careerism among scientists, in which, sometimes, the shortest routes to success are taken, including fraudulent behavior (Kumar 2008) or cutting a few corners (Anderson et al. 2007). Competition between scientists increases the chance of scientific misconduct (Anderson et al. 2007).
First, the literature on publish or perish will be examined as it has already been established how this idea has shaped scholarly publishing behavior, and contributed to unethical behavior. Secondly the idea of publishing for high impact will be studied, as this is less well researched. Here the focus will not be on unethical behavior, but rather trends in publishing behavior will be examined to establish the link between a paradigmatic shift in what is important and publishing behavior. The period in which high impact publications became a criteria for evaluation will have to be identified, after which variables contributing to the impact of papers will be identified by means of a small literature study. These variables will be used to conduct a quantitative analysis of how these factors have changed over the years.

Publish or Perish
Scholarly publication rates, Errami and Garner (2008) claim, are at an all-time high. This is not caused by an increase in productivity, but rather by changes in the way scholars publish (Broad 1981), which is linked to the pressure to publish (Errami and Garner 2008). A 1981 commentary in Science reported that co-authorship and multiple publication of the same data were on the rise, whilst the length of papers was decreasing (Broad 1981). The increase in co-authorship is attributable to interdisciplinary papers, multi-institutional clinical trials, but also to gratuitous listing of co-authors (Broad 1981). Gift authorship (Matías-Guiu and García-Ramos 2010), for instance including the head of a department or lab, is a common practice in some disciplines as is adding other researchers out of courtesy (Broad 1981), or an expectation of reciprocity (Webster et al. 2009).
In addition to a rise in co-authorship, a decrease in paper length was already noted in the early 1980s (Broad 1981). Scholars prefer to publish four short papers instead of one long paper (Broad 1981). They slice their data as if it were a salami, hence the term salami-slicing is sometimes used. The terms Least Publishable Unit (LPU) (Broad 1981), or Smallest Publishable Unit (SPU) (Elliott 2013) are used to describe these papers that contain the minimum amount of information needed to get published.
Another trend, closely related to salami slicing, is that of duplicate, or multiple, publication publishing articles that overlap substantially (Andreescu 2013;Kumar 2008;von Elm et al. 2004). This could be a simple copy (with the same authors, same data, basically same content, maybe a different title), a salami-sliced article without cross references to other articles based on the same data, a meat extender also called data augmentation (which is an expansion of an existing article with more data, sometimes without cross-reference), salami-sliced articles published by different authors (most common in multicenter trials), and a textual copy of an article with a different dataset and, possibly, different results and/or conclusion (Kumar 2008;von Elm et al. 2004).
Using text comparison software followed by manual verification, Errami and Garner (2008) uncovered a growing trend of duplicate publications in the biomedical Evaluation by Citation 201 literature; just below 2 per 1000 in 1975 to just below 10 in 1000 in 2005. Whilst this quintupled number still only represents 1 % of the papers, it is worrying since duplication represents just one possible mode of scientific misconduct. The importance of having many publications is now declining in favor of the impact of publications (Franco 2013), although there are still scientists evaluated solely on the number of publications (Anderson et al. 2007). Impact is the focus of the next section.

High Impact Publications
The idea that the importance of a publication can be judged from the number of references it receives is not recent. Even before Garfield (2006) first published about a Citation Index for Science, in 1955, the idea already existed, as he himself readily acknowledges. Early in the twentieth century, Gross and Gross (1927) postulated that the number of references a journal receives from a set of representative journals suggests something about its importance to the field, aiding librarians in choosing journals to add to their collections.
''The impact factor'' states Moustafa, ''became a major detrimental factor of quality, creating huge pressures on authors, editors, stakeholders and funders'' (2014). But when did the impact of a single scholar, as measured by the citations (s)he receives, become important? This seems hard to pin point. In 1990, Tsafrir and Reis state ''administrators are turning more to the citation performance of individuals' ' (1990) suggesting an increase in importance in or shortly prior to 1990, at least for Medicine. But it seems to have started earlier, in 1975 Wade provides cases where scholars' citation counts were used for tenure and funding decisions, but it was by no means commonly used as an assessment tool at that time (Wade 1975). If indeed the idea about the importance of being cited influences scholars, consciously or unconsciously, we would expect this to be reflected in their work, starting between 1975 and 1990, in at least some scientific fields.
Recent research, discussed below, has examined the characteristics of papers, such as their writing style, which have an influence on subsequent citations. Whilst we should look at factors influencing subsequent citations in papers published in the period that we are interested into truly understand what was relevant then, the factors identified in current research offer some insights. These factors are expected to differ between the period before 1975 and the period after 1990, as the transition by then has already started.
The number of references a paper contains has been found to be positively correlated with the number of times a paper is cited, and this holds for all fields researched (Vieira and Gomes 2010;Webster et al. 2009;Wesel et al. 2014). Having many references can be useful to defend a paper against attacks (Latour 1987). Whilst references should be relevant to the paper, their numbers could be inflated by simply copying references from other papers (Ramos et al. 2012) or via a process of I cite you, you cite me in a form of reciprocal altruism (Webster et al. 2009).
The number of authors contributing to a paper is also a stable positive influencer across fields (Frenken et al. 2005;Glänzel and Thijs 2004;Levitt and Thelwall 2009;Vieira and Gomes 2010;Webster et al. 2009;Wesel et al. 2014). The rise of multi-authored papers, already observed by de Solla Price (1963) in the early 1960s, is often thought of as resulting from a rise in multi-disciplinary research. However other explanations for this rise are gratuitous listing of co-authors and giftauthorship, already mentioned above in the context of publish or perish. In the context of high-impact publication the naming of extra authors not only helps these authors gather extra publications, but could also help the paper to become highly cited, by extending the network to which the paper can easily be introduced (Frenken et al. 2005). Especially when eminent co-authors are named this has an even greater effect on the number of times a paper is cited (Haslam et al. 2008).
Another factor which, in most fields, correlates positively with the times an article is cited is its total length (Haslam et al. 2008;Hudson 2007;Vieira and Gomes 2010;Wang et al. 2012;Wesel et al. 2014), although this does not seem to hold in Applied Physics (Wesel et al. 2014). Notice, this seems to contrast with a trend observed for publish or perish which stimulates short, sliced, papers. Some suggests that lengthening is done to meet a presumed standard (Andreescu 2013).
Other interesting factors include the presence of a colon in the title and the length of the title (Haslam et al. 2008;Jacques and Sebire 2010). The direction of the effect of title length seems to differ across fields. In Sociology, Applied Physics, and a sub-set of PLoS journals a shorter title is associated with more citations (Jamali and Nikzad 2011;Wesel et al. 2014). Whilst in General and Internal Medicine the effect is reversed (Wesel et al. 2014).
The readability of abstracts also influences the number of citations an article receives, at least in Applied Physics and General and Internal Medicine (Wesel et al. 2014). A less readable than average abstract, as measured by the Flesch Reading Ease Score (Flesch 1948), has a positive effect on the number of citations an article receives. More sentences in the abstract is also related to more frequent citation in Sociology, Applied Physics, and General and Internal Medicine (Wesel et al. 2014).
The mechanism by which these factors are understood to influence the number of incoming citations is not relevant for this work (for exploration see, for instance, Wesel et al. 2014). What does matter is if the utilization of tricks that increase the number of received citations is increasing. These tricks do not necessarily represent scientific misconduct, although artificially inflating the author count, adding unnecessary references, and purposely making the abstract hard to read clearly can be considered misconduct. Depending on the circumstances this could also be said for lengthening a paper, if this lengthening occurs without adding new, relevant, information, this could be seen as misconduct.
Historically some of these, or related, factors have been shown to be stable whilst others are known to have changed. According to Gross et al. (2002) the number of citations per 100 words has risen from 0.3 in the period 1901-1925 to 1.8 in the period 1976-1995. This rise has been quite steep, in the period 1926-1950 there were 0.8 citations per 100 words and 1.5 in the period 1951-1975. The number of references quoted in articles was quite stable over a long period, in 1955 Garfield calculated an average of ten (Garfield 2006), and in the early '60s de Solla Price (1963) gives just under ten as the norm, stating it has been stable for many years.

Reproducing of Practices
Scholars who have traits enabling them to produce more and higher cited papers than another scholar in the same field are more likely to secure resources, e.g. career, funding, PhD candidates and the like (Anderson et al. 2007). Since the relationship between a professor and a Ph.D. candidate is a socialization process, many Ph.D. candidates are influenced by the publishing style of their professors. Thus they pick up on traits about what constitutes good scholarly conduct and what constitutes misconduct. Furthermore, productive scholars will be read more, and are thus more likely to influence their readers with their style and approach to citation. Scholars, at all moments in their career but especially if they are new to the field, are further socialized by what they read, what they see, and what they hear from their peers and especially from those who are seen as successful.
Thus scholarly (mis)conduct is reproduced via a form of sociocultural evolution. The selection mechanism (Nolan and Lenski 2006) is evident, as described in the above paragraph. In other words; ''selection theory takes the following from: when interactors interact, replicators create lineages by a process of selection'' (Gross et al. 2002).
As such conducts becomes more widespread, scholars have come to see these practices as the norm, and as the accepted way to conduct science. As Elliott suggested when discussing salami-slicing ''there is no intentional deceit taking place, just an assumption that this practice is perfectly acceptable'' (2013).

Expected Results
Following the discussion above one would expect to observe the following: • A decrease in the length of the paper title, in most fields • A rise in the number of authors contributing to a paper • An increase in paper length • An increase in the number of sentences in the abstract • Most likely a decrease in readability of the abstract until it reaches an optimum • A rise in the number of references a paper contains • And an increase in paper titles with a (semi-)colon Given the generational effect described above we would aspect these changes to accelerate, at least until reaching an optimum or plateau level.

Methodology
To select representative journals, 50 journals with the highest Impact Factor for the years 1997 and 2012 from Thomson Reuters Journal Citation Reports (JCR) Science and Social Science edition were compared to identify journals which have been influential for many years. 2 There was an overlap of 18 journals in the JCR Science Edition and 20 journals in the JCR Social Science Edition. For these journals the availability of data in Thomson Reuters Web of Knowledge was checked, as data was required from 1960 till 2004 in order to create three 15 year periods (1960-1974, 1975-1989, and 1990-2004) of which the first and third can be compared. Eight journals in the JCR Science Edition and four journals in the JCR Social Science Edition met this criterion.
Information about the papers which appeared in these journals was downloaded from the Web of Knowledge (WoK). WoK data provided information on the publication year, the title, the authors, the DOI. From the title, the length in the number of words, and presence of a (semi-)colon were recorded. From the list of authors, the number of authors was counted, by counting the separating semi-colons and adding 1, for papers with an anonymous author the author count field was left blank. Data on the number of references contained in the paper were also extracted from WoK, however this data was unavailable for papers published before 1988, 3 and thus this variable was not analyzed. Using CrossRef 4 the DOI was translated to the URL of the papers at the publisher's website. When the DOI was missing, the article name, journal, and year were used to query CrossRef for the DOI, which was only accepted if the first author was listed and the match had a 100 % score. From the publishers website the abstract and type of paper were acquired, as well as the start and end page, as there was incongruity between publisher and WoK data. For Chemical Reviews and Pharmacological Reviews it proved not possible to obtain information about the paper type, thus these journals were removed from the sample.
HTML codes 5 were removed from the abstract when necessary. Using the builtin readability function in Microsoft Word 2010 the Flesch Reading Ease was calculated. The formula used by Word (Microsoft 2007)  The Flesch Reading Ease Score (FRES) is a readability scale in which a higher score indicates easier readability, for all practical considerations the scale can be thought of as ranging from 0 to 100, where a score from 0 to 30 indicates very difficult and a score from 90 to 100 very easy.
Three rough categories of paper types were deemed suitable for analysis; Articles (review and original), Letters, and short scientific communications. This leads to the fifteen journal paper type combinations shown in Table 1. Differences in naming had to be resolved, for instance Correspondence and Letters to the Editor in Lancet and in Nature were combined for their respective journals.
These variables were compared using an independent-samples t-test grouping the papers in the period 1960-1974 and 1990-2004. The presence of a (semi-)colon in the title was compared using the Chi square test. Effect size was calculated using the a Some years with zero papers, most likely due to a low number of papers overall b For the period 1972-1990 there were no papers identified as article, this is due to how Science Direct displayed paper identification for part of the papers c Some years with zero papers, are most likely due to a low number of papers overall d One paper in 1997 and zero in 1998-1999, reason unknown r = sqrt((t 2 /(t 2 ? df)) formula for the t test, resulting in only positive effect sizes. And / = sqrt(x 2 /n) for the Chi square test, also resulted in only positive effect sizes.

Results
Paper titles, measured by the number of words, are longer in the second period, 1990-2004, for fourteen of the fifteen sets, only Letters to editor and correspondence in Nature show a reduction in title length (see Table 8). An independentsamples T test determines these differences are significant (p \ 0.05), even for this before mentioned outlier ( Table 2). The effect sizes for the different sets vary from small to large, the effect size is smallest for Nature; letters to the editor and correspondence and largest for Annual Review of Psychology; review-article (Table 8). Fourteen out of fifteen sets behave contrary to the prediction. For all fifteen sets the mean number of authors per paper increases between the two periods (Table 9) and these increases are significant for all sets (Table 3). The effect sizes for the different sets vary from halfway between small and medium to large, the effect size is smallest for American Psychologist; Comment and Reply and largest for Science; report (see Table 9). All fifteen sets follow the predicted behavior.
The page count increases for ten out of the fifteen sets, in the other five the page count decreases (Table 10). The changes are significant in twelve sets, for American Psychologist; Comment and Reply, American Psychologist; Journal Article, and Lancet; hypothesis the change is not significant at all (Table 4). The effect sizes for the sets in which the change is significant vary from very small for Lancet; letters to the editor to very large for Science; report (see Table 10). Most sets behave as predicted. The number of sentences in the abstract rises in seven of the nine sets for which statistics for the number of sentences in the abstract could be calculated, for the other two, from the same journal, this number decreased (Table 11). These differences are significant ( Table 5). The effect sizes for the different sets vary from small to very large the effect size is smallest for American Psychologist; Journal Article and largest for Lancet; article and Nature; Article (Table 11). This rise is in line with the predicted behavior.
In eight out of nine sets examined, the Flesch Reading Ease Score of the abstract is lower in the period 1990-2004 then it was in the period 1960-1974 (Table 12), these differences are significant (Table 6). This suggests that for eight of these sets the abstracts became harder to read, only Lancet; Articles became easier to read. The effect sizes for the different sets vary from small to halfway between small and medium, the effect size is smallest for Lancet; article and Nature; letters to editor and correspondence and largest for American Psychologist; Comment and Reply (see Table 12). As predicted, abstracts became harder to read.
For five of the fifteen sets the proportion of titles with a (semi-)colon in the title rises, for two the proportion stays about the same, and for seven the proportion is lower when we compare the period 1960-1974to 1990-2004. And these changes were found to be significant for the twelve sets in which we see a rise or drop ( Table 7). The effect sizes for the sets in which the change is significant vary from small to halfway between medium and large, the effect size is smallest for Nature; article and largest for Lancet; article (Table 7). For some of the sets the predicted behavior is followed, others follow the opposite behavior.
This observation might, however, be misguided, as it appears that in the period 1990-1994, and most likely some surrounding years, the number of (semi-)colon in titles was very low, which might point to a discrepancy in the way Web of Knowledge treated this character (see the exemplary graphs in Figs. 1, 2). For instance the articles noted on APA PsycNET as ''Support theory: A nonextensional representation of subjective probability'' and ''Simultaneous over-and underconfidence: The role of error in judgment processes.'' are registered in WOK as ''Support theory-A nonextensional representation of subjective probability'' and ''Simultaneous over-and underconfidence-The role of error in judgment processes'' respectively. Note this is not only limited to these two journals, but occurs throughout the dataset for this period.

Conclusion and Discussion
Whilst the predicted pattern is followed in most cases there are notable exceptions such as the title length, for which the predicted pattern is only followed in one set. For other variables the predicted pattern is followed more closely, for every set the number of authors increases, although the single-authored paper did not die out, something de Solla Price (1963) predicted would happen for Chemical Abstracts, not included in this sample, by 1980. For the other cases there are one or more sets not following the predicted behavior. Exception being the presence of a colon with in the title, which only rises in six sets, but might have an alternative explanation (see ''Results'' section). Given only fifteen journal/paper type sets, representing ten journals, which in turn represent four JCR Science Edition categories and only one JCR Social Science Edition, were included in this paper some explanation for changes in individual variables can be sought in changes in the journal's editorial policies or in changes in the field. A change in policies could have caused the change in the number of authors, but this would then have to have happened to for all journals examined. External factors of influence, other than the ideas that citing for impact and publish or perish are important, could also explain changes. The decline in readability could be caused by the rise in the use of word-processing software. Tin and Inggeris (2000) did find that students produce more complex text when word processing than when writing with pen and paper. Changes in the field could both be caused by changes related and not related to evaluation criteria based on publish or perish or on publishing for impact. The rise in authors could be explained by a rise in multidisciplinary research. However this has already been dismissed as the sole explanation in earlier writings (Broad 1981;Matías-Guiu and García-Ramos 2010). Maybe the extravagant number of authors listed on some articles is not the result of an attempt to beat the Publish or Perish game or an attempt to become highly cited. Maybe it is a genuine attempt to acknowledge the contribution of those without whose work the research would not have been possible, such as lab technicians, doctors collecting data in their practices, and technicians keeping the machines of big science up and running. Instrumental and valuable, but not authors. Their contributions are perhaps too great to merely thank them in the Acknowledgements. Thus perhaps one solution is to find an alternative form of recognition that fills this apparent void between 'Acknowledgement' and 'author' ought to be filled. Gratuitous listing of co-authors greatly devalues authorship, something overlooked when research performance is evaluated. With authorship comes rights, the right for individuals to put a publication in their CVs, the right for a department or institution to claim the output. But authorship also comes with responsibility, as all authors are responsible for the content, right or wrong (for the latter, including responsibility for fraudulent actions, such as plagiarism and data fraud). Is the 50th author willing to bear this responsibility, responsibility for an article (s)he did not witness being created and might not even have read before it was submitted for publication?
Given the predicted behavior is followed in most of the cases, and there is a realistic case for why these changes occurred, it is not unreasonable to link the Fig. 2 Use of colon in title for Psychological Review; journal article observed changes in publication behavior to a change in evaluation criteria, which is also not out-of-line with what commonsense would predict. These findings combined with those of other researchers, for instance the link between high impact publications and hot topics (Moustafa 2014), lead to the conclusion that evaluation criteria act as switchmen, determining the tracks along which scholarly work is pushed by the dynamic of interests of both scholars and their institutions.
Both the changes which follow as well as those which are counter to the expected behavior could be explained by the journal's editorial policies. It would be possible to discover such policies by studying editorials and comments on submitted manuscripts. This would establish if editorial policies could have influenced these variables, but will, most likely, not explain what caused the change in policies, which could also be a response to external evaluation criteria.
We also need to consider the underlying cause of the rise of these ideas: why have they become so dominant in science? This is a harder question to answer, and one needs to work through national and university policy documents in order to find an answer to this question. One explanation might be sought in changes in how governments try to justify expenditures. Starting in the early 1980s university policies have increasingly been influenced by a need for accountability, at least in EU countries (Geuna 2001). Without a Citation Index it is questionable if the number of citations would be as important as it is now. This is not a technologically deterministic stance (Smith and Marx 1994), a feasible way of counting citations was needed to facilitate the operationalization of the idea, or in other words ''knowledge is embedded in and performed by infrastructures'' (Wyatt et al. 2013). Most likely the negative effect of the competition system for distributing careers, funding and the like on the behavior of scientist has been underestimated by those who bestow the rewards (Anderson et al. 2007).
An interesting future research opportunity would be to compare how these trends differ between fields. Some are traditionally more book-than article-based, notably in the humanities, and still try to hold on to these traditions. Such publications are less easily evaluated by criteria which are predominantly used in the STEM and medical fields.
This research has aimed to find whether publications and citation pressures have resulted in changes in the number of papers scholars produce and in their characteristics, but there are other, potentially more fruitful approaches. Another method, more in line with Weber, would be to examine the teachings of practices, or to study the theology 6 of the ideas (for inspiration Weber 1958). These teachings could be distilled from scholarly guidebooks, such as methods texts, editorial guidelines or codes of practice, by looking for suggestions about salami slicing, duplicate publications, multi-authorship, inflating references, etc. The theology could be found in policies on promotion (such as the granting of tenure), basis for research funding, and the basis for rankings on university and (inter)national levels.
Acknowledgments The author would like to thank the anonymous reviewer of JSEE, and the anonymous reviewers and visitors of the International Integrity and Plagiarism Conference for their suggestions and Sally Wyatt and Jeroen ten Haaf for their many insightful comments, suggestions, and discussions.

Conflict of interest The authors declare that they have no conflict of interest.
Ethical standard This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.