Peer Review, Citation Ratings and Other Fetishes
Academic success or ability is often assessed through peer review or the seemingly more objective method of citation ratings. Although citation ratings may be more objective in that they offer a more automated or mechanical method of assessing quality, it does not necessarily follow that they provide an assessment of genuine scientific impact. Likewise, although peer review may provide an effective filtering system, it cannot be assumed that it provides objective critiques. This review discusses the fetishes and flaws of both methods, and suggests that future reviewing methods should combine quantitative and qualitative approaches, tailored to the specific subject area.
Key words: Citations, Peer review, Impact factor, h-index
It is argued that academic success or quality should be assessed using a combination of seemingly objective indicators, such as citation counts or single number indices like the h-index, and more subjective, or expertise-informed, peer reviews. These measures are often used to inform decisions about grant or institutional funding, staff hiring, promotion and tenure [1, 26]. However, both of these ostensible indicators of quality have been criticized for significant weaknesses, and both appear open to bias and manipulation.
Since the 17th century, when the first ‘scientific’ journals began, manuscripts submitted to journals for publication have usually been subject to peer review. In this process, the journal editor usually consults other experts, or ‘peers’, who provide an objective and skilled critique of the science presented. This process allows editors to screen for flaws and select the best papers for publication [22, 43]. In some cases, peer reviewers may also act as ‘shepherds’, working with the authors to improve the quality of a paper [2, 8]. Peer review is therefore often discussed as a sacred process, the linchpin of scientific rigour, ensuring the quality of published papers.
However, peer reviews are subject to a number of different influences and are rarely completely objective. Indeed, reviewers have been found to give more favourable reviews to authors who are experts in the field [41, 44], are from more prestigious institutions [28, 29, 37, 41], are male and speak English as their first language.
Many of these issues may be obviated using blinded reviews (‘single’ blinded when the reviewers are anonymous, and ‘double’ blinded when both authors and reviewers are anonymous [29, 32]). Indeed, one study reported a significant increase in the number of female first-authored papers published in one journal following the adoption of double-blinded reviews. However, double-blinded reviews have also been criticized for preventing the reviewer from judging the novelty of the contribution (does the paper represent a true scientific advance, or a rehashing of the author’s previously published data?) or the author’s level of expertise, possibly leading to inappropriate reviewer comments.
Furthermore, double-blinded reviews may not be truly anonymous. An expert reviewer may be able to identify the author through self-citations and references to earlier works. Similarly, authors may be able to infer the identity of the reviewer from the nature of their comments, reducing the effectiveness of any such blinding procedures.
There are also issues around conflicts of interest, where reviewers may wish to delay or suppress the publication of certain work, and around reviewers who lack specialist knowledge and so fail to recognize either errors or important contributions. Editors may therefore seek the advice of several reviewers, perhaps chosen for their differing areas of expertise. However, each review can take considerable time, and a greater number of reviewers may lead to a greater likelihood of disagreement or inconsistency between them, with little evidence that an increased number of reviewers leads to better papers.
Despite these shortcomings, several studies have found peer reviews to have high predictive validity, with a high association between reviewers’ ratings and later citation ratings of both individual papers [10, 35] and individual scientists.
Citation ratings are the number of times that a paper has been cited in published work, and a higher number of citations is thought to reflect more important or influential work. Citation ratings are also used to calculate the personal impact of individual scientists, through single number indices such as the h-index. A scientist has an h-index of h if h of their papers have at least h citations each, and their remaining papers have no more than h citations each. Such single number indices have been applauded for allowing the quick calculation of an individual’s impact, with clear implications for that individual.
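The definition of the h-index above can be made concrete with a short sketch. The function name `h_index` and the example citation counts are illustrative assumptions, not taken from any of the cited studies.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one scientist's five papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))  # prints 4
```

In this hypothetical case four papers have at least four citations each and the remaining paper has three, so the index is 4. Note that a single very highly cited paper would not raise the value, which illustrates how much information a single number index discards.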
However, citation ratings can be misleading. They assume that authors reference all the previous work that has influenced them, but authors often have limited space in which to include citations, and so must select only a few, facing competing pressures when doing so. Several studies have found that authors tend to cite papers that have a greater number of authors [18, 42], are longer, are in a journal issue that contains a high-impact article, are reviews and report positive findings. Moreover, citations may be negative, where papers are cited to criticize work rather than to praise it.
Citation ratings may also be manipulated. Authors may reference their own articles, artificially inflating their apparent impact, whether inadvertently or not [14, 19], which may be particularly significant if the paper has multiple authors. Schreiber argues that self-references should be removed from citation ratings. However, self-references have also been found to increase the number of citations from others; therefore, simply subtracting the number of self-references from citation rates may not remove their total effect, and may also handicap productive groups.
Citation rates are also used to calculate journal impact factors, with higher impact factors often taken as a proxy for journal quality. Indeed, journals with higher citation rates have lower acceptance rates, suggesting that such journals publish better quality papers [13, 23]. The most widely used method for calculating impact factor is that published by Thomson Reuters. In this calculation, the total number of citations gathered during a specific year by items published in the previous 2 years is divided by the number of substantive or ‘source’ items published in those 2 years. The total number of citations encompasses citations to all items published within the journal, but the number of source items includes only articles, reviews and proceedings papers. Thus, it is clearly possible to massage the impact factor by reducing the number of source items and boosting the number of other types of publication, such as letters, whose citations count in the numerator but which are excluded from the denominator. Journal articles also tend to have an unequal distribution of citations, with some articles receiving far more attention than others. Such intra-journal variation is obscured by the current calculation process.
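The asymmetry between numerator and denominator can be illustrated with a minimal sketch of the calculation as described above. The function name `impact_factor` and all of the figures are hypothetical, chosen only to show the arithmetic.

```python
def impact_factor(citations_to_all_items, source_items):
    """Citations received in one year by items from the previous 2 years,
    divided by the number of source items from those 2 years."""
    if source_items <= 0:
        raise ValueError("source_items must be positive")
    return citations_to_all_items / source_items


# Hypothetical journal: counts of items published in the previous 2 years.
articles_and_reviews = 200   # counted as source items
letters = 50                 # not counted as source items

# Hypothetical citations received this year by those items.
citations_to_articles = 400
citations_to_letters = 50
total_citations = citations_to_articles + citations_to_letters

# Calculation as described: all citations over source items only.
reported = impact_factor(total_citations, articles_and_reviews)

# For comparison: the same citations divided by every published item.
all_items = impact_factor(total_citations, articles_and_reviews + letters)

print(reported)   # 2.25
print(all_items)  # 1.8
```

Under these assumed figures, citations to letters raise the numerator while the letters themselves are absent from the denominator, so shifting content into non-source categories raises the reported value.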
Citation rates also differ according to subject area, with certain disciplines being faster moving than others [15, 16, 17], and certain topics being more attractive for publication in wider science journals. For instance, fundamental science journals appear to have larger mean impact factors than journals in specialized, or applied, subject areas, and so citations or impact factors cannot be meaningfully compared across subject areas. Citation rates also vary according to whether or not the journal is open-access, with freely accessible papers gathering more citations [6, 30, 34].
In sum, citation ratings are often thought to be more of an ‘objective’ indicator of quality than ‘subjective’ peer review, but is this only an illusion? Citation ratings may be more objective in that they offer a more automated or mechanical method of assessing quality, but it does not necessarily follow that these ratings provide an assessment of genuine scientific impact. Likewise, although peer review may provide an effective filtering system, it cannot be assumed that it provides objective critiques. Indeed, it appears that although both methods are lauded, both methods are flawed.
Therefore, journals are increasingly adopting novel techniques to assess quality. These include conducting public as well as traditional peer reviews; asking reviewers to rank rather than review papers; asking for papers to be submitted with any previous reviewer comments; as well as ranking reviewers themselves [4, 8]. The jury is still out on whether any of these newer methods actually improve the assessment of scientific quality, but perhaps a combination of both quantitative and qualitative methods, tailored to each subject area, would offer the best compromise.
- 1. Adler KB (2009) Impact factor and its role in academic promotion. Am J Respir Cell Mol Biol 41:127
- 4. Akst J (2010) I hate your paper. The Scientist 24:36
- 5. Amin M, Mabe M (2000) Impact factors: use and abuse. Perspect Publ 1:1–6
- 6. Antelman K (2004) Do open-access articles have greater impact? Coll Res Libr 65:372–382
- 33. Nature (2008) Working double-blind. Nature 451:605–606
- 36. Pendlebury DA (2009) The use and misuse of journal metrics and other citation indicators. Scientometrics 57:1–11
- 38. Research Excellence Framework (2009). http://www.hefce.ac.uk/pubs/hefce/2009/09_38/#exec. Accessed 24 Aug 2012