Evaluations in Science

Evaluations are an inevitable hallmark of science; scientists who are able to escape confrontation with ‘sceptical’ experts over their work are few and far between. ‘Quality evaluations are at the heart of the scientific enterprise. Such evaluations and their criteria and measurement determine who gets which job, who gets tenure, who gets which grants and who gets which awards and honours’ [17]. ‘Nearly every aspect of every role performed by scientists is evaluated with potentially consequential effects for scientists and their careers’ [25].

Peer Review as a Tool for Evaluation

The first forms of peer review arose with the foundation of scientific societies—particularly the Royal Society in England—in the middle of the 17th century. They were intended to ensure the quality of the manuscripts published in the societies’ journals [5, 31, 35]. The societies were faced with the problem that people who were not members, and therefore not seen as trustworthy by a society, were presenting experimental results whose quality was not at all clear. In order to render the results trustworthy, the societies either permitted a member to guarantee the correctness of the results or had several members undertake a critical review of the scientific work. Initially, the societies had the experiments repeated under the members’ observation; later, the members reviewed the description of the procedure and the results of the experiments in manuscripts.

At first, the peer review procedure was applied exclusively to the review of manuscripts which were to appear in the scientific societies’ own journals; in modern science, it has developed into the most important evaluation tool for scientific work and is used to review both planned (ex ante) and completed (ex post) research [19, 28]. The procedure is not only used to select manuscripts for publication in journals [20]; it also determines who is awarded prizes (such as the Nobel Prize), grants and jobs. Furthermore, it is used to evaluate research groups and scientific institutions [24], in which case not only research but also the scientists’ teaching activities may be under review (evaluation of study and teaching; see [8]). With universities forced to make savings, a trend has become apparent over the years in which scientists have less access to normal funding from their universities and increasingly must finance their research from external funding, which is awarded on the basis of peer review [21, 29].

As a rule, peer review processes are structured so that scientists are asked by an institution to review and legitimise an evaluation object, and their expert report is submitted to decision-makers for further use (for example, to accept or reject a manuscript or an application for research funding). Depending on the area of application (and also within an area of application), the actual arrangements for the evaluation process can vary to a greater or lesser degree [23, 30, 40]. The evaluation can be undertaken in accordance with highly formalised specifications, or the selection of criteria on which to judge the work can be left to the referees. Referees can remain anonymous or be named. The researchers under review can be anonymous to the referees or their names can be visible (‘double-blind’ vs. ‘single-blind’ review). A referee can be deployed on a permanent or an ad hoc basis [19]. An evaluation can be carried out by one referee or by several [30]. If a group of referees is used, the members can evaluate the work independently of one another or in collaboration [19]. The peer review process can be designed to be accessible to the public or it can be confidential.

Typically, peer review is used in science to evaluate manuscripts and research applications. In these areas, the referees are tasked with recommending the ‘best’ scientific research given the scarce resources (that is, a limited number of printed pages in journals or limited resources for research) and possibly with formulating suggestions for improvement to the submitted work [15, 22]. They are also supposed to find errors in scientific work or incorrect scientific conduct in association with the work [41]. ‘Peer review should be a filter to separate the wheat from the chaff, and thus to guarantee and improve the quality of published research’ [20].

The proponents of peer review consider the process for checking and legitimising scientific work (manuscripts or applications) to be more suitable than any other proposed up to now. For example, Roy [37] developed a ‘Peer Reviewed Formula System’, in which research funding was granted in proportion to previous scientific productivity. However, this did not prevail as an alternative to peer review. A proponent of peer review, Abelson [2], wrote that ‘the most important and effective mechanism for attaining good standards of quality in journals is the peer review system’ (p. 62). Many years later, ScholarOne [39] expressed a similarly positive view: ‘The peer review process is considered an integral part of scholarly communication, helping to ensure validity, increase accuracy, and enhance content prior to publication and dissemination’. A global survey on the attitudes and behaviour of 3,040 academics in relation to peer review in journals shows that ‘the overwhelming majority (93 %) disagree that peer review is unnecessary. The large majority (85 %) agreed that peer review greatly helps scientific communication and most (83 %) believe that without peer review there would be no control’ [34, p. 1].

The proponents of peer review consider active scientists to be the most suitable people to evaluate the scientific quality of work done by colleagues in their own field [14, 19]. In their view, peer review guarantees, like no other procedure, an evaluation of performance that is appropriate to scientific requirements, because it draws on the scientific expertise that such an evaluation inherently demands. ‘When the peer review process works, statements and opinions are not arbitrary, experiments and data meet certain standards, results follow logically from the data, merit rather than influence determines what is published, and researchers do not have to waste their time reading unfiltered material’ [32, p. 64]. The fact that scientists are both referees and refereed in the peer review process contributes to standardising the evaluation criteria and also to generalising the formal requirements [42].

Critics of peer review (see, for example, [1, 18, 26, 36, 37]) see a number of weaknesses in the process: (1) Different referees do not agree on the assessment of the same scientific work (lack of reliability). (2) Recommendations from referees are systematically biased. The work is judged not only on its scientific quality, but also on non-scientific criteria (lack of fairness). (3) Accordingly, the correlation between the judgements in the peer review process and the quality of the refereed work is low (lack of validity). The only reason, according to the critics, for continuing to use the peer review process is that there is no clear consensus on a ‘better’ alternative [51].
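
The reliability criticism in particular is usually made concrete with chance-corrected agreement statistics. As a minimal sketch of how such a statistic is computed, the following Python snippet calculates Cohen’s kappa for two referees; the recommendations are invented for illustration and are not taken from the studies cited above:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters labelled the items independently,
    # each at their own observed marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical accept/reject recommendations for ten manuscripts.
referee_1 = ["accept", "reject", "accept", "accept", "reject",
             "accept", "reject", "accept", "accept", "reject"]
referee_2 = ["reject", "reject", "accept", "reject", "accept",
             "accept", "reject", "reject", "accept", "reject"]

print(f"kappa = {cohens_kappa(referee_1, referee_2):.2f}")  # -> kappa = 0.23
```

A kappa of 1 would indicate perfect agreement and a value near 0 agreement no better than chance; the low value produced by this invented data mirrors the kind of result that underlies the reliability criticism.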

Research into the peer review process which has looked at the criticisms levelled against it relates largely to peer review for journals (see overviews in [4, 6, 9, 10, 33, 44, 49]) and less often to peer review of research and grant applications (see overviews in [6, 13, 50]). This research is criticised on the grounds that ‘most of the publications on journal peer review are more opinion than research, often the ruminations of a former editor. Likewise, most of the letters to editors on the topic, the comments of one kind or another are predominantly opinion’ [44, p. 215]. For example, when Overbeke and Wager [33] examined a number of peer review studies in the area of biomedicine, they found that many of these studies had been published as editorials, comments or letters and not as original research articles (see also [43]). An overview of research into the peer review process [12] summarises: ‘While peer review is central to our science, concerns do exist. Despite its importance, it is curious that we have not required the same rigour of study of the peer review process as we do for our science’ (p. 275).

The Connection Between Bibliometric Indicators and Peer Review

While some see ‘qualitative’ peer review and the ‘quantitative’ techniques of bibliometrics as competing approaches [7], most others recommend using them as complementary approaches, so that each compensates for the weaknesses of the other [46]. ‘Winking at the tradition of library studies, the term “bibliometrics”, coined by Alan Pritchard in the late 1960s, stresses the material aspect of the undertaking: counting books, articles, publications, citations, in general any statistically significant manifestation of recorded information, regardless of disciplinary bounds. “Scientometrics”, instead, emphasizes the measurement of a specific type of information, upon which a certain kind of value judgment—relative to the status of “scientific”—has already been made by someone put in charge of (and trusted for) giving it’ [11, p. 3].
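
To give one concrete example of such counting (the specific indicator is chosen here for illustration, not taken from the quotation above), the h-index condenses a publication list and its citations into a single number: a scientist has index h if h of his or her papers have each been cited at least h times. A minimal sketch with invented citation counts:

```python
def h_index(citations):
    """A scientist has index h if h of their papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical publication record: citation counts per paper.
paper_citations = [31, 18, 12, 9, 7, 6, 2, 1, 0]
print(h_index(paper_citations))  # -> 6 (six papers cited at least six times)
```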

On the one hand, the argument goes, peer review is (1) informed by bibliometric indicators and (2) subjected by them to verification. (1) ‘Bibliometric analysis will never be a substitute for peer review, but, if the analysis is comprehensive and sound, it should inform peer review’ [27, p. 321]. ‘The experience at INSERM [The French National Institute of Health and Medical Research, Paris] shows that introducing bibliometric indicators … strengthens the transparency of decisions and provides for their substantiation’ [16]. (2) For Weingart [47], the indicators have an important function in that they subject the evaluations from a peer review to verification and thus protect them from the ‘old boys’ network effect’—a fundamental problem which afflicts the process [48].

On the other hand, the results of bibliometric analyses can be commented on and interpreted by referees. Schneider [38] considers a direct evaluation of scientific performance according to formalised, context-independent criteria and without additional interpretation to be impossible. On this view, expert referees are needed to specify the relevance of the indicators appropriately for the situation, taking into account the repercussions for individuals and institutions [47]. Only interpretation by experts makes it possible to develop qualitative assessments of scientific performance from the results of bibliometric analyses. The indicators require not only annotation by an expert, but also enhancement and correction informed by an awareness of the context.

For Abramo and D’Angelo [3], the combination of peer review and bibliometrics offers a number of benefits: ‘The use of both methods, with the peer-reviewer having access to bibliometric indicators (hopefully appropriately standardised) concerning the publications to be evaluated, permits the reviewer to form an evaluation that emerges from comparison between his/her personal subjective judgment and the quantitative indicators. The pros and cons of the two approaches for evaluation of single scientific products are probably balanced, making it difficult to establish which would be preferable: the variables of context and the objectives of the evaluation could shift the weight in favour of one or the other. In fact, it is not an accident that many studies have demonstrated a positive correlation between peer quality esteem and frequency of citation’ (p. 503).
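
The positive correlation mentioned in the quotation is typically reported as a rank correlation between referee ratings and (ideally field-normalised) citation counts. The following sketch shows how such a check might look; the ratings and citation counts are invented, and the pure-Python Spearman implementation is included only to keep the example self-contained:

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

peer_rating = [5, 4, 4, 3, 2, 2, 1]       # hypothetical referee scores
citations   = [40, 22, 35, 10, 8, 12, 1]  # hypothetical citation counts
print(f"rho = {spearman_rho(peer_rating, citations):.2f}")  # -> rho = 0.93
```

The strongly positive value here is an artefact of the invented data; the point of the sketch is only the mechanics of the comparison between subjective judgement and quantitative indicator.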

Conclusions

Nowadays, referees, editors, and programme managers are facing significantly increased demands. Whereas in the past their task was reliably to filter out work that did not meet a certain minimum standard (negative selection), today they have to choose the best from many, frequently very good, research applications (positive selection). For the bodies that fund research, positive selection, unlike negative selection, is associated with a lower likelihood of supporting research projects (or people) that later turn out to be less than successful. For the applicants, however, positive selection can lead to the disappointing experience of having funding for a project declined even though the project, if funded elsewhere, later turns out to be just as successful as funded projects [45].

There are good reasons to see ‘qualitative’ peer review and the ‘quantitative’ techniques of bibliometrics not as competing processes in scientific evaluation, but as two options with which to view a scientific work from different perspectives. It is only by linking both perspectives that a comprehensive view of the whole emerges, making possible credible statements about the quality of a scientific study.