A statistical approach to calibrating the scores of biased reviewers of scientific papers
- Cite this article as:
- Kuhlisch, W., Roos, M., Rothe, J. et al. Metrika (2016) 79: 37. doi:10.1007/s00184-015-0542-z
Peer reviewing is the key ingredient in evaluating the quality of scientific work. Based on the review scores assigned by individual reviewers to papers, program committees of conferences and journal editors decide which papers to accept for publication and which to reject. A similar procedure is used in the selection of grant applications and in other fields such as sports. It is well known that the reviewing process suffers from measurement errors due to a lack of agreement among multiple reviewers of the same paper. Moreover, if not all papers are reviewed by all reviewers, the naive approach of averaging the scores is biased. Several statistical methods are proposed for aggregating review scores, all of which can be implemented in standard statistical software. The simplest method uses the well-known fixed-effects two-way classification with identical variances, while a more advanced method assumes different variances. As alternatives, a mixed linear model and a generalized linear model are employed. Applying these methods also yields an evaluation of the reviewers, which may help to improve reviewing processes. An application example with real conference data shows the potential of these statistical methods.
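To illustrate why naive averaging can mislead when each paper is seen by only some of the reviewers, the following minimal sketch fits an additive two-way model, score = paper quality + reviewer bias, to a small made-up data set. Everything in it is an assumption for illustration and not taken from the article: the toy scores, the function names `naive_means` and `calibrate`, the sum-to-zero constraint on reviewer biases, and the solver. The fit uses simple alternating averaging, which converges to the least-squares solution of the additive model when the paper-reviewer assignment is connected. In practice, standard statistical software would be used instead.

```python
# Hypothetical toy data: (paper, reviewer, score).
# Constructed from true qualities A=5, B=6, C=7 and reviewer biases
# R1=+2 (lenient), R2=0, R3=-2 (harsh), without noise.
reviews = [
    ("A", "R1", 7.0), ("A", "R2", 5.0),
    ("B", "R2", 6.0), ("B", "R3", 4.0),
    ("C", "R1", 9.0), ("C", "R3", 5.0),
]


def naive_means(reviews):
    """Average the raw scores of each paper, ignoring who reviewed it."""
    totals, counts = {}, {}
    for paper, _, score in reviews:
        totals[paper] = totals.get(paper, 0.0) + score
        counts[paper] = counts.get(paper, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}


def calibrate(reviews, iterations=100):
    """Fit score = quality[paper] + bias[reviewer] by alternating averaging.

    Reviewer biases are centered to sum to zero (an assumed
    identifiability constraint). Assumes a connected design.
    """
    papers = sorted({p for p, _, _ in reviews})
    reviewers = sorted({r for _, r, _ in reviews})
    quality = {p: 0.0 for p in papers}
    bias = {r: 0.0 for r in reviewers}
    for _ in range(iterations):
        # Update paper qualities given the current reviewer biases.
        for p in papers:
            resid = [s - bias[r] for q, r, s in reviews if q == p]
            quality[p] = sum(resid) / len(resid)
        # Update reviewer biases given the current paper qualities.
        for r in reviewers:
            resid = [s - quality[p] for p, t, s in reviews if t == r]
            bias[r] = sum(resid) / len(resid)
        # Center the biases so that they sum to zero.
        shift = sum(bias.values()) / len(bias)
        for r in reviewers:
            bias[r] -= shift
    return quality, bias


naive = naive_means(reviews)
quality, bias = calibrate(reviews)
print("naive averages:     ", naive)
print("calibrated quality: ", {p: round(v, 3) for p, v in quality.items()})
print("estimated bias:     ", {r: round(v, 3) for r, v in bias.items()})
```

In this toy example the naive averages rank paper A above paper B only because A drew the lenient reviewer and B the harsh one, whereas the calibrated scores restore the order used to generate the data. The estimated biases are the by-product that corresponds to the evaluation of reviewers mentioned in the abstract.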