On peer review in computer science: analysis of its effectiveness and suggestions for improvement
In this paper we focus on the analysis of peer reviews and reviewers' behaviour in a number of different review processes. More specifically, we report on the development, definition and rationale of a theoretical model for peer review processes to support the identification of appropriate metrics to assess the processes' main characteristics, in order to render peer review more transparent and understandable. Together with known metrics and techniques we introduce new ones to assess the overall quality (i.e., reliability, fairness, validity) and efficiency of peer review processes, e.g., the robustness of the process, the degree of agreement/disagreement among reviewers, or positive/negative bias in the reviewers' decision-making process. We also assess the ability of peer review to predict the impact of papers in subsequent years. We apply the proposed model and analysis framework to a large data set of reviews from ten different conferences in computer science, for a total of ca. 9,000 reviews on ca. 2,800 submitted contributions. We discuss the implications of the results and their potential use toward improving the analysed peer review processes. A number of interesting results were found, in particular: (1) a low correlation between peer review outcome and the later impact of the accepted contributions; (2) the influence of the assessment scale on the way reviewers gave marks; (3) the effect and impact of rating bias, i.e., reviewers who consistently give lower/higher marks than all other reviewers; (4) the effectiveness of statistical approaches to optimize some process parameters (e.g., the number of papers per reviewer) to improve the overall quality of the process while keeping the overall effort under control. Based on the lessons learned, we suggest ways to improve the overall quality of peer review through procedures that can be easily implemented in current editorial management systems.
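To make metric (3) concrete, the sketch below estimates a reviewer's rating bias as the average deviation of their marks from the mean mark that co-reviewers gave to the same papers. This is a minimal, hypothetical Python illustration on toy data: the deviation-based definition is our assumption for exposition, not necessarily the exact bias measure used in the paper.

```python
import statistics
from collections import defaultdict

# Toy review data: (paper_id, reviewer_id, mark on a 1-5 assessment scale).
# Illustrative values only, not the conference data analysed in the paper.
reviews = [
    ("p1", "r1", 4), ("p1", "r2", 2), ("p1", "r3", 3),
    ("p2", "r1", 5), ("p2", "r4", 3),
    ("p3", "r2", 2), ("p3", "r3", 4), ("p3", "r4", 4),
]

# Group marks by paper so each reviewer can be compared with co-reviewers.
by_paper = defaultdict(list)
for paper, reviewer, mark in reviews:
    by_paper[paper].append((reviewer, mark))

# For each review, record the difference between the reviewer's mark and
# the mean mark given by the paper's other reviewers.
deviations = defaultdict(list)
for paper, marks in by_paper.items():
    for reviewer, mark in marks:
        others = [m for r, m in marks if r != reviewer]
        if others:  # papers with a single review carry no bias signal
            deviations[reviewer].append(mark - statistics.mean(others))

# A reviewer's bias is the mean of these per-paper deviations.
for reviewer, devs in sorted(deviations.items()):
    bias = statistics.mean(devs)
    label = "harsher" if bias < 0 else "more lenient"
    print(f"{reviewer}: mean deviation {bias:+.2f} ({label} than co-reviewers)")
```

Under this toy definition, a reviewer with a large negative mean deviation would be flagged as consistently harsh, and their marks could be normalised before acceptance decisions are made.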
Keywords: Peer review · Quality metrics · Reliability · Fairness · Validity · Efficiency
Mathematics Subject Classification (2000): 62-07 · 62P25 · 91C99
This paper is an extended version of the 12-page paper titled "A Quantitative Analysis of Peer Review" presented at the 13th Conference of the International Society for Scientometrics and Informetrics, Durban (South Africa), 4–7 July 2011 (Ragone et al. 2011). We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission for the LIQUIDPUB project under FET-Open grant number 213360. We also want to thank the anonymous reviewers of our manuscript. Their comments have greatly helped us to improve our work, underlining something that we already knew (and mention in our work): peer review is not only about filtering and selecting manuscripts for publication but also about providing constructive feedback to authors.
- Akst, J. (2010). I hate your paper. The Scientist, 24(8), 36–41.
- Barnes, J. (1981). Proof and the syllogism. In E. Berti (Ed.), Aristotle on science: The posterior analytics (pp. 17–59). Padua: Antenore.
- Bornmann, L., Wallon, G., & Ledin, A. (2008b). Does the committee peer review select the best applicants for funding? An investigation of the selection process for two European Molecular Biology Organization Programmes. PLoS ONE, 3. doi: 10.1371/journal.pone.0003480.
- Brink, D. (2008). Statistics. Frederiksberg: Ventus Publishing ApS.
- Cabanac, G., & Preuss, T. (2013). Capitalizing on order effects in the bids of peer-reviewed conferences to secure reviews by expert referees. Journal of the American Society for Information Science and Technology. doi: 10.1002/asi.22747.
- Ceci, S. J., & Peters, D. P. (1982). Peer review: A study of reliability. Change, 14(6), 44–48.
- Cicchetti, D., & Sparrow, S. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
- Davidoff, F., DeAngelis, C., Drazen, J., et al. (2001). Sponsorship, authorship, and accountability. JAMA, 286(10), 1232–1234. doi: 10.1001/jama.286.10.1232.
- Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
- Grudin, J. (2010). Conferences, community, and technology: Avoiding a crisis. In iConference 2010.
- Lokker, C., McKibbon, K. A., McKinlay, R. J., Wilczynski, N. L., & Haynes, R. B. (2008). Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: Retrospective cohort study. British Medical Journal, 336(7645), 655–657.
- Ragone, A., Mirylenka, K., Casati, F., & Marchese, M. (2011). A quantitative analysis of peer review. In E. Noyons & P. Ngulube (Eds.), Proceedings of ISSI 2011, the 13th International Conference on Scientometrics and Informetrics, Durban, South Africa, July 4–7, pp. 724–746.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
- Tung, A. K. H. (2006). Impact of double blind reviewing on SIGMOD publication: A more detail analysis. SIGMOD Record, 35(3), 6–7.