Volume 97, Issue 2, pp 317–356

On peer review in computer science: analysis of its effectiveness and suggestions for improvement

  • Azzurra Ragone
  • Katsiaryna Mirylenka
  • Fabio Casati
  • Maurizio Marchese


In this paper we analyse peer reviews and reviewers’ behaviour in a number of different review processes. More specifically, we report on the development, definition and rationale of a theoretical model for peer review processes that supports the identification of appropriate metrics for assessing the processes’ main characteristics, in order to render peer review more transparent and understandable. Together with known metrics and techniques, we introduce new ones to assess the overall quality (i.e., reliability, fairness, validity) and efficiency of peer review processes, e.g., the robustness of the process, the degree of agreement/disagreement among reviewers, or positive/negative bias in the reviewers’ decision-making process. We also check the ability of peer review to assess the impact of papers in subsequent years. We apply the proposed model and analysis framework to a large review data set from ten different conferences in computer science, for a total of ca. 9,000 reviews on ca. 2,800 submitted contributions. We discuss the implications of the results and their potential use toward improving the analysed peer review processes. A number of interesting results were found, in particular: (1) a low correlation between peer review outcome and the impact over time of the accepted contributions; (2) the influence of the assessment scale on the way reviewers gave marks; (3) the effect and impact of rating bias, i.e., reviewers who consistently give lower/higher marks than all other reviewers; (4) the effectiveness of statistical approaches in optimizing some process parameters (e.g., number of papers per reviewer) to improve the overall quality of the process while keeping the overall effort under control. Based on the lessons learned, we suggest ways to improve the overall quality of peer review through procedures that can be easily implemented in current editorial management systems.


Keywords

Peer review, Quality metrics, Reliability, Fairness, Validity, Efficiency

Mathematics Subject Classification (2000)

62-07 62P25 91C99 



This paper is an extended version of the 12-page paper titled “A Quantitative Analysis of Peer Review” presented at the 13th Conference of the International Society for Scientometrics and Informetrics, Durban (South Africa), 4–7 July 2011 (Ragone et al. 2011). We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission for the LIQUIDPUB project, under FET-Open grant number 213360. We also thank the anonymous reviewers of our manuscript. Their comments have really helped us to improve our work, underlining something that we already knew (and mention in our work): peer review is not only about filtering and selecting manuscripts for publication but also about providing constructive feedback to authors.


  1. Akst, J. (2010). I hate your paper. The Scientist, 24(8), 36–41.
  2. Barnes, J. (1981). Proof and the syllogism. In E. Berti (Ed.), Aristotle on science: The posterior analytics (pp. 17–59). Padua: Antenore.
  3. Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19, 2–11.
  4. Bartko, J. J. (1974). Corrective note to “the intraclass correlation coefficient as a measure of reliability”. Psychological Reports, 34, 418.
  5. Benos, D. J., Bashari, E., Chaves, J. M., Gaggar, A., et al. (2007). The ups and downs of peer review. Advances in Physiology Education, 31(2), 145–152.
  6. Birman, K., & Schneider, F. (2009). Program committee overload in systems. Communications of the ACM, 52(5), 34–37.
  7. Bollen, J., Van de Sompel, H., Smith, J., & Luce, R. (2005). Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing & Management, 41(6), 1419–1440.
  8. Bornmann, L. (2007). Bias cut: Women, it seems, often get a raw deal in science – so how can discrimination be tackled? Nature, 445, 566.
  9. Bornmann, L., & Daniel, H. D. (2005a). Committee peer review at an international research foundation: Predictive validity and fairness of selection decisions on post-graduate fellowship applications. Research Evaluation, 14(1), 15–20.
  10. Bornmann, L., & Daniel, H. D. (2005b). Selection of research fellowship recipients by committee peer review. Reliability, fairness and predictive validity of board of trustees’ decisions. Scientometrics, 63(2), 297–320.
  11. Bornmann, L., & Daniel, H. D. (2010a). Reliability of reviewers’ ratings when using public peer review: A case study. Learned Publishing, 23(2), 124–131.
  12. Bornmann, L., & Daniel, H. D. (2010b). The validity of staff editors’ initial evaluations of manuscripts: A case study of Angewandte Chemie International Edition. Scientometrics, 85(3), 681–687.
  13. Bornmann, L., Mutz, R., & Daniel, H. D. (2008a). How to detect indications of potential sources of bias in peer review: A generalized latent variable modeling approach exemplified by a gender study. Journal of Informetrics, 2(4), 280–287.
  14. Bornmann, L., Wallon, G., & Ledin, A. (2008b). Does the committee peer review select the best applicants for funding? An investigation of the selection process for two European Molecular Biology Organization Programmes. PLoS ONE, 3. doi: 10.1371/journal.pone.0003480.
  15. Bornmann, L., Wolf, M., & Daniel, H. D. (2012). Closed versus open reviewing of journal manuscripts: How far do comments differ in language use? Scientometrics, 91(3), 843–856. doi: 10.1007/s11192-011-0569-5.
  16. Brink, D. (2008). Statistics. Frederiksberg: Ventus Publishing ApS.
  17. Cabanac, G., & Preuss, T. (2013). Capitalizing on order effects in the bids of peer-reviewed conferences to secure reviews by expert referees. Journal of the American Society for Information Science and Technology. doi: 10.1002/asi.22747.
  18. Ceci, S., & Williams, W. (2011). Understanding current causes of women’s underrepresentation in science. Proceedings of the National Academy of Sciences, 108(8), 3157–3162.
  19. Ceci, S. J., & Peters, D. P. (1982). Peer review: A study of reliability. Climate Change, 14(6), 44–48.
  20. Chen, J., & Konstan, J. A. (2010). Conference paper selectivity and impact. Communications of the ACM, 53(6), 79–83. doi: 10.1145/1743546.1743569.
  21. Cicchetti, D., & Sparrow, S. (1981). Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127–137.
  22. Cicchetti, D. V., Lord, C., Koenig, K., Klin, A., & Volkmar, F. R. (2008). Reliability of the autism diagnostic interview: Multiple examiners evaluate a single case. Journal of Autism and Developmental Disorders, 36(4), 764–770.
  23. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
  24. Davidoff, F., DeAngelis, C., Drazen, J., et al. (2001). Sponsorship, authorship, and accountability. JAMA, 286(10), 1232–1234. doi: 10.1001/jama.286.10.1232.
  25. Donner, A. (1986). A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review, 54(1), 67–82.
  26. Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika, 16(4), 407–424.
  27. Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
  28. Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
  29. Freyne, J., Coyle, L., Smyth, B., & Cunningham, P. (2010). Relative status of journal and conference publications in computer science. Communications of the ACM, 53(11), 124–132. doi: 10.1145/1839676.1839701.
  30. Godlee, F., Gale, C. R., & Martyn, C. N. (1998). Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: A randomized controlled trial. JAMA, 280(3), 237–240.
  31. Goodman, S. N., Berlin, J., Fletcher, S. W., & Fletcher, R. H. (1994). Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Annals of Internal Medicine, 121(1), 11–21.
  32. Grudin, J. (2010). Conferences, community, and technology: Avoiding a crisis. In iConference 2010.
  33. Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. Chichester: Wiley.
  34. Ingelfinger, F. J. (1974). Peer review in biomedical publication. American Journal of Medicine, 56(5), 686–692.
  35. Jacso, P. (2010). Metadata mega mess in Google Scholar. Online Information Review, 34(1), 175–191.
  36. Jefferson, T., Alderson, P., Wager, E., & Davidoff, F. (2002a). Effects of editorial peer review: A systematic review. JAMA, 287(21), 2784–2786.
  37. Jefferson, T., Wager, E., & Davidoff, F. (2002b). Measuring the quality of editorial peer review. JAMA, 287(21), 2786–2790.
  38. Kassirer, J. P., & Campion, E. W. (1994). Peer review: Crude and understudied, but indispensable. Journal of American Medical Association, 272(2), 96–97.
  39. Katz, D. S., Proto, A. V., & Olmsted, W. W. (2002). Incidence and nature of unblinding by authors: Our experience at two radiology journals with double-blinded peer review policies. The American Journal of Roentgenology, 179, 1415–1417.
  40. Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–93.
  41. Krapivin, M., Marchese, M., & Casati, F. (2010). Exploring and understanding citation-based scientific metrics. Advances in Complex Systems, 13(1), 59–81.
  42. Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.
  43. Li, X., Thelwall, M., & Giustini, D. (2012). Validating online reference managers for scholarly impact measurement. Scientometrics, 91(2), 461–471. doi: 10.1007/s11192-011-0580-x.
  44. Link, A. M. (1998). US and non-US submissions: An analysis of reviewer bias. JAMA, 280(3), 246–247.
  45. Lock, S. (1994). Does editorial peer review work? Annals of Internal Medicine, 121(1), 60–61.
  46. Lokker, C., McKibbon, K. A., McKinlay, R. J., Wilczynski, N. L., & Haynes, R. B. (2008). Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: Retrospective cohort study. British Medical Journal, 336(76450), 655–657.
  47. Madden, S., & DeWitt, D. (2006). Impact of double-blind reviewing on SIGMOD publication rates. ACM SIGMOD Record, 35(2), 29–32.
  48. McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46.
  49. Montgomery, A., Graham, A., Evans, P., & Fahey, T. (2002). Inter-rater agreement in the scoring of abstracts submitted to a primary care research conference. BMC Health Services Research, 2(1), 8.
  50. Ragone, A., Mirylenka, K., Casati, F., & Marchese, M. (2011). A quantitative analysis of peer review. In E. Noyons & P. Ngulube (Eds.), Proceedings of ISSI 2011: The 13th International Conference on Scientometrics and Informetrics, South Africa, Durban, July 4–7, pp. 724–746.
  51. Reinhart, M. (2009). Peer review of grant applications in biology and medicine. Reliability, fairness, and validity. Scientometrics, 81(3), 789–809.
  52. Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
  53. Smith, R. (2006). Peer review: A flawed process at the heart of science and journals. JRSM, 99(4), 178.
  54. Spier, R. (2002). The history of the peer-review process. Trends in Biotechnology, 20(8), 357–358.
  55. Tung, A. K. H. (2006). Impact of double blind reviewing on SIGMOD publication: A more detail analysis. SIGMOD Record, 35(3), 6–7.
  56. van Rooyen, S., Godlee, F., Evans, S., Black, N., & Smith, R. (1999). Effect of open peer review on quality of reviews and on reviewers’ recommendations: A randomised trial. British Medical Journal, 318, 23–27.
  57. Walsh, E., Rooney, M., Appleby, L., & Wilkinson, G. (2000). Open peer review: A randomised controlled trial. The British Journal of Psychiatry, 176, 47–51.
  58. Welch, B. L. (1947). The generalization of Student’s problem when several different population variances are involved. Biometrika, 34(1/2), 28–35.
  59. Wenneras, C., & Wold, A. (1997). Nepotism and sexism in peer-review. Nature, 387, 341–343.
  60. Zuckerman, H., & Merton, R. (1971). Patterns of evaluation in science: Institutionalisation, structure and functions of the referee system. Minerva, 9, 66–100. doi: 10.1007/BF01553188.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2013

Authors and Affiliations

  • Azzurra Ragone (1)
  • Katsiaryna Mirylenka (1)
  • Fabio Casati (1)
  • Maurizio Marchese (1)
  1. Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
