In this work, we try to answer the question of which method, peer review versus bibliometrics, better predicts the future overall scholarly impact of scientific publications. We measure the agreement between peer review evaluations of Web of Science indexed publications submitted to the first Italian research assessment exercise and long-term citations of the same publications. We do the same for an early citation-based indicator. We find that the latter shows stronger predictive power, i.e. it more reliably predicts late citations in all the disciplinary areas examined, and for any citation time window starting 1 year after publication.
This is a preview of subscription content, access via your institution.
Buy single article
Instant access to the full article PDF.
Price excludes VAT (USA)
Tax calculation will be finalised during checkout.
In this work, we deal with the evaluation of research output encoded in written form and indexed in bibliographic repertories (such as Scopus and Web of Science, WoS), i.e. after fellow specialists have assessed its suitability for publication.
To account for time devoted to teaching activities, 1 professor equals 0.5 FTE.
More precisely, Martin and Irvine (1983) argue that “it is necessary to distinguish between, not two, but three concepts—the “quality”, “importance”, and “impact” of the research described in a paper”, whereby “importance” of a publication refers to its potential influence on the advance of scientific knowledge.
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies. ISO 9000: 2015 is accessible at https://www.iso.org/obp/ui/#iso:std:iso:9000:ed-4:v1:en, last accessed 12 June, 2019.
We take this opportunity to thank the CWTS for making its database accessible to us for research purposes.
For reasons of space, we omit reporting the case of BS_2Y score because, as expected, results are just between those for BS_1Y score and BS_3Y score.
Abramo, G. (2018). Revisiting the scientometric conceptualization of impact and its measurement. Journal of Informetrics, 12(3), 590–597.
Abramo, G., Cicero, T., & D’Angelo, C. A. (2011a). Assessing the varying level of impact measurement accuracy as a function of the citation window length. Journal of Informetrics, 5(4), 659–667.
Abramo, G., Cicero, T., & D’Angelo, C. A. (2012). The dispersion of research performance within and between universities as a potential indicator of the competitive intensity in higher education systems. Journal of Informetrics, 6(2), 155–168.
Abramo, G., Cicero, T., & D’Angelo, C. A. (2013a). National peer-review research assessment exercises for the hard sciences can be a complete waste of money: the Italian case. Scientometrics, 95(1), 311–324.
Abramo, G., & D’Angelo, C. A. (2016). Refrain from adopting the combination of citation and journal metrics to grade publications, as used in the Italian national research assessment exercise (VQR 2011-2014). Scientometrics, 109(3), 2053–2065.
Abramo, G., D’Angelo, C. A., & Di Costa, F. (2011b). National research assessment exercises: a comparison of peer review and bibliometrics rankings. Scientometrics, 89(3), 929–941.
Abramo, G., D’Angelo, C. A., & Felici, G. (2019). Predicting long-term publication impact through a combination of early citations and journal impact factor. Journal of Informetrics, 13(1), 32–49.
Abramo, G., D’Angelo, C. A., & Rosati, F. (2015). The determinants of academic career advancement: evidence from Italy. Science and Public Policy, 42(6), 761–774.
Abramo, G., D’Angelo, C. A., & Viel, F. (2013b). Selecting competent referees to assess research projects proposals: a study of referees’ registers. Research Evaluation, 22(1), 41–51.
Aksnes, D.W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. SAGE Open, January–March, 1–17.
Aksnes, D. W., & Taxt, R. E. (2004). Peer reviews and bibliometric indicators: A comparative study at Norvegian University. Research Evaluation, 13(1), 33–41.
Alfò, M., Benedetto, S., Malgarini, M., & Scipione, S. (2017). On the use of bibliometric information for assessing articles quality: an analysis based on the third Italian research evaluation exercise. In 2017 STI conference, Paris.
Allen, L., Jones, C., Dolby, K., Lynn, D., & Walport, M. (2009). Looking for landmarks: The role of expert review and bibliometric analysis in evaluating scientific publication outputs. PLoS ONE, 4(6), e5910.
Ancaiani, A., Anfossi, A. F., Barbara, A., Benedetto, S., Blasi, B., Carletti, V., et al. (2015). Evaluating scientific research in Italy: The 2004–10 research evaluation exercise. Research Evaluation, 24(3), 242–255.
ANVUR. (2013). Valutazione della qualità della ricerca 2004–2010. Rapporto finale. http://www.anvur.it/rapporto/. Last Accessed 12 June 2019.
Baccini, A., Barabesi, L., & De Nicolao, G. (2018). The Holy Grail and the bad sampling: a test for the homogeneity of missing proportions for evaluating the agreement between peer review and bibliometrics in the Italian research assessment exercises. arXiv:1810.12430v1.
Baccini, A., & De Nicolao, G. (2016). Do they agree? Bibliometric evaluation versus informed peer review in the Italian research assessment exercise. Scientometrics, 108(3), 1651–1671.
Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2015). Bibliometric evaluation versus informed peer review: Evidence from Italy. Research Policy, 44(2), 451–466.
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45, 199–245.
Bornmann, L., & Daniel, H.-D. (2005). Does the h-index for ranking of scientists really work? Scientometrics, 65(3), 391–392.
Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.
Bornmann, L., & Leydesdorff, L. (2013). The validation of (advanced) bibliometric indicators through peer assessments: A comparative study using data from InCites and F1000. Journal of Informetrics, 7(2), 286–291.
Cabezas-Clavijo, Á., Robinson-García, N., Escabias, M., & Jiménez-Contreras, E. (2013). Reviewers’ ratings and bibliometric indicators: Hand in hand when assessing over research proposals? PLoS ONE, 8(6), e68258.
Cetina, K. K. (1981). The manufacture of knowledge: An essay on the constructivist and contextual nature of science. New York: Pergamon Press.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
Cole, J. R., & Cole, S. (1973). Social stratification in science. Chicago: University of Chicago Press.
Cuccurullo, F. (2006). La valutazione triennale della ricerca–VTR del CIVR. Analysis, 3(4), 5–7.
Fleiss, J. L., Levin, B., & Myunghee, C. P. (2003). Statistical methods for rates and proportions. Hoboken, NJ: Wiley.
Franceschet, M., & Costantini, A. (2011). The first Italian research assessment exercise: A bibliometric perspective. Journal of Informetrics, 5(2), 275–291.
Garfield, E. (1979). Citation indexing-its theory and application in science, technology, and humanities. New York, NY: Wiley.
Garfield, E. (1980). Premature discovery or delayed recognition: Why? Current Contents, 21, 5–10.
Glänzel, W. (2008). Seven myths in bibliometrics. About facts and fiction in quantitative science studies. In H. Kretschmer & F. Havemann (Eds.), Proceedings of WIS fourth international conference on webometrics, informetrics and scientometrics & ninth COLLNET meeting. Berlin: Institute for Library and Information Science.
Harnad, S. (2008). Validating research performance metrics against peer rankings. Ethics in Science and Environmental Politics, 8(1), 103–107.
Herrmannova, D., Patton, R., Knoth, P., & Stahl, C. (2018). Do citations and readership identify seminal publications? Scientometrics, 115(1), 239–262.
Horrobin, D. F. (1990). The philosophical basis of peer review and the suppression of innovation. Journal of the American Medical Association, 263(10), 1438–1441.
Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431.
Kreiman, G., & Maunsell, J. H. R. (2011). Nine criteria for a measure of scientific output. Frontiers in Computational Neuroscience, 5(48), 11.
Kulczycki, E., Korzeń, M., & Korytkowski, P. (2017). Toward an excellence-based research funding system: Evidence from Poland. Journal of Informetrics, 11(1), 282–298.
Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, MA: Harvard University Press.
Leydesdorff, L., Bornmann, L., Comins, J. A., & Milojević, S. (2016). Citations: Indicators of quality? The impact fallacy. Frontiers in Research Metrics and Analytics, 1(1), 1–15.
Lin, L. I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1), 255–268.
Lin, L. I.-K. (2000). Erratum: A note on the concordance correlation coefficient (biometrics (1989) (214)). Biometrics, 56(1), 324–325.
Mahdi, S., D’Este, P., & Neely, A. (2008). Citation counts: are they good predictors of RAE scores? Technical Report February. Advanced Institute of Management Research. https://doi.org/10.2139/ssrn.1154053.
Martin, B. R., & Irvine, J. (1983). Assessing basic research: Some partial indicators of scientific progress in radio astronomy. Research Policy, 12(2), 61–90.
McBride, G. B. (2005). A proposal for strength-of-agreement criteria for lins concordance correlation coefficient. NIWA Client Report, HAM2005-062.
Meho, L. I., & Sonnenwald, D. H. (2000). Citation ranking versus peer evaluation of senior faculty research performance: a case study of Kurdish Scholarship. Journal of the American Society for Information Science, 51(2), 123–138.
Merton, R. K. (1973). Priorities in scientific discovery. In R. K. Merton (Ed.), The sociology of science: Theoretical and empirical investigations (pp. 286–324). Chicago: University of Chicago Press.
Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1–19.
Moxam, H., & Anderson, J. (1992a). Peer review. A view from the inside. Science and Technology Policy, 5(1), 7–15.
Moxam, H., & Anderson, J. (1992b). Peer review. A view from the inside. Science and Technology Policy, 5(1), 7–15.
Mryglod, O., Kenna, R., Holovatch, Y., & Berche, B. (2015). Predicting results of the research excellence framework using departmental h-index: revisited. Scientometrics, 104(3), 1013–1017.
Oppenheim, C. (1997). The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology. Journal of Documentation, 53(5), 477–487.
Oppenheim, C., & Norris, M. (2003). Citation counts and the research assessment exercise V: Archaeology and the 2001 RAE. Journal of Documentation, 56(6), 709–730.
Pendlebury, D. A. (2009). The use and misuse of journal metrics and other citation indicators. Scientometrics, 57(1), 1–11.
Pichappan, P., & Sarasvady, S. (2002). The other side of the coin: The intricacies of author self-citations. Scientometrics, 54(2), 285–290.
Pride, D., & Knoth, P. (2018). Peer review and citation data in predicting university rankings, a large-scale analysis. In International conference on theory and practice of digital libraries, TPDL 2018: Digital libraries for open knowledge, 195–207. https://doi.org/10.1007/978-3-030-00066-0_17. Last Accessed 12 June 2019.
Reale, E., Barbara, A., & Costantini, A. (2007). Peer review for the evaluation of academic research: Lessons from the Italian experience. Research Evaluation, 16(3), 216–228.
Reale, E., & Zinilli, A. (2017). Evaluation for the allocation of university research project funding: Can rules improve the peer review? Research Evaluation, 26(3), 190–198.
Rinia, E. J., van Leeuwen, T., van Vuren, H. G., & van Raan, A. F. J. (1998). Comparative analysis of a set of bibliometric indicators and central peer-review criteria: Evaluation of condensed matter physics in the Netherlands. Research Policy, 27(1), 95–107.
Sheskin, D. J. (2003). Handbook of parametric and nonparametric statistical procedures. London: Chapman & Hall.
Sugimoto, C. R., & Larivière, V. (2018). Measuring research. Oxford: Oxford University Press.
Taylor, J. (2011a). The assessment of research quality in UK universities: Peer review or metrics? British Journal of Management, 22(2), 202–217.
Taylor, J. (2011b). The assessment of research quality in UK universities: Peer review or metrics? British Journal of Management, 22(2), 202–217.
Thomas, P. R., & Watkins, D. S. (1998). Institutional research rankings via bibliometric analysis and direct peer-review: A comparative case study with policy implications. Scientometrics, 41(3), 335–355.
Traag, V. A., & Waltman, L. (2019). Systematic analysis of agreement between metrics and peer review in the UK REF. London: Palgrave Communications.
van Raan, A. F. J. (2004). Sleeping beauties in science. Scientometrics, 59(3), 461–466.
van Raan, A. F. J. (2006). Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics, 67(3), 491–502.
Vieira, E. S., Cabral, J. A. S., & Gomes, J. A. N. F. (2014a). Definition of a model based on bibliometric indicators for assessing applicants to academic positions. Journal of the Association for Information Science and Technology, 65(3), 560–577.
Vieira, E. S., Cabral, J. A. S., & Gomes, J. A. N. F. (2014b). How good is a model based on bibliometric indicators in predicting the final decisions made by peers? Journal of Informetrics, 8(2), 390–405.
Vieira, E. S., & Gomes, J. A. N. F. (2018). The peer-review process: The most valued dimensions according to the researcher’s scientific career. Research Evaluation, 27(3), 246–261.
Wilsdon, J., Allen, L., Belfiore, E., Campbell, P., Curry, S., Hill, S., et al. (2015). The Metric Tide: Report of the independent review of the role of metrics in research assessment and management. Bristol: HEFCE.
About this article
Cite this article
Abramo, G., D’Angelo, C.A. & Reale, E. Peer review versus bibliometrics: Which method better predicts the scholarly impact of publications?. Scientometrics 121, 537–554 (2019). https://doi.org/10.1007/s11192-019-03184-y