Scientometrics, Volume 108, Issue 3, pp 1651–1671

Do they agree? Bibliometric evaluation versus informed peer review in the Italian research assessment exercise

Abstract

During the Italian research assessment exercise, the national agency ANVUR ran an experiment to assess the agreement between grades attributed to journal articles by informed peer review (IR) and by bibliometrics. A sample of articles was evaluated with both methods, and agreement was measured by weighted Cohen’s kappa. ANVUR presented the results as indicating an overall “good” or “more than adequate” agreement. This paper re-examines the results of the experiment according to the available statistical guidelines for interpreting kappa values. It shows that the degree of agreement (always in the range 0.09–0.42) has to be interpreted, for all research fields, as unacceptable, poor or, in a few cases, at most fair. The only notable exception, also confirmed by a statistical meta-analysis, was a moderate agreement for economics and statistics (Area 13) and its sub-fields. We show that the experiment protocol adopted in Area 13 was substantially modified with respect to all the other research fields, to the point that the results for economics and statistics have to be considered fatally flawed. The evidence of poor agreement supports the conclusion that IR and bibliometrics do not produce similar results, and that the adoption of both methods in the Italian research assessment possibly introduced systematic and unknown biases into its final results. The conclusion reached by ANVUR must be reversed: the available evidence does not justify the joint use of IR and bibliometrics within the same research assessment exercise.
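As an illustration of the statistic discussed in the abstract, the following is a minimal sketch of how a weighted Cohen's kappa can be computed from a cross-table of grades and then read against one published guideline. Several points are assumptions not taken from the paper: the function names, the choice of linear or quadratic disagreement weights (the weights ANVUR actually used are not stated here), the made-up 4×4 table, and the use of Altman's (1991) bands as the labelling scale, which is only one of the guidelines the paper refers to.

```python
def weighted_kappa(table, weights="linear"):
    """Weighted Cohen's kappa for a square table of counts.

    table[i][j] = number of articles graded i by one method and j by the
    other. Disagreement weights are |i-j|/(k-1) (linear) or their square
    (quadratic); both are illustrative assumptions.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_share = [sum(row) / n for row in table]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if weights == "linear" else d * d
            observed += w * table[i][j] / n
            expected += w * row_share[i] * col_share[j]
    return 1.0 - observed / expected


def altman_label(kappa):
    """Map a kappa value to Altman's (1991) verbal bands."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical counts: rows = peer review grade, columns = bibliometric grade.
example = [
    [20, 10, 5, 5],
    [10, 15, 10, 5],
    [5, 10, 15, 10],
    [5, 5, 10, 20],
]
kappa = weighted_kappa(example)
print(round(kappa, 3), altman_label(kappa))
```

The sketch works on disagreement weights, so identical grades contribute nothing and grades further apart contribute more; other guidelines use different cut-offs and labels, so the verbal reading of a given kappa depends on the scale chosen.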

Keywords

Informed peer review; Research assessment; Meta-analysis; Bibliometric evaluation; Italian VQR; Peer review; Cohen’s kappa

Supplementary material

Supplementary material 1 (PDF 587 kb)

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2016

Authors and Affiliations

  1. Department of Economics and Statistics, University of Siena, Siena, Italy
  2. Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
