
Reply to the comment of Bertocchi et al.


Abstract

The aim of this note is to reply to Bertocchi et al.’s comment on our paper “Do they agree? Bibliometric evaluation versus informed peer review in the Italian research assessment exercise”. Our paper analyzed the results of the experiment conducted by the Italian governmental agency ANVUR during the research assessment exercise to gauge the agreement between informed peer review (IR) and bibliometrics. We argued that, according to available statistical guidelines, the results of the experiment indicate poor agreement in all research fields, with the sole exception of the results reached in the so-called Area 13 (economics and statistics). We argued that this difference was due to the changes introduced in Area 13 with respect to the protocol adopted in all the other areas. Bertocchi et al.’s comment dismisses our explanation and suggests that the difference was due to “differences in the evaluation processes between Area 13 and other areas”. In addition, they state that all our five claims about the Area 13 experiment protocol “are either incorrect or not based on any evidence”. Based on textual evidence drawn from ANVUR official reports, we show that: (1) none of the four differences listed by Bertocchi et al. is peculiar to Area 13; (2) their five arguments contesting our claims about the experiment protocol are all contradicted by official records of the experiment itself.

Notes

  1. We agree with Bertocchi et al. that final results of the assessment cannot be compared, as we clearly stated on p. 6 of our paper: “there was lack of comparability not only between Areas but also between research fields inside the same research Area”. It is therefore a bit disappointing to read on p. 2 of Bertocchi et al.’s comment that “A crucial point of the VQR is that evaluations cannot be compared directly across research areas (which differ in terms of publication standards, publication types, refereeing style, citations, etc.). The entire [Baccini and De Nicolao's] paper is instead based on such comparison.” The absence in our paper of any comparison of results of the research assessment is something that can be verified very easily, given that we did not present any result of the research assessment at all.

  2. "Citation may improve the merit class than that of the journal in which the article is published, or at least lead to a more detailed examination of the article in case of significant difference between the merit class of the journal and the indicator given by the number of citations”. Faq n. 13 in http://www.anvur.org/attachments/article/77/gev01_faq.pdf.

  3. http://www.anvur.org/rapporto/files/Area01/VQR2004-2010_Area01_RapportoFinale.pdf; http://www.anvur.org/rapporto/files/Area08/VQR2004-2010_Area08_RapportoFinale.pdf.

  4. http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_Appendici.pdf.

  5. “The [panel] will evaluate all the journal articles by bibliometrics, and at least 10 % of the same articles also by peer review” [translation by the authors]; http://www.anvur.org/attachments/article/92/gev13_criteri.pdf, p. 3.

  6. “Starting from that date [4th September 2012, the date of publication of the final journal ranking], preceding the beginning of the peer review evaluation, the list was not integrated or corrected” (p. 5 of http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_RapportoFinale.pdf) and again: “Peer review evaluation took place from the end of September 2012 to February 2013” (p. 116 of http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_Appendici.pdf) [Translation by the authors].

  7. This is stated in the panel’s official document dated 2nd April 2012, available here: http://www.anvur.org/attachments/article/92/gev13_allegati.zip.

  8. Data provided in their comment are at odds with data published in ANVUR final reports. They wrote: “The panel received 6816 journal articles for evaluation”. The final report of Area 13 instead records 7457 journal articles submitted (Tables 1.7 and 2.5) and 7442 evaluated by bibliometrics or peer review (Table 5.5): http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_RapportoFinale.pdf.

  9. Ibidem, p. 60.

  10. Ibidem, p. 64. Translation by the authors.

  11. To highlight the procedural differences between Area 13 and all the other Areas, it is useful to note that a couple of Area reports disclosed the total number of consensus groups activated. According to the Area 7 report, “938 Consensus groups were activated out of 9878 evaluated products”, http://www.anvur.org/rapporto/files/Area07/VQR2004-2010_Area07_RapportoFinale.pdf, p. 31; analogously, in Area 9, 1610 consensus groups were activated out of 7500 research products evaluated, http://www.anvur.org/rapporto/files/Area09/VQR2004-2010_Area09_RapportoFinale.pdf, p. 20.

  12. It is not relevant that the choices of the consensus groups were partially constrained. The existence of a constrained procedure and of a web platform was, to the best of our knowledge, disclosed for the first time in Bertocchi et al.’s comment.

  13. The number 15 refers to the number of articles of the experiment for which a three-class disagreement between reviewers was registered. Bertocchi et al. indicate a wrong table (Table 11) as the source of this datum; it can actually be found in Table 12 of Bertocchi et al. (2015).

  14. Analogously, they wrote that we made “the allegation that [Area 13] panel has manipulated the data”. This allegation is also false. The wording “manipulated experiment” is a technical expression indicating that the conditions of an experiment were deliberately changed by the experimenters. In this case the expression simply means that the protocol of the Area 13 experiment was deliberately modified by the panel.

  15. Bertocchi et al. (2015) were aware of at least one of the problems of the experiment: “the influence exerted on the reviewers by the information on the publication outlet implies that, in our study, assessment by bibliometric analysis and peer review are not independent”. If this is true, it is really difficult to understand how they could draw the following policy conclusion: “the agencies that run these evaluations could feel confident about using bibliometric evaluations and interpret the results as highly correlated with what they would obtain if they performed informed peer review”.

  16. These data are drawn from pp. 4–5 of http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_RapportoFinale.pdf.

  17. Valutazione della Qualità della Ricerca 2004‐2010 (VQR 2004‐2010) Comunicato del GEV13 del 2 aprile 2012. http://www.anvur.org/attachments/article/92/gev13_allegati.zip [English translation by the authors].

  18. “Please note that these classifications should not in any way be considered exhaustive of the disciplines of interest of Area 9, but are simply meant to provide an aid to the authors, reducing the effort during product selection. It is possible to submit for evaluation … articles published in journals classified in WoS and not included in the published lists. [In these cases] The classification of the journal will be carried out ex post”. http://www.anvur.org/attachments/article/87/gev09_criteri.pdf.

  19. Because, according to the panel, these journals published only reviews. As a consequence, the distribution of journals according to the IF changed with respect to the simple distribution obtainable from the JCR.

  20. This is the description of the use of the citation thresholds in Area 1: “One way to get a rough idea of the thresholds is to go to the ISI Web of Science … site and do an advanced search by specifying year of publication and Subject Area (for example, “Mathematics”). Then refine your search by selecting a Subject Category (for example, still “Mathematics”), sort the list of results based on the number of citations, and you will find the number of citations that allows an article to be, respectively, in the first 20 %, in the second 20 %, and in the upper half of this ordered list. An example of the threshold values calculated in this way for the year 2009 and the Subject Category “Mathematics applied” is in the ANVUR document http://www.anvur.org/sites/anvur-miur/files/la_bibliometria_della_vqr.pdf. Remember, however, that the threshold values that will be used in the VQR may be different from those so determined, because they will be calculated on the basis of citations received by 31 December 2011”. FAQ n. 12 in http://www.anvur.org/attachments/article/77/gev01_faq.pdf [Translation by the authors].

  21. http://www.anvur.org/rapporto/files/Appendici/VQR2004-2010_AppendiceB.pdf, p. 5 [Translation by the authors].

  22. In Italian: “Valutazione di sintesi dei giudizi del primo e secondo revisore”.

  23. The appendices to the final reports of Areas 1–9 are available here: http://www.anvur.org/rapporto/ under the heading “Rapporti di Area”.

  24. In their comment Bertocchi et al. wrote that Consensus groups might have “effectively graded” the papers when “the Consensus group disagreed on the arithmetic average of the score”, without specifying whether this disagreement might also occur when two reviewers agreed on the final merit class of an article but gave different scores.

  25. http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_Appendici.pdf, p. 52.

  26. http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_Appendici.pdf, p. 52. See also p. 459 of Bertocchi et al. (2015) where P is called “final evaluation of the Consensus group”.

  27. http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_Appendici.pdf, p. 65 [Translation by the authors].

  28. http://www.anvur.org/rapporto/files/Area13/VQR2004-2010_Area13_RapportoFinale.pdf, p. 15. This description of the procedure was already reported in footnote 26 of Baccini and De Nicolao (2016).

References

  • Baccini, A. (2016). Napoléon et l’évaluation bibliométrique de la recherche. Considérations sur la réforme de l’université et sur l’action de l’agence national d’évaluation en Italie. Canadian Journal of Information and Library Science-Revue Canadienne des Sciences de l’Information et de Bibliotheconomie. doi:10.1353/ils.2016.0003.

  • Baccini, A., & De Nicolao, G. (2016). Do they agree? Bibliometric evaluation versus informed peer review in the Italian research assessment exercise. Scientometrics. doi:10.1007/s11192-016-1929-y.

  • Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2015). Bibliometric evaluation vs. informed peer review: Evidence from Italy. Research Policy. doi:10.1016/j.respol.2014.08.004.

  • Bertocchi, G., Gambardella, A., Jappelli, T., Nappi, C. A., & Peracchi, F. (2016). Comment to: Do they agree? Bibliometric evaluation versus informed peer review in the Italian research assessment exercise. Scientometrics. doi:10.1007/s11192-016-1965-7.


Author information

Corresponding author

Correspondence to Alberto Baccini.

Appendices

Appendix 1: availability of journal lists and the citation classification system

The deadline for submission was 15th June 2012. For Area 13, a first provisional list of journals was published on 30th April 2012, containing data on the 2-year impact factor, the 5-year impact factor, and the h-index. Updates to the list were published on 10th May and 12th June 2012. The final ranking was published on 4th September 2012.Footnote 16 Moreover, on 2nd April 2012 the GEV published, “for better steering professors’ choices”, the citation classification system, in this case a simple threshold of five citations per year: if a journal article had received five or more citations per year, it would be automatically placed in the merit class immediately above that of the journal in which it was published.Footnote 17
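
As a purely illustrative aid, the upgrade rule just described can be sketched in a few lines of Python. The merit-class labels and their ordering are our own assumptions; the sources cited here do not document how the GEV13 actually implemented the rule.

```python
# Minimal sketch of the five-citations-per-year rule described above (our reconstruction).
# Merit classes are ordered from lowest to highest; the labels are assumed for illustration.
MERIT_CLASSES = ["Limited", "Acceptable", "Good", "Excellent"]

def classify_article(journal_class: str, citations_per_year: float) -> str:
    """Start from the merit class of the journal; an article with five or more
    citations per year is moved to the class immediately above it (if any)."""
    idx = MERIT_CLASSES.index(journal_class)
    if citations_per_year >= 5 and idx < len(MERIT_CLASSES) - 1:
        idx += 1
    return MERIT_CLASSES[idx]

# Hypothetical example: an article in a "Good" journal with six citations per year.
print(classify_article("Good", 6.0))  # -> Excellent
```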

The situation of Area 13 was not so different from that of all the other Areas. Indeed, Areas 1 and 9 published provisional lists of journals, to be integrated after the completion of the submission process.Footnote 18 Areas 2, 3, 4 and 7 did not publish lists of journals, limiting themselves to referring to the Journal Citation Reports (JCR), where professors had to consider the proper distribution of journals in the relevant subject categories and years. Area 7 also published a list of journals included in the JCR for which the bibliometric classification did not apply.Footnote 19 Areas 5 and 6 did not publish a list of journals either, again referring to the JCR, but in these Areas the subject categories were defined by joining up groups of Web of Science subject categories. No Area published in advance the citation classification system that was going to be used for article evaluation.Footnote 20
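
To make the calculation quoted in footnote 20 concrete, the sketch below computes, from the citation counts of all articles in a given Subject Category and year, the citation values needed to fall in the first 20 %, the first 40 % (first two quintiles), and the upper half of the ordered list. The function name and the example data are ours; the actual thresholds were calculated by ANVUR on citations received by 31 December 2011.

```python
# Sketch of the threshold calculation described in footnote 20 (our reconstruction).
def citation_thresholds(citation_counts: list[int]) -> dict[str, int]:
    """Given the citation counts of all articles in a Subject Category and year,
    return the minimum number of citations needed to be in the first 20 %,
    the first 40 %, and the upper half of the list sorted by citations."""
    ranked = sorted(citation_counts, reverse=True)
    n = len(ranked)
    return {
        "first_20_percent": ranked[max(int(n * 0.20) - 1, 0)],
        "first_40_percent": ranked[max(int(n * 0.40) - 1, 0)],
        "upper_half": ranked[max(int(n * 0.50) - 1, 0)],
    }

# Hypothetical example with ten articles:
print(citation_thresholds([0, 1, 1, 2, 3, 5, 8, 12, 20, 35]))
# -> {'first_20_percent': 20, 'first_40_percent': 8, 'upper_half': 5}
```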

Appendix 2: protocol used for synthesizing reviewers’ reports

ANVUR final reports summarized this part of the procedure as follows:

The reviewers’ evaluations were then synthesized into a final evaluation on the basis of algorithms specifically defined by each Area panel and described in detail in the Area reports.Footnote 21

These details are contained in an Appendix of each Area report. These appendices for Areas 1 to 9 were written by inserting specific results in a pre-defined common framework. The procedure for synthesizing the two reviewers’ reports was therefore described in a nearly identical form. P1 and P2 were identified, respectively, as the numerical scores assigned to an article by the first and the second reviewer; P indicated the “Synthetic evaluation of the scores of the first and second reviewers”.Footnote 22 In each Area report, the procedure for arriving at the synthetic evaluation P was described as follows:

The reviewers’ scores [P1 and P2] were then synthesized on the basis of a specific algorithm for the Area [number] panel, according to which Excellent products have a score of [numerical score interval]; Good products have a score of [numerical score interval]; Acceptable products have a score of [numerical score interval]; Limited products have a score of [numerical score interval].Footnote 23
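
Purely to illustrate the kind of deterministic rule quoted above, the following sketch combines the two reviewers’ scores arithmetically and maps the result onto merit classes through fixed intervals. The score range and the intervals are hypothetical placeholders (each Area report filled in its own numbers); only the overall structure reflects the template.

```python
# Sketch of the Areas 1-9 template (our reconstruction): the final evaluation P
# is a fixed arithmetic combination of P1 and P2, mapped onto merit classes
# through predefined intervals. All numbers below are hypothetical placeholders.
def synthesize(p1: int, p2: int) -> str:
    p = p1 + p2  # assumed combination of the two reviewers' scores
    if p >= 16:
        return "Excellent"
    if p >= 12:
        return "Good"
    if p >= 8:
        return "Acceptable"
    return "Limited"

# Hypothetical example: two reviewers scoring 9 and 8 yield an "Excellent" product.
print(synthesize(9, 8))
```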

For Area 13 the procedure was completely changed. If the opinions of the two referees coincided, the final evaluation was probablyFootnote 24 defined automatically. If the opinions of the two referees diverged, a complex process started:

“The opinion [sic] of the external referees [P1 and P2] was then summarized by the internal Consensus Group: in case of disagreement between P1 and P2, the P index is not simply the average of P1 and P2, but also reflects the opinion of two (and occasionally three) members of the GEV13 (as described in detail in the documents devoted to the peer review process)”.Footnote 25

In the Area 13 report P was significantly renamed as the “evaluation of the Consensus Group”.Footnote 26 A Consensus group was formed by the two members of the panel in charge of choosing the reviewers. The work of the Consensus groups in arriving at P, that is, the “evaluation of the Consensus group”, was described as follows:

The Consensus Groups will give an overall evaluation of the research product by using the informed peer review method, by considering the evaluation of the two external referees, the available indicators for quality and relevance of the research product, and the Consensus Group competences.Footnote 27 (ANVUR 2013).

The Consensus groups in some cases also evaluated the competences of the two referees, and gave “more importance to the most expert referee in the research”.Footnote 28
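
By contrast, the Area 13 procedure described above cannot be reduced to a fixed formula: when the referees disagreed, the final evaluation depended on the judgment of the Consensus group. The schematic sketch below, entirely our reconstruction with assumed names, only makes this structural difference explicit.

```python
# Schematic contrast with the Areas 1-9 template (our reconstruction, assumed names):
# in Area 13 the final class is not a fixed function of the referee scores alone.
from typing import Callable

def area13_final_class(class1: str, class2: str,
                       consensus_group_judgment: Callable[[], str]) -> str:
    """If the two referees assign the same merit class, keep it; otherwise the
    Consensus group decides, weighing the referee reports, the bibliometric
    indicators, and its own competences (not expressible as a formula in P1, P2)."""
    if class1 == class2:
        return class1
    return consensus_group_judgment()

# Hypothetical usage: the Consensus group resolves an Excellent/Good disagreement.
print(area13_final_class("Excellent", "Good", lambda: "Excellent"))
```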


Cite this article

Baccini, A., & De Nicolao, G. Reply to the comment of Bertocchi et al. Scientometrics 108, 1675–1684 (2016). https://doi.org/10.1007/s11192-016-2055-6
