The ability of different peer review procedures to flag problematic publications

Horbach, S. P. J. M.; Halffman, W.

doi:10.1007/s11192-018-2969-2


Open access
Published: 29 November 2018

Volume 118, pages 339–373, (2019)

Scientometrics


Abstract

There is a mounting worry about erroneous and outright fraudulent research that gets published in the scientific literature. Although peer review’s ability to filter out such publications is contentious, several peer review innovations attempt to do just that. However, there is very little systematic evidence documenting the ability of different review procedures to flag problematic publications. In this article, we use survey data on peer review in a wide range of journals to compare the retraction rates of specific review procedures, using the Retraction Watch database. We were able to identify which peer review procedures were used since 2000 for 361 journals, publishing a total of 833,172 articles, of which 670 were retracted. After addressing the dual character of retractions, which signal both a failure to identify problems prior to publication and a willingness to correct mistakes, we empirically assess review procedures. With considerable conceptual caveats, we were able to identify peer review procedures that seem able to detect problematic research better than others. Results were verified for disciplinary differences and variation between reasons for retraction. This leads to informed recommendations for journal editors about strengths and weaknesses of specific peer review procedures, allowing them to select review procedures that address issues most relevant to their field.


Introduction

There is growing concern about erroneous or fraudulent research that gets published in the scientific literature. Mainly in the biomedical sciences, scholars have demonstrated that a large proportion of published articles contain flaws (Ioannidis 2005), are not reproducible (Open Science Collaboration 2015), or involve questionable research practices or even outright misconduct (e.g. Horbach and Halffman 2017b; Nuijten et al. 2016). The potential consequences of such problematic publications include sending research down unfruitful avenues, wasting valuable research time and funds, skewing meta-analyses or systematic reviews (Tramer et al. 1997), or building policy recommendations and medical treatments on shaky grounds. Others have expressed worries over the potential reputational damage to science (Drenth 2006).

Retractions are one of research journals’ tools to correct the scientific record or redress fraudulent publication credits. The transition to electronic publishing has made it relatively easy to retroactively flag problematic publications. With considerable hesitation over the possible reputational damage for both authors and journals, editors have nonetheless increased the use of this instrument to rectify problematic publications (Cokol et al. 2008; He 2013). As a result, the number of retractions has grown sharply over the last decades, which has led some scholars and journalists to use retractions as a window to study problematic research practices (e.g. Fanelli 2009; Fanelli et al. 2015).

Besides attempts to retroactively redress problematic publications, there have also been several calls and initiatives to try and improve journals’ peer review systems to prevent problematic publications in the first place. However, journals’ use of peer review to identify fraudulent research is highly contentious. Some argue that peer review was never intended to track fraud and cannot be expected to do so (Biagioli 2002; Smith 2006). Nevertheless, concerns about tracing data manipulation, plagiarism, sloppy statistics, inappropriate referencing, or similar improper behaviour have explicitly motivated several recent peer review innovations (e.g. Scheman and Bennett 2017; Epskamp and Nuijten 2014; Kharasch and Houle 2018; Horbach and Halffman 2018). Such initiatives include the use of various software tools, such as text similarity software or ‘plagiarism scanners’ (Zhang 2010), but also modifications in peer review procedures, such as the use of checklists or specialised statistics reviewers (Goodman 2017).

These contradictory expectations raise the question of to what extent peer review innovations are able to catch problematic research reports before publication and thereby prevent the need for retractions further down the line. In fact, while various actors have been calling for ‘evidence-based’ improvements of the peer review system, very little is known about the performance of different review models (Rennie 2016). We address this knowledge gap in this study. More specifically, we investigate whether and how different peer review procedures (e.g. blind, double blind, or ‘open’) and instruments are related to retraction rates. Using survey data on peer review procedures in a wide range of journals, we relate journal articles to the review procedure they went through. Subsequently, we analyse the relative number of retractions for each review procedure, taking a closer look at the research discipline in which the article was published and, in the case of retracted articles, the reason for retraction. Thereby we analyse the effectiveness of different peer review procedures in detecting various types of errors and questionable or fraudulent research practices. This leads to informed recommendations for journal editors about strengths and weaknesses of peer review procedures, allowing them to select review procedures that address issues relevant to their field.
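The "relative number of retractions" can be made concrete with a small calculation. The sketch below is only an illustration, not the authors' analysis code: the function name `retractions_per_10k` and the per-10,000 scaling are assumptions introduced here, and the only figures used are the overall counts reported in the abstract (670 retractions among 833,172 articles).

```python
def retractions_per_10k(retracted: int, published: int) -> float:
    """Return the number of retractions per 10,000 published articles.

    Hypothetical helper for illustration; the scaling factor is an
    assumption chosen for readability, not taken from the article.
    """
    if published <= 0:
        raise ValueError("published must be a positive count")
    return 10_000 * retracted / published


# Overall counts reported in the abstract.
overall = retractions_per_10k(retracted=670, published=833_172)
print(f"{overall:.2f} retractions per 10,000 articles")
```

Comparing review procedures would then amount to applying the same ratio to the article and retraction counts of each procedure separately, so that procedures used by journals of very different sizes remain comparable.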

In this article, we first discuss the contentious expectations for journal peer review and the motivation behind its recent innovations, resulting in a taxonomy of peer review procedures. Second, we discuss retractions and their ambivalent nature as both an indicator of problematic research and a measure against it, leading to important caveats about the interpretation of our findings. Third, we describe the methods used, with a survey among editors and the use of the Retraction Watch database. Next, we present our results per peer review procedure, along with the motivation behind each procedure and a discussion of the findings. In the final section, we provide an overview of the statistically significant relations and discuss the limitations of our findings, the consequences and recommendations for journal editors, as well as some questions for further research.

Theoretical framework

Diversity and expectations in peer review

Self-regulating mechanisms are considered an important means of ensuring the quality of the published literature (Stroebe et al. 2012; Hiney 2015). Among them, the peer review system holds a central position (Horner and Minifie 2011). Especially after WWII, peer review of publications gradually came to be seen as the best quality guarantee for the research record, spreading from the natural sciences to other disciplines (Cintas 2016; Baldwin 2017; Fyfe et al. 2017).

Even though the expectation that peer review can detect erroneous research has historically been criticised, it is currently expressed with increasing intensity (LaFollette 1992; Stroebe et al. 2012). Editors and publishers in particular have long asserted that peer review was never designed or meant to detect errors or fraud in submitted manuscripts. However, various other actors have increasingly come to expect peer review to help assure a fraud-free published literature. This trend mainly emerged as a response to high subscription costs for journals, leading users to demand better quality assurance, as well as to novel technologies and techniques that promise to help editors and journals to detect errors in research (Fyfe et al. 2017; Larivière et al. 2015).

Peer review procedures are highly diverse, with innovations appearing at an increasing pace (Horbach and Halffman 2018). Whereas the use of external reviewers did not become common practice until well after WWII (Baldwin 2015, 2017), subsequent innovations in review procedures have emerged quickly. These include changes in the relative timing of review in the publication process (Chambers 2013; Nosek and Lakens 2014; Knoepfler 2015), the range and anonymity of actors involved in the review process and the interaction between them (Pontille and Torny 2014; Okike et al. 2016; Godlee 2002), the level of cooperation and specialisation in review (Barroga 2013; Goodman 2017), and the use of digital tools to assist review.

However, very little is known about the effectiveness of various peer review procedures in detecting erroneous or fraudulent research. Several studies suggest that peer review is currently under severe threat and falling below standards. Faulty and even fraudulent research slips through peer review at alarming rates (Stroebe et al. 2012; Bohannon 2013; Lee et al. 2013; van der Heyden et al. 2009; Claxton 2005). The fact that only very few of the widely reported misconduct cases were detected through peer review (Stroebe et al. 2012) also raises questions about its fraud detection potential. Yet even though peer review in general seems to fail to detect problematic research, little is known about the relative effectiveness of its different procedures.

To assess the effectiveness of various review procedures, we use the taxonomy presented in Table 1, based on the peer review inventory in Horbach and Halffman (2018). The peer review procedures are characterised by twelve key attributes, grouped in four dimensions.

Table 1 Procedures of peer review categorized by dimension and attribute
