Overview
We used a previously developed ABM that was calibrated with empirical data and adopts a unified approach to scientific publication and peer review (Kovanis et al. 2016a). This ABM was structured as a set of independently parameterized sub-models pertaining to the submission and peer-review process. Structural changes to some of these sub-models allowed us to model the alternative peer-review systems.
We compared five alternative systems of peer review discussed in the literature and to some extent already implemented by some journals and publishers: re-review opt-out, cascade peer review, portable peer review, crowdsourcing peer review, and immediate publication (Fig. 1). Their main characteristics and parameters are summarized in Table 1.
Table 1 Summary of the characteristics and parameters of the alternative peer-review systems
Model for the conventional publication and peer-review system
Here we provide a brief description of our ABM of the conventional scientific publication and peer-review system. For a more detailed description, see Kovanis et al. (2016a).
We characterized N researchers by resources R(t) and scientific level S(t). The scientific level was defined as \( S(t) = R(t) + S_{b}(t) \), where t is the time step and \( S_{b}(t) \) is the sum of all the rewards that a researcher can receive to determine scientific level. The resources represent all the means that researchers have at their disposal for conducting research. The scientific level expresses a researcher’s experience and capacity to conduct better research.
Manuscripts were characterized by an intrinsic quality score (Q score), which serves as a proxy for their intrinsic scientific value but also their disruptive, innovative, or controversial nature as well as quality of reporting. At each time step, \( N_{s} \) randomly selected researchers submitted their paper. At the time of submission (\( t_{s} \)) of their paper, authors would lose an amount of resources \( R_{\text{inv}} \) associated with the conduct of the research reported in that paper, with \( 0.2R\left( {t_{s} } \right) \le R_{\text{inv}} \le 0.7R \left( {t_{s} } \right) \). Each paper had an initial expected quality \( E_{Q} \) defined as:
$$ E_{Q} = 0.8\frac{{0.1R_{\text{inv}} }}{{0.1R_{\text{inv}} + 1}} + 0.2\frac{{0.01S\left( {t_{s} } \right)}}{{0.01S\left( {t_{s} } \right) + 1}} $$
The weights were chosen to represent the greater contribution of invested resources, relative to that of the scientific level, and to not allow the magnitude of \( S\left( {t_{s} } \right) \) to surpass the final \( E_{Q} \) value. The Q score was drawn from a normal distribution \( Q\,\sim\,N \left( {E_{Q} ,\; 0.1\;E_{Q} } \right) \). This score determines how a researcher chooses a target journal and drives in-house and external peer-review assessments.
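As an illustration only, the expected quality and the Q score draw can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the second parameter of \( N\left( E_{Q}, 0.1E_{Q} \right) \) is a standard deviation, which the text does not state.

```python
import random


def expected_quality(r_inv: float, s_level: float) -> float:
    """E_Q = 0.8 * 0.1R/(0.1R + 1) + 0.2 * 0.01S/(0.01S + 1)."""
    resource_term = 0.1 * r_inv / (0.1 * r_inv + 1.0)
    level_term = 0.01 * s_level / (0.01 * s_level + 1.0)
    return 0.8 * resource_term + 0.2 * level_term


def draw_q_score(r_inv: float, s_level: float, rng: random.Random) -> float:
    """Draw Q around E_Q; 0.1 * E_Q is ASSUMED to be a standard deviation."""
    e_q = expected_quality(r_inv, s_level)
    return rng.gauss(e_q, 0.1 * e_q)


rng = random.Random(0)
example_e_q = expected_quality(10.0, 100.0)   # both saturating terms equal 0.5
example_q = draw_q_score(10.0, 100.0, rng)
```

Both terms saturate below 1, so \( E_{Q} \) stays below 1 however large the invested resources or the scientific level become.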
We characterized J journals by 3 state variables: a reputation value (we used rescaled impact factors) and the related rejection and acceptance thresholds, \( T_{\hbox{min} }^{j} < T_{\hbox{max} }^{j} , j = 1, \ldots , J \). We assumed that authors had a general knowledge of journal standards and, given the Q score, would try to obtain the most recognition from their work. Hence, the journal for the first submission was chosen at random among those with \( T_{\hbox{min} }^{j} \) within the asymmetrical range \( Q - 0.45 \varepsilon \le T_{\hbox{min} }^{j} \le Q + 0.55\varepsilon \), where \( \varepsilon \sim2 \times N \left( {\frac{Q}{5},\frac{Q}{20}} \right) \). This process resulted in a slight trend of high targeting in every first submission.
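The targeting rule for a first submission can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that \( Q/20 \) is a standard deviation and that no journal is selected when the window contains none, a case the text does not describe.

```python
import random
from typing import List, Optional, Tuple


def first_submission_window(q: float, rng: random.Random) -> Tuple[float, float]:
    """Return (low, high) bounds on T_min for the first submission."""
    eps = 2.0 * rng.gauss(q / 5.0, q / 20.0)  # ASSUMPTION: Q/20 is an SD
    return q - 0.45 * eps, q + 0.55 * eps     # asymmetrical: wider above Q


def choose_first_journal(q: float, t_min: List[float],
                         rng: random.Random) -> Optional[int]:
    """Pick at random among journals whose T_min lies inside the window."""
    low, high = first_submission_window(q, rng)
    candidates = [j for j, t in enumerate(t_min) if low <= t <= high]
    # ASSUMPTION: return None when no journal qualifies.
    return rng.choice(candidates) if candidates else None
```

Because the window extends 0.55ε above Q but only 0.45ε below it, journals with thresholds above the paper's score are slightly over-represented among the candidates.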
We drew the editor’s assessment of the manuscript \( Q_{e} \) from a uniform distribution over \( \left[ {0.9Q; \;1.1Q} \right] \). If \( Q_{e} < T_{\hbox{min} }^{j} \), the manuscript could be rejected without external peer review. If \( Q_{e} \ge T_{\hbox{min} }^{j} \), the manuscript was sent for external peer review to 2 or 3 reviewers. The reviewers’ assessments were defined as \( Q_{r} \,\sim\,N \left( {Q - c,\;r \times Q} \right) \), where r was a random error and c measured the competitiveness of the reviewer. We defined \( r = r_{r} + r_{j} - r_{Q} \), where \( r_{r} \) is the reviewing error, \( r_{j} \) the journal error and \( r_{Q} \) the score error. With 65% probability, we set \( r_{r} = 0.1 \); with 12%, \( r_{r} = 0.05 \); and with 13%, \( r_{r} = 0.01 \). We drew \( r_{j} \) randomly from a uniform distribution over [0; 0.15], where \( r_{j} = 0 \) corresponded to the highest reputation journal and \( r_{j} = 0.15 \) to the lowest. Finally, \( r_{Q} = 0.05 \times Q \). We assumed that a competitive behavior would occur more often for journals with higher reputation. The probability that such behavior appeared ranged uniformly from 10 to 66%, and c was drawn randomly from a uniform distribution over [0.01; 0.05].
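The editorial and reviewer assessments can be sketched as below. The function names are hypothetical, and the sketch takes the reviewing error \( r_{r} \), the journal error \( r_{j} \) and the competitiveness c as inputs rather than sampling them. It assumes that \( r \times Q \) is a standard deviation and clamps it at zero when \( r_{Q} \) exceeds \( r_{r} + r_{j} \); neither choice is stated in the text.

```python
import random


def editor_assessment(q: float, rng: random.Random) -> float:
    """Q_e drawn uniformly over [0.9Q, 1.1Q]."""
    return rng.uniform(0.9 * q, 1.1 * q)


def reviewer_error(r_r: float, r_j: float, q: float) -> float:
    """r = r_r + r_j - r_Q, with score error r_Q = 0.05 * Q."""
    return r_r + r_j - 0.05 * q


def reviewer_assessment(q: float, r_r: float, r_j: float, c: float,
                        rng: random.Random) -> float:
    """Q_r ~ N(Q - c, r * Q); c > 0 lowers the score (competitive reviewer)."""
    r = reviewer_error(r_r, r_j, q)
    sd = max(r * q, 0.0)  # ASSUMPTION: r * Q is an SD, clamped at zero
    return rng.gauss(q - c, sd)
```

A non-competitive reviewer corresponds to c = 0, in which case the assessment is centred on the true Q score.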
We randomly selected one of the reviewers’ evaluations as a proxy of the editor’s opinion. If \( Q_{r} \ge T_{\hbox{max} } \), the manuscript was accepted and if \( Q_{r} \le T_{\hbox{min} } \), it was rejected. When \( T_{\hbox{min} } \le Q_{r} < T_{\hbox{max} } \), the author was asked to revise the manuscript before a second round of peer review. In the latter case, the author invested an extra amount of resources \( R_{\text{imp}} \,\sim\,N\left( {\frac{8}{60},\frac{1}{60}} \right) \times \left( {R - R_{\text{inv}} } \right) \). The cumulative amount of invested resources was used to derive a new Q score as before. The manuscript was re-evaluated and accepted only if \( Q_{r} \ge T_{\hbox{max} } \). The probability of resubmission \( P_{\text{res}} \) after a rejection decreased with increasing number of resubmissions r, \( P_{\text{res}} = 0.88^{r - 1} \). After the first rejection, the authors would target journals of lower reputation. Thus, they randomly selected journals in the (symmetrical this time) range \( 0.22Q - 0.5\varepsilon \le T_{\hbox{min} }^{j} \le 0.22Q + 0.5\varepsilon \), where Q is the initial score of the manuscript.
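The decision thresholds and the resubmission probability described above translate into a short sketch. The function names and the string labels are hypothetical and serve only to illustrate the rule.

```python
def first_round_decision(q_r: float, t_min: float, t_max: float) -> str:
    """Accept at or above T_max, reject at or below T_min, else revise."""
    if q_r >= t_max:
        return "accept"
    if q_r <= t_min:
        return "reject"
    return "revise"


def second_round_decision(q_r_revised: float, t_max: float) -> str:
    """After revision the paper is accepted only if Q_r >= T_max."""
    return "accept" if q_r_revised >= t_max else "reject"


def resubmission_probability(r: int) -> float:
    """P_res = 0.88 ** (r - 1), decreasing with the resubmission count r."""
    return 0.88 ** (r - 1)
```

For example, `resubmission_probability(3)` evaluates to 0.88² ≈ 0.774, so roughly three in four authors would still resubmit at that stage.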
Resources and scientific level were updated at each time step. If an article was published, the author received a random reward \( p \times \left( {R_{\text{inv}} + \mathop \sum \limits_{i} R_{\text{imp}}^{i} } \right),\quad 0\,\le\,p\,\le 0.5 \); otherwise, the author permanently lost all the resources invested. In case of publication, the author also received a reward for resources in scientific level. The scientific level of a reviewer was credited with a random reward between 0 and 0.001 every time the reviewer completed a review, because of knowledge acquired from the paper. Moreover, at the end of each week, the researchers received an update to their resources and scientific levels randomly drawn between 0.1 and 1, which reflected an increase of the means to conduct research with time.
We assumed that when a paper was published, it released scientific information to the community \( {\text{SI}} = {\text{IF}}_{j} \times Q_{F} \), where \( {\text{IF}}_{j} \) is the reputation value of the journal (j) that published it and \( Q_{F} \) is the Q score of the paper, after all revisions. Scientific information is a comparative variable and its purpose is to assess the effectiveness of a system in producing more papers of higher Q score and in disseminating them to the rest of the scientific community. The reputation value (IF) of a scientific journal is a proxy of the size of the community that will read the paper and the Q score a proxy of how much people who read the paper will benefit from it.
Re-review opt-out
The intent of this system, currently implemented by BMC Biology, is to shorten the time of peer review by allowing authors to opt out from a second round of reviews. Thus, authors with a paper judged publishable with major revisions by the reviewers can choose whether they want their manuscript to be evaluated by the editor only or again by the reviewers after revising it (Robertson 2013).
We chose to model a maximum implementation of this intervention so that authors would always choose to opt out from a second round and therefore all decisions for every submission would be made after at most one peer-review round. For papers undergoing peer review, the authors always revised, and then the editor made an assessment (\( Q_{e} \)) of the revised manuscript, drawn from a uniform distribution between \( 0.9\;Q_{\text{revised}} \) and \( 1.1\;Q_{\text{revised}} \). With \( Q_{e} \ge T_{\hbox{max} }^{j} \), a paper was accepted; otherwise it was rejected. All other processes were handled as in the conventional system.
Cascade peer review
When papers are rejected, their authors usually revise them and resubmit to other journals for publishing. In the conventional peer-review system, this implies that the same manuscripts will be reviewed multiple times and their publication can be seriously delayed. To avoid this situation, some publishers have decided to share reviews for rejected manuscripts among the journals they manage, thus avoiding redundant reviews and shortening the evaluation time. Such publishers include Nature Publishing Group, JAMA, BioMed Central and British Medical Journal (Walker and Rocha da Silva 2015a; Cals et al. 2013; Van Noorden 2013).
We randomly allocated 105 journals of various reputation values to one of four arbitrary publisher groups. We assumed that every journal belonged to one of these groups. Each journal was allocated to one of the publisher groups by using a categorical distribution with parameters (probability of belonging to each group) drawn from a normal distribution \( \sim N\left( {0.25,0.025} \right) \) for the first three groups, with the remaining probability assigned to the fourth.
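One way to read this allocation is sketched below. The function names are hypothetical, and the sketch assumes that 0.025 is a standard deviation and that the fourth group receives whatever probability is left after the first three draws.

```python
import random
from typing import List


def draw_group_probabilities(rng: random.Random) -> List[float]:
    """Three probabilities ~ N(0.25, 0.025); the fourth is the remainder."""
    first_three = [rng.gauss(0.25, 0.025) for _ in range(3)]  # ASSUMED SD
    return first_three + [1.0 - sum(first_three)]


def allocate_journals(n_journals: int, rng: random.Random) -> List[int]:
    """Assign each journal to a group index 0..3 by a categorical draw."""
    probs = draw_group_probabilities(rng)
    return rng.choices(range(4), weights=probs, k=n_journals)


groups = allocate_journals(105, random.Random(42))
```

With these parameters each group holds about a quarter of the journals on average, while the exact sizes vary from run to run.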
When a paper was rejected, the editor proposed that the author send it to journals of the same network but of lower reputation. We assumed that if authors decided to resubmit, then they never rejected this proposal. Then, one of the next five journals of lower reputation value (of the same network) was randomly selected and the manuscript was resubmitted to it, along with the last evaluation value (\( Q_{r} \)).
The new editor immediately accepted the resubmitted paper without asking for further reviews if \( Q_{r} \ge T_{\hbox{max} } \). Otherwise, the editor compared \( Q_{r} \) with the editor’s own assessment of the manuscript, \( Q_{e} \) (drawn uniformly from between 0.9Q and 1.1Q). With \( \frac{{\left| {Q_{e} - Q_{r} } \right|}}{{Q_{r} }} \le 0.1 \), the editor asked for revisions, then re-assessed the paper and decided whether to accept or reject it. With \( \frac{{\left| {Q_{e} - Q_{r} } \right|}}{{Q_{r} }} > 0.1 \), the editor asked for new reviews and the submission was handled as in the base model. Authors always cascaded their submissions using the last reviews they obtained. Rejected papers were more likely to be resubmitted in this system than in the conventional system; thus the probability of resubmission was modified to \( P_{\text{res}} = 0.88^{{\left( {N_{\text{sub}} - 1} \right)/2}} \).
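The receiving editor's rule in the cascade system can be summarized in a small sketch. The function names and string labels are hypothetical; `alpha` stands for the 0.1 tolerance in the text, and \( Q_{r} \) is assumed to be positive.

```python
def cascade_editor_action(q_r: float, q_e: float, t_max: float,
                          alpha: float = 0.1) -> str:
    """Decide how a cascaded submission with transferred score Q_r is handled."""
    if q_r >= t_max:
        return "accept"                      # no further review needed
    if abs(q_e - q_r) / q_r <= alpha:        # editor agrees with old reviews
        return "revise"
    return "new_reviews"                     # handled as in the base model


def cascade_resubmission_probability(n_sub: int) -> float:
    """P_res = 0.88 ** ((N_sub - 1) / 2), a slower decay than 0.88 ** (N_sub - 1)."""
    return 0.88 ** ((n_sub - 1) / 2)
```

Halving the exponent means that after three submissions the resubmission probability is 0.88 rather than 0.88² ≈ 0.774.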
Portable peer review
In this system, the authors resubmit their rejected manuscripts along with the reviews they received from their last peer-reviewed submission (if any). In contrast to the cascade system, the journals were not organized in groups and thus the authors sent their previous reviews to any of the journals they would be resubmitting to as in the conventional system. Based on the same rule as in the cascade system, editors could choose to ask for new reviews or revisions before deciding on acceptance or rejection.
Crowdsourcing peer review (Immediate publication with online and invited reviews)
Crowdsourcing online reviews is implemented in part by various journals such as F1000Research, Philica and the Semantic Web Journal. The purpose of this system is the immediate release of scientific information and the more accurate evaluation of papers because of any additional online comments or reviews. The journal Atmospheric Chemistry and Physics (ACP) is also a well-known example of the use of such a system. Papers submitted to ACP pass a quick editorial pre-screening and are published almost immediately after submission on the journal’s website as “discussion papers”. A published paper is then assigned external peer reviewers. The peer reviewers start an online discussion with the authors and other interested members of the scientific community. After a fixed number of weeks, the discussion stops and the authors revise the paper and resubmit it for publication (Walker and Rocha da Silva 2015a; Pöschl 2012; Journal 2015; Hunter 2012).
In our approach, papers were subject to traditional editorial assessment instead of a quick editorial pre-screening. The online discussion did not have any pre-specified time limit, and the rejected manuscripts could be left on the journal’s webpage or resubmitted to another journal.
Each manuscript that passed the conventional in-house review stage was immediately published, along with a call for online reviews/comments and the traditional invitation to two or three external reviewers selected by the journal’s editor. Every fresh submission released an initial amount of scientific information, \( {\text{SI}}_{\text{init}} = {\text{AR}}_{j} \times {\text{Q}} \), where \( {\text{AR}}_{j} \) represents the reputation of the “discussion papers” section of the journal (j). We obtained \( {\text{AR}}_{j} \) from the original simulations of the conventional system, and it is equal to the acceptance rate of papers, after the editorial screening process. We assumed that a publication attracted a number of online reviewers equal to \( \frac{{{\text{SI}}_{\text{init}} }}{{{\text{mean}}\left( {{\text{SI}}_{\text{total}} } \right)^{2} }} \), rounded to the nearest integer, where \( {\text{mean}}\left( {{\text{SI}}_{\text{total}} } \right) \) is the average \( {\text{SI}}_{\text{init}} \) of all papers submitted at each time step (\( {\text{SI}}_{\text{total}} \) represents the distribution of \( {\text{SI}}_{\text{init}} \) values at a time step).
The online commenters evaluated the paper in the same way as the normal reviewers. The editor averaged the scores of the online commenters (\( Q_{\text{online}} \)) and randomly selected one of the invited reviewers’ scores (\( Q_{\text{invited}} \)), as in the conventional system to make a decision (\( Q_{\text{r}} \)). We assumed that editors took more into account comments from reviewers they invited than uninvited reviewers, thus \( Q_{r} = \frac{{Q_{\text{online}} + nQ_{\text{invited}} }}{{n{ + 1}}} \), where n is the number of invited reviewers. Thus, the more online reviewers, the greater the chance a paper was more accurately evaluated. If the paper did not attract any online comments, then \( Q_{r} = Q_{\text{invited}} \).
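The number of online reviewers and the editor's combined score can be sketched as follows. The function names are hypothetical, and rounding ties upward is an assumption, since the text only says "rounded to the nearest integer".

```python
import math
from typing import List


def n_online_reviewers(si_init: float, mean_si: float) -> int:
    """SI_init / mean(SI_total)^2, rounded to the nearest integer."""
    # ASSUMPTION: ties round up (Python's round() would round ties to even).
    return math.floor(si_init / mean_si ** 2 + 0.5)


def combined_score(online_scores: List[float], q_invited: float,
                   n_invited: int) -> float:
    """Q_r = (Q_online + n * Q_invited) / (n + 1); Q_invited if no comments."""
    if not online_scores:
        return q_invited
    q_online = sum(online_scores) / len(online_scores)
    return (q_online + n_invited * q_invited) / (n_invited + 1)
```

With three invited reviewers, the averaged online score carries a weight of one quarter in the decision and the selected invited score carries the remaining three quarters.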
With \( Q_{r} \ge T_{\hbox{max} } \), the paper was revised once, considered indexed in the bibliographical databases (Web of Science, MEDLINE etc.) and included as a part of the next issue of the journal, thus releasing the rest of its scientific information at the time of indexation. With \( Q_{r} < T_{\hbox{max} } \), the authors decided to resubmit based on \( P_{\text{res}} = 0.88^{{N_{\text{sub}} - 1}} \) or leave their paper unindexed on the webpage of the journal. In the latter case, the paper would still release some scientific information because it can be found online, but less so because it will be hidden on the journal’s website. Thus, subtracting an amount from the total scientific information the scientific community had already accumulated (because of the paper’s higher visibility as a “discussion” paper), the manuscript’s final scientific information becomes \( {\text{SI}} = 0.2\;{\text{SI}}_{\text{init}} \).
Immediate publication
In the system of immediate publication, papers are made immediately available to readers via the webpage of the journal as “discussion papers”, before they are peer reviewed. This system is similar to the crowdsourcing system (“Crowdsourcing peer review (Immediate publication with online and invited reviews)” section) but without assuming that editors would take into consideration any online reviews or comments.
Implementation and system comparison
We programmed the models by using MATLAB (MATLAB and Statistics Toolbox Release 2016b, The MathWorks, Inc., Natick, MA, USA) with a total number of researchers N = 25,000, total number of journals J = 105 and weekly submissions drawn from a normal distribution ~N (850, 85) (each simulation week is 1 time-step). We ran the simulations for 10 years, with a burn-in period of 1 year for the initialization of the model. All main results were averaged over 100 simulation runs. Code is available at http://www.clinicalepidemio.fr/peerreview_alternative_systems/.
We defined three different types of outcomes to compare all alternative systems with the conventional system: peer-review efficiency, reviewer effort and scientific dissemination. Peer-review efficiency corresponded to the double purpose of peer review: separating published from unpublished papers by quality, and improving papers through revision. We measured it by using the separation of the Q score distributions of the published and unpublished papers and the relative improvement in average Q score for all papers after revision as compared to that for the first submission. We used the Hellinger distance as a quantifying measure of the overlap between two distributions: the higher the Hellinger distance, the less the overlap (Nikulin 2001). We measured reviewer effort by using the total time reviewers devoted to peer review in a year. We obtained this outcome in hours from our simulations and transformed it into working years per year with the following equation:
$$ {\text{time spent in peer review}} = \frac{{\text{hours devoted to peer review}} / {\text{work hours}}}{{\text{year}} - {\text{weekends}} - {\text{holidays}}} $$
where \( {\text{work hours}} = 8\,{\text{h per day}} \), \( {\text{year}} = 365\,{\text{days}} \), \( {\text{weekends}} = 104\,{\text{days}} \) and \( {\text{holidays}} = 25.3\,{\text{days}} \) (average paid holidays in 21 OECD countries) (Ray and Schmitt 2007). Finally, we measured scientific dissemination by using the number of annual publications, the median weeks between first submission of a paper and the final decision, the average Q score for all papers and the average weekly release of scientific information. For estimating the two time-related measures, we used the respective distributions from an international survey of 4000 participants (Mulligan et al. 2013).
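The conversion into working years and the Hellinger distance can be illustrated with the sketch below. The function names are hypothetical. The Hellinger distance is computed here on binned counts of Q scores, which is an assumption: the text does not say how the distributions were estimated.

```python
import math
from typing import Sequence

WORK_HOURS_PER_DAY = 8.0
WORKING_DAYS_PER_YEAR = 365.0 - 104.0 - 25.3   # year - weekends - holidays


def working_years(review_hours: float) -> float:
    """Convert annual review hours into working years per year."""
    return (review_hours / WORK_HOURS_PER_DAY) / WORKING_DAYS_PER_YEAR


def hellinger_distance(p_counts: Sequence[float],
                       q_counts: Sequence[float]) -> float:
    """Hellinger distance between two histograms on the same bins (0 to 1)."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    squared = sum(
        (math.sqrt(p / p_total) - math.sqrt(q / q_total)) ** 2
        for p, q in zip(p_counts, q_counts)
    )
    return math.sqrt(0.5 * squared)
```

A distance of 0 means that the published and unpublished Q score histograms coincide, and a distance of 1 means that they share no bin. One working year corresponds to 8 × 235.7 = 1885.6 hours.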
Finally, we considered that a peer-review system was beneficial if it improved any of the outcomes without deteriorating the peer-review efficiency, and more efficient than the conventional system if it improved all types of outcomes.
Sensitivity analyses
We performed sensitivity analyses of two of the alternative peer-review systems: cascade and crowdsourcing. We excluded the re-review opt-out, portable and immediate publication systems because the first is already at its maximum configuration and cannot realistically be improved in our ABM and the second and third can be considered special cases of the cascade and crowdsourcing systems, respectively. These analyses focused on identifying the effect of different configurations of the cascade and crowdsourcing systems on their outputs. All sensitivity analyses were averaged over 10 simulation runs.
Exploring the parameter space for the cascade peer-review system
In the main version of the cascade system, initialized with \( N_{g} = 4 \) journal groups, the editor decides whether to ask for new reviews based on \( \frac{{\left| {Q_{e} - Q_{r} } \right|}}{{Q_{r} }} \le \alpha \), where α = 0.1, and the probability that an author accepts the editor’s proposal is \( P_{\text{cas}} = 1.0 \). We explored the parameter space by varying these three parameters one at a time while keeping the other two the same as in the main version of the cascade system. We ran the cascade system for \( N_{g} = 2, 3 \;{\text{and}}\; 5 \), for α = 0.0 and 1.0 and for \( P_{\text{cas}} = 0.7, 0.8 \) and 0.9. The cases with α = 0.0 and 1.0 represent those for which all and none of the resubmitted papers receive new peer review, respectively.
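The one-at-a-time design described above amounts to eight configurations besides the main version. A minimal sketch of how they could be enumerated follows; the dictionary keys and the function name are hypothetical.

```python
from typing import Dict, List

BASELINE = {"n_groups": 4, "alpha": 0.1, "p_cas": 1.0}
VARIATIONS = {
    "n_groups": [2, 3, 5],
    "alpha": [0.0, 1.0],
    "p_cas": [0.7, 0.8, 0.9],
}


def one_at_a_time(baseline: Dict[str, float],
                  variations: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Vary one parameter per configuration, keeping the others at baseline."""
    configs = []
    for name, values in variations.items():
        for value in values:
            config = dict(baseline)   # copy so the baseline stays untouched
            config[name] = value
            configs.append(config)
    return configs


configs = one_at_a_time(BASELINE, VARIATIONS)
```

Each configuration would then be run and averaged over the 10 simulation runs used for the sensitivity analyses.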
Effect of the editor’s decision and online comments with the crowdsourcing system
We explored different assumptions on how editors decide on acceptance or rejection of a paper and how the online comments affect the system overall. Here we explored the cases in which all papers received 1, 5 and 20 comments. Moreover, we simulated when editors averaged all reviews, online and invited, and when they chose at random one of the online or invited peer reviews to represent their decision. The last two cases assumed a mechanism of attracting online comments identical to the main version of the system.