Description of the system
Scientific publication in its most typical form can be described as a complex system in which researchers interact with each other taking the roles of authors, journal editors and reviewers (Fig. 1) (Brown 2004). Researchers conduct research by using many resources (e.g., grants, research facilities or collaborations). They promote their findings and make them available to the scientific community by reporting them in scholarly manuscripts, which they submit to scientific journals for publication. Decisions on publication are based on multiple factors including the paper’s quality, novelty, importance or controversy.
Journals first perform an in-house review to determine whether they will reject a manuscript immediately (e.g., irrelevant to a journal’s scope or below quality standards) or send the manuscript for external peer review. In-house review commonly involves the editor-in-chief and members of the editorial board. For the external peer review, the editor solicits external researchers to review articles. On the basis of the editor’s and external peer reviewers’ assessments, the editor decides to accept the manuscript, to request a revision (acceptance is not guaranteed) or to reject it. Revisions require a second or further round of peer review (Wilson 2012). Rejected manuscripts may be resubmitted to other journals or ultimately be abandoned and remain unpublished. Published articles, depending on their impact on the scientific community, help researchers obtain additional resources. Moreover, researchers gain knowledge from reviewing scientific manuscripts.
Agent-based model
We modeled researchers, manuscripts and journals as agents of the scientific publication system, whose dynamics emerge from the interactions of their respective state variables (Fig. 1). Researchers could act as both authors and reviewers, whereas editors and journals were modeled as a single agent. The ABM is organized in submodels, each of which can be parameterized independently. Some submodels pertain to the submission process, including the creation of manuscripts and the targeting of journals for the first submission. Others pertain to the peer-review process, including peer-review rounds and resubmissions.
Researchers
We characterized N researchers by two state variables: resources R(t) and scientific level S(t) (Squazzoni and Gandelli 2013). The scientific level was defined as \(S(t) = R(t) + S_{b}(t)\), where t is the time step and \(S_{b}(t)\) is the sum of all the rewards that a researcher can receive toward the scientific level, as explained at the end of this section. The resources represent all the means that researchers have at their disposal for conducting research. The scientific level expresses a researcher’s experience and capacity to conduct better research. In our model, scientific knowledge evolves through a researcher’s own research (published articles), the evolution of resources, and the reading and reviewing of other manuscripts.
For t = 0, we set \(S_{b}(0) = S_{p}(0)\), where \(S_{p}(t)\) is the cumulative number of publications per researcher at time t. We initialized \(S_{p}(0)\) (Fig. 2a) using the empirical distribution in Table 2C and set \(R(0) = \gamma S_{p}(0)\) (Fig. 2b), where γ was uniformly distributed between 0.1 and 3. The initial distribution of S(0) is shown in Fig. 2c.
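For concreteness, a minimal Python sketch of this initialization step is given below. It is an illustration only: the draw of past publication counts is a placeholder, because the empirical distribution of Table 2C is not reproduced here, and N is an arbitrary value.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000  # number of researchers (illustrative value, not the calibrated N)

# Placeholder for the empirical distribution of past publications (Table 2C),
# which is not reproduced here.
S_p0 = rng.poisson(lam=5, size=N).astype(float)

gamma = rng.uniform(0.1, 3.0, size=N)  # gamma ~ U(0.1, 3)
R0 = gamma * S_p0                      # R(0) = gamma * S_p(0)
S_b0 = S_p0.copy()                     # S_b(0) = S_p(0)
S0 = R0 + S_b0                         # S(0) = R(0) + S_b(0)
```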
Manuscripts
Manuscripts were characterized by the state variable Q, which serves as a proxy for their intrinsic scientific value but also for their disruptive, innovative or controversial nature, as well as their quality of reporting. At each time step, \(N_{s}\) randomly selected researchers submitted a paper (as detailed in the Calibration section). At the time of submission \(t_{s}\) of their paper, authors lost an amount of resources \(R_{\text{inv}}\), with \(0.2R(t_{s} ) \le R_{\text{inv}} \le 0.7R(t_{s} )\), associated with the conduct of the research reported in that paper. However, researchers with resources \(R(t_{s}) < R_{\hbox{min}} = 1\) could not submit any work for publication and had to wait until they obtained more resources.
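A minimal sketch of this submission rule, assuming the hypothetical helper name invest_in_manuscript and the convention that an ineligible researcher simply skips the time step, could look as follows:

```python
from typing import Optional
import numpy as np

rng = np.random.default_rng(1)
R_MIN = 1.0  # minimum resources required to submit

def invest_in_manuscript(R_ts: float) -> Optional[float]:
    """Resources R_inv invested at submission time t_s, drawn between
    0.2*R(t_s) and 0.7*R(t_s); returns None when the researcher must wait."""
    if R_ts < R_MIN:
        return None
    return float(rng.uniform(0.2 * R_ts, 0.7 * R_ts))
```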
Each paper had an initial expected quality \(E_{Q}\), defined by both the amount of resources the author invested and the author’s scientific level at \(t_{s}\) (Squazzoni and Gandelli 2013)
$$E_{Q} = 0.8\frac{{0.1R_{\text{inv}} }}{{0.1R_{\text{inv}} + 1}} + 0.2\frac{{0.01S\left( {t_{s} } \right)}}{{0.01S\left( {t_{s} } \right) + 1}}$$
The Q score was drawn from a normal distribution \(Q \sim N \left( {E_{Q} , 0.1E_{Q} } \right)\). This score determines how a researcher chooses a target journal and drives in-house and external peer-review assessments. If all researchers invested half of their initial resources at \(t_{s} = 0\) to create manuscripts, the distribution of Q scores would be as shown in Fig. 2d.
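The two formulas above translate directly into code; the sketch below is illustrative and the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def expected_quality(R_inv: float, S_ts: float) -> float:
    """E_Q from invested resources and the author's scientific level at t_s."""
    return (0.8 * (0.1 * R_inv) / (0.1 * R_inv + 1)
            + 0.2 * (0.01 * S_ts) / (0.01 * S_ts + 1))

def draw_quality(R_inv: float, S_ts: float) -> float:
    """Q ~ N(E_Q, 0.1 * E_Q)."""
    e_q = expected_quality(R_inv, S_ts)
    return float(rng.normal(e_q, 0.1 * e_q))
```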
Journals
We characterized J journals by three state variables: a reputation value [we used rescaled impact factors (Fig. 3a)] and related rejection and acceptance thresholds, \(T_{\hbox{min} }^{j} < T_{\hbox{max} }^{j} , \;j = 1, \ldots , J\) (Fig. 3b). The reputation and thresholds were used to define how a researcher chose a target journal and whether a manuscript was rejected or accepted after in-house or external peer review.
The rejection and acceptance thresholds reflected the ranking of journals by their reputation and were defined by the expected Q scores of the submissions journals receive. For each year, we drew N score values for a fictitious sample of upcoming submissions; we estimated the J-quantiles \(q^{j}\) of this distribution, including the minimum value, and defined \(T_{\hbox{min} }^{j} = \delta_{\hbox{min} } q^{j} + n^{j}\) and \(T_{\hbox{max} }^{j} = \delta_{\hbox{max} } T_{\hbox{min} }^{j} + n^{j} - C\), where \(\delta_{\hbox{min} } ,\delta_{\hbox{max} }\) and C were constants and \(n^{j}\) was random (as detailed in the Calibration section). This definition kept the distribution of acceptance rates insensitive to changes in the distribution of resources.
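The following sketch illustrates one possible reading of this construction; the constants, the form of the noise term \(n^{j}\) and the use of evenly spaced quantiles are placeholders for the calibrated values, not the model’s actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(3)

def journal_thresholds(scores: np.ndarray, J: int,
                       delta_min: float = 1.0, delta_max: float = 1.1,
                       C: float = 0.0, noise_scale: float = 0.01):
    """T_min^j and T_max^j from the J-quantiles (including the minimum) of a
    fictitious sample of submission scores; constants and noise are placeholders."""
    q = np.quantile(scores, np.linspace(0.0, 1.0, J, endpoint=False))
    n = rng.uniform(0.0, noise_scale, size=J)
    T_min = delta_min * q + n
    T_max = delta_max * T_min + n - C
    return T_min, T_max

scores = rng.normal(0.5, 0.1, size=1000)  # fictitious upcoming submissions
T_min, T_max = journal_thresholds(scores, J=50)
```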
Journal targeting process
To define how a researcher chose a target journal, we assumed that authors had a general knowledge of journal standards and, given the score of their manuscript, would try to obtain the most recognition from their work. Hence, the journal for the first submission was chosen at random among those with \(T_{\hbox{min} }^{j}\) within the asymmetrical range \(Q - 0.45 \varepsilon \le T_{\hbox{min} }^{j} \le Q + 0.55\varepsilon\), where \(\varepsilon \sim 2 \times N \left( {\frac{Q}{5},\frac{Q}{20}} \right)\). This process resulted in a slight tendency to target journals of higher reputation at every first submission.
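A minimal sketch of this targeting rule follows; the fallback for the case where no journal falls in the window is our own assumption, not part of the model description:

```python
import numpy as np

rng = np.random.default_rng(4)

def target_first_journal(Q: float, T_min: np.ndarray) -> int:
    """Choose a journal whose T_min lies in the asymmetrical window around Q."""
    eps = 2 * rng.normal(Q / 5, Q / 20)
    candidates = np.where((T_min >= Q - 0.45 * eps) & (T_min <= Q + 0.55 * eps))[0]
    if candidates.size == 0:
        # Hypothetical fallback, not specified in the model description.
        return int(np.argmin(np.abs(T_min - Q)))
    return int(rng.choice(candidates))
```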
In-house and external peer-review process
We drew the editor’s assessment of the manuscript \(Q_{e}\) from a uniform distribution over \(\left[ {0.9Q;1.1Q} \right]\). If \(Q_{e} < T_{\hbox{min} }^{j}\), the manuscript could be rejected without external peer review, depending on the journal’s reputation; the likelihood of editorial rejection was larger for journals with higher reputation (as detailed in the Calibration section). If \(Q_{e} \ge T_{\hbox{min} }^{j}\), the manuscript was sent for external peer review; two or, with 20 % probability, three reviewers were randomly selected according to their scientific level and the journal’s reputation: the top 10 % of journals randomly selected reviewers among the top 10 % of researchers, and so on. The reviewers’ assessments were defined as \(Q_{r} \sim N \left( {Q - c,r \times Q} \right)\), where r was a random error and c measured the competitiveness of the reviewer.
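The in-house step and reviewer count can be sketched as follows; p_desk_reject stands in for the calibrated, reputation-dependent probability of editorial rejection and is an assumption of this illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def passes_in_house_review(Q: float, T_min_j: float, p_desk_reject: float) -> bool:
    """In-house review: the editor's assessment Q_e is drawn from U[0.9Q, 1.1Q];
    below T_min^j the manuscript is desk-rejected with probability p_desk_reject."""
    Q_e = rng.uniform(0.9 * Q, 1.1 * Q)
    if Q_e < T_min_j:
        return rng.random() >= p_desk_reject
    return True

def number_of_reviewers() -> int:
    """Two reviewers, or three with 20 % probability."""
    return 3 if rng.random() < 0.2 else 2
```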
The error factor r represents the reliability of the reviewer’s assessment. It depended on the amount of time the reviewer spent evaluating the manuscript, the reputation of the journal and the score of the manuscript itself. We assumed that the more time spent on the assessment, the greater the reputation of the journal, and the greater the score, the greater the chance of an accurate assessment. Formally, we defined \(r = r_{r} + r_{j} - r_{Q}\), where \(r_{r}\) is the reviewing error, \(r_{j}\) the journal error and \(r_{Q}\) the score error. With 65 % probability, we set \(r_{r}\) = 0.1; with 12 %, \(r_{r}\) = 0.05; and with 13 %, \(r_{r}\) = 0.01. We drew \(r_{j}\) randomly from a uniform distribution over [0; 0.15], where \(r_{j}\) = 0 corresponded to the highest-reputation journal and \(r_{j}\) = 0.15 to the lowest. Finally, \(r_{Q} = 0.05 \times Q\).
The competitiveness factor c depended solely on the reputation of the journal and represents a potential reviewer conflict of interest affecting the assessment of the manuscript. We assumed that such competitive behavior would occur more often for journals with higher reputation: its probability of occurrence ranged uniformly from 10 to 66 % with journal reputation, and when it occurred, c was drawn randomly from a uniform distribution over [0.01; 0.05].
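A sketch of one reviewer assessment, combining the error and competitiveness factors, is given below. Several details are our own simplifications and are flagged in the comments: the residual probability mass for \(r_{r}\), the deterministic mapping of \(r_{j}\) to reputation, the linear interpolation of the competitiveness probability and the guard on the normal scale.

```python
import numpy as np

rng = np.random.default_rng(6)

def reviewer_assessment(Q: float, journal_rank: float) -> float:
    """One reviewer's assessment Q_r ~ N(Q - c, r * Q); journal_rank is taken
    here as 0 for the lowest-reputation journal and 1 for the highest."""
    # Reviewing error r_r (time spent); the residual 10 % -> 0 is our assumption.
    u = rng.random()
    if u < 0.65:
        r_r = 0.1
    elif u < 0.77:
        r_r = 0.05
    elif u < 0.90:
        r_r = 0.01
    else:
        r_r = 0.0
    # Journal error, mapped linearly from reputation here as a simplification
    # of the uniform draw over [0, 0.15] described in the text.
    r_j = 0.15 * (1.0 - journal_rank)
    r_Q = 0.05 * Q                                   # score error
    r = r_r + r_j - r_Q
    # Competitiveness c: occurs with probability rising from 10 % to 66 %.
    p_c = 0.10 + (0.66 - 0.10) * journal_rank
    c = rng.uniform(0.01, 0.05) if rng.random() < p_c else 0.0
    return float(rng.normal(Q - c, max(r * Q, 1e-6)))  # guard against a non-positive scale
```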
We randomly selected one of the reviewers’ evaluations as a proxy for the editor’s opinion. We simulated more than one reviewer to be able to update their scientific levels appropriately. If \(Q_{r} \ge T_{\hbox{max} }^{j}\), the manuscript was accepted, and if \(Q_{r} < T_{\hbox{min} }^{j}\), it was rejected. When \(T_{\hbox{min} }^{j} \le Q_{r} < T_{\hbox{max} }^{j}\), the author was asked to revise the manuscript before a second round of peer review.
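The decision rule can be written compactly; the function name is ours:

```python
def editorial_decision(Q_r: float, T_min: float, T_max: float) -> str:
    """Map the randomly chosen reviewer assessment onto the journal's thresholds."""
    if Q_r >= T_max:
        return "accept"
    if Q_r < T_min:
        return "reject"
    return "revise"  # revision followed by a second round of peer review
```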
In the latter case, the author invested an extra amount of resources \(R_{\text{imp}} \sim N\left( {\frac{8}{60},\frac{1}{60}} \right) \times \left( {R - R_{\text{inv}} } \right)\). The cumulative amount of invested resources was used to derive a new Q score as before. The manuscript was re-evaluated by two or three reviewers, randomly selected again, and accepted only if \(Q_{r} \ge T_{\hbox{max} }^{j}\). As in the first round, the \(Q_{r}\) for the second round of peer review was taken from one randomly chosen evaluation among the two or three new reviewers.
Following a rejection after in-house review or external peer review, an author could resubmit the manuscript.
Resubmission process
The probability of resubmission \(P_{\text{res}}\) after a rejection decreased as the number of resubmissions r increased, \(P_{\text{res}} = P_{0}^{r - 1}\). The \(P_{0}\) value was defined with the calibration procedure.
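A one-line sketch of this rule (function name ours, \(P_{0}\) to be supplied by calibration):

```python
import numpy as np

rng = np.random.default_rng(7)

def is_resubmitted(P_0: float, r: int) -> bool:
    """Resubmission decision after the r-th rejection, with P_res = P_0 ** (r - 1)."""
    return rng.random() < P_0 ** (r - 1)
```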
If a manuscript was rejected after external peer review, we assumed that the authors could substantially revise it by investing extra resources \(R_{\text{imp}} \sim N\left( {\frac{20}{60},\frac{2}{60}} \right) \times \left( {R \left( {t_{s} } \right) - \left( {R_{\text{inv}} + \sum\nolimits_{i} {R_{\text{imp}}^{i} } } \right)} \right)\), where \(R(t_{s})\) are the resources at the time of submission and i indexes the times the author has already invested extra resources to improve the manuscript. If a manuscript was rejected after in-house review, we assumed that authors invested a smaller amount of extra resources \(R_{\text{imp}} \sim N\left( {\frac{1}{60},\frac{0.1}{60}} \right) \times \left( {R (t_{s} ) - \left( {R_{\text{inv}} + \sum\nolimits_{i} {R_{\text{imp}}^{i} } } \right)} \right).\)
We assumed that after a first rejection, the authors would target journals of lower reputation than for the first submission. Thus, they randomly selected journals in the (this time symmetrical) range \(pQ - 0.5\varepsilon \le T_{\hbox{min} }^{j} \le pQ + 0.5\varepsilon\), where Q is the initial score of the manuscript and 0 < p < 1 a factor reflecting the targeting of lower-reputation journals. This rule allowed for easier acceptance after the second submission, because the score of the manuscript was greater than pQ after resubmission.
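The resubmission step can be sketched as follows; we assume here that ε follows the same distribution as for the first submission, and the fallback journal choice is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

def extra_investment(R_ts: float, already_spent: float, after_peer_review: bool) -> float:
    """Extra resources R_imp for revising a rejected manuscript;
    already_spent = R_inv + sum of previous R_imp."""
    mean, sd = (20 / 60, 2 / 60) if after_peer_review else (1 / 60, 0.1 / 60)
    return float(rng.normal(mean, sd) * (R_ts - already_spent))

def retarget_journal(Q0: float, p: float, T_min: np.ndarray) -> int:
    """After a rejection, target lower-reputation journals around p * Q0 (0 < p < 1)."""
    eps = 2 * rng.normal(Q0 / 5, Q0 / 20)   # same epsilon form as for the first submission (assumption)
    candidates = np.where((T_min >= p * Q0 - 0.5 * eps) & (T_min <= p * Q0 + 0.5 * eps))[0]
    if candidates.size == 0:
        return int(np.argmin(np.abs(T_min - p * Q0)))  # hypothetical fallback
    return int(rng.choice(candidates))
```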
Duration of the peer-review process
For estimating the duration of the peer-review process from submission to final decision, we used the distribution in Table 2B. We assumed that rejection after in-house review occurred within 3 weeks, whereas decisions after one or more rounds of external peer review occurred after ≥1 month. When a manuscript was accepted, publication took an extra 1–2 months. Resubmissions occurred immediately after the final decision was announced.
Updating of variables
Resources and scientific level were updated at each time step. Resources invested for conducting and reporting research, \(R_{\text{inv}}\), were subtracted at the time of initial submission, whereas the extra resources \(R_{\text{imp}}\) were subtracted uniformly until the time of a journal’s final decision. Thus, a researcher allocated resources to both new research manuscripts and already (re)submitted manuscripts. If the article was published, the author received a reward between 0 and 50 % of the total amount of invested resources, \(p \times \left( {R_{\text{inv}} + \sum\nolimits_{i} {R_{\text{imp}}^{i} } } \right),\; 0 \le p \le 0.5\). If a manuscript remained unpublished, the author permanently lost all the resources invested.
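The publication reward translates directly into a small helper (names are ours):

```python
import numpy as np

rng = np.random.default_rng(9)

def publication_resource_reward(R_inv: float, R_imp_history: list) -> float:
    """Reward returned to the author on publication: a random fraction (0-50 %)
    of everything invested in the manuscript; an unpublished manuscript
    returns nothing and all invested resources are lost."""
    total_invested = R_inv + sum(R_imp_history)
    p = rng.uniform(0.0, 0.5)
    return float(p * total_invested)
```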
The scientific level \(S(t) = R(t) + S_{b}(t)\) evolved according to resources and the number of published or reviewed manuscripts. In case of publication, the resource reward was also credited to the author’s scientific level, together with an increase in the number of publications \(S_{p}(t)\). The extra resources invested for revisions were subtracted uniformly from the scientific level until the time of the final decision. The scientific level of a reviewer was credited with a random reward between 0 and 0.001 every time the reviewer completed a review, because of the knowledge acquired from the paper. Moreover, the scientific level of all researchers was credited with a reward at each time step to reflect the impact of newly published articles; this reward was drawn from a normal distribution N(I, 0.1I), where I is the average, across all articles published the previous week, of \(0.1Q_{\text{final}} \times IF_{\text{final}}^{j}\) (i.e., the quality score of a published article × the impact factor of the journal that published it). The greater the article quality score and journal impact factor, the higher the chance that a researcher would read the article and gain knowledge from it, and the larger the reward. Finally, at the end of each week, researchers received an increase in their resources and scientific levels randomly drawn between 0.1 and 1, reflecting the growth of the means to conduct research over time.
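The weekly credits to the scientific level can be sketched as below; this is an illustration under the stated distributions, and the per-review reward is only noted in a comment:

```python
import numpy as np

rng = np.random.default_rng(10)

def weekly_scientific_level_credits(n_researchers: int,
                                    published_Q: np.ndarray,
                                    published_IF: np.ndarray):
    """Weekly credits to every researcher's scientific level: an impact reward
    drawn from N(I, 0.1*I), with I the average of 0.1 * Q_final * IF_final over
    last week's published articles, plus a background increase U(0.1, 1)."""
    if published_Q.size:
        I = float(np.mean(0.1 * published_Q * published_IF))
        impact_reward = rng.normal(I, 0.1 * I, size=n_researchers)
    else:
        impact_reward = np.zeros(n_researchers)
    background = rng.uniform(0.1, 1.0, size=n_researchers)
    # Reviewers additionally receive U(0, 0.001) per completed review (handled elsewhere).
    return impact_reward, background
```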