How do academics choose which papers to read? One might want to assume that academics have some form of utility function which they maximize. But this has a number of problems: whether (expected) utility maximization can provide a good model of rationality is controversial, and even if it is a good model of rationality, academics may not act rationally, so the descriptive power of the model may be poor.
Moreover, one would need to argue for the specific form of the utility function, requiring a detailed discussion of academics’ goals. For example, if academics aim only at truth an epistemic utility function may be needed, and it is not clear what that should look like (Joyce 1998; Pettigrew 2016). If they aim only at credit or recognition (as in Strevens 2003), a pragmatic utility function is needed. If they aim at both truth and credit (as seems likely) these two types of utility function must be combined (Kitcher 1993, Chap. 8; see also Bright 2016 for a comparison of these three types of utility function), and if they have yet other goals things get even more complicated.
Here I take a different approach. I state two assumptions, or behavioral rules, that constrain academics’ choices to some extent (although they still leave a lot of freedom). I then show that these assumptions are sufficient for the appearance of superstars in the model.
For those who think that, despite the problems I mentioned, academics’ behavior should be modeled using (Bayesian) expected utility theory, I show that a wide range of utility functions would lead academics to behave as the assumptions require (see theorems 1 and 2). For those who are impressed by the problems of that approach, I argue that one should expect academics to behave as the assumptions require even if they are not maximizing some utility function.
The first assumption relies on the notion that a paper may have higher epistemic value than another. I first state the formal definition of higher epistemic value and the assumption before discussing how they are intended to be interpreted.
Consider the information sets (i.e., papers) of two academics, i and \(i^\prime \). Say that academic i’s paper is of higher epistemic value than academic \(i^\prime \)’s paper if academic i has performed at least as many replications of each experiment as academic \(i^\prime \), and more replications overall. Formally, \(A_i\) is of higher epistemic value than \(A_{i^\prime }\) (written \(A_{i^\prime }\sqsubset A_i\)) if \(n(i,j) \ge n(i^\prime ,j)\) for all experiments j, with \(n(i,j) > n(i^\prime ,j)\) for at least one j.
The “higher epistemic value” relation is a partial order, not a total one: some pairs of papers are incomparable. For example, if academic 1 has performed experiment 1 ten times and experiment 2 zero times, and academic 2 has performed experiment 1 zero times and experiment 2 five times, neither paper has higher epistemic value than the other.
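The comparison can be sketched in a few lines of Python. This is only an illustration, not part of the formal model: it assumes that a paper is encoded as a tuple of replication counts, one entry per experiment, and the function name `higher_epistemic_value` is hypothetical.

```python
from typing import Sequence


def higher_epistemic_value(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if paper `a` has higher epistemic value than paper `b`.

    Each paper is a sequence of replication counts, one per experiment
    (an encoding assumed here for illustration).
    """
    if len(a) != len(b):
        raise ValueError("both papers must list the same experiments")
    # At least as many replications of every experiment ...
    at_least_as_many = all(x >= y for x, y in zip(a, b))
    # ... and strictly more replications of at least one experiment.
    more_somewhere = any(x > y for x, y in zip(a, b))
    return at_least_as_many and more_somewhere


# The example from the text: neither paper dominates the other.
paper_1 = (10, 0)  # experiment 1 ten times, experiment 2 zero times
paper_2 = (0, 5)   # experiment 1 zero times, experiment 2 five times
print(higher_epistemic_value(paper_1, paper_2))  # False
print(higher_epistemic_value(paper_2, paper_1))  # False
print(higher_epistemic_value((10, 5), paper_1))  # True
```

The two `False` results show the incomparability that makes the order partial. A paper is also never of higher epistemic value than itself, since the strict inequality fails.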
Assumption 1
(Always Prefer Higher Epistemic Value). If academic i’s paper has higher epistemic value than academic \(i^\prime \)’s (\(A_{i^\prime }\sqsubset A_i\)) then other academics prefer to read academic i’s paper over academic \(i^\prime \)’s.
On a strict interpretation, Always Prefer Higher Epistemic Value requires that academics prefer to read papers that contain more replications (of those experiments that they are interested in reading about in the first place). For example, in the medical sciences, where the number of replications might refer to the number of patients studied, a higher number of replications would correspond to more reliable statistical tests, and would as such be preferable.
The strict interpretation, however, probably has limited application. Academics presumably care about more than just the quantity of data in a paper when they judge its epistemic value (although it should be noted that Always Prefer Higher Epistemic Value will usually not fix academics’ preferences completely, leaving at least some room for other considerations to play a role). For example, an important consideration may be what, if any, theoretical advances are made by the paper. How could such a consideration be incorporated in the model?
One suggestion is to consider the theoretical advances made by the paper as an additional experiment, and to define the number of replications of the additional experiment as a qualitative measure of the importance of the theoretical advances. Assuming that academics will mostly agree on the relative importance of theoretical advances, this idea can be made to fit the structure of the model.
This suggests a looser interpretation of the notion of higher epistemic value in Always Prefer Higher Epistemic Value (one I alluded to in Sect. 3). On this interpretation the different “experiments” are categories academics use to judge the epistemic value of papers, and “replications” are just a way to score papers on these categories. Always Prefer Higher Epistemic Value then says that academics will not read a paper if another paper is available that scores at least as well on all categories and better on at least one.
The loose interpretation makes a more complete picture of academics’ judgments of epistemic value possible, at the cost of some level of formal precision. But on either interpretation Always Prefer Higher Epistemic Value may be unrealistically simple. Given that there is, to my knowledge, no research trying to model academics’ decisions about what to read, Always Prefer Higher Epistemic Value should be read as a tentative first step. Future research may fruitfully explore improved or alternative ways of making the factors that go into such decisions formally precise.
The second assumption gets its plausibility from the observation that there is a finite limit to how many papers an academic can read, simply because it is humanly impossible to read more. More formally, there exists some number N (say, a million) such that for any academic the probability (as implied by her decision procedure) that she reads more than N papers is zero.
But rather than making this assumption explicitly, I assume something strictly weaker: that the probability of reading a very large number of papers is very small.
Assumption 2
(Bounded Reading Probabilities). Let \(p_{i,A,n}\) denote the probability that academic i reads the papers of at least n academics with information set A. For every \(\varepsilon > 0\), there exists a number N that does not depend on the academic or the size of the academic community, such that \(n\cdot p_{i,A,n} \le \varepsilon \cdot p_{i,A,1}\) for all \(n > N\).
So Bounded Reading Probabilities says that for very high numbers, the probability of reading that number of papers is very small, independent of the academic reading or the size of the academic community. If, as I suggested above, no academic ever reads more than a million papers, then \(p_{i,A,n} = 0\) for all i and A whenever n is greater than a million, and so the assumption would be satisfied.
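To make the condition concrete, the following Python sketch checks it for a single academic and a single information set. The probabilities are hypothetical, and so are the function name `smallest_threshold` and the encoding of \(p_{i,A,n}\) as a list. The assumption itself additionally requires that one N works for every academic and every community size, which this sketch does not test.

```python
from typing import Sequence


def smallest_threshold(tail_probs: Sequence[float], epsilon: float) -> int:
    """Smallest N such that n * p_n <= epsilon * p_1 for all n > N.

    tail_probs[n - 1] is p_n, the probability of reading the papers of at
    least n academics with a given information set. Past the end of the
    sequence p_n is taken to be zero (a hard cap on reading).
    """
    if not tail_probs or epsilon <= 0:
        raise ValueError("need at least p_1 and a positive epsilon")
    p_1 = tail_probs[0]
    threshold = 0
    for n, p_n in enumerate(tail_probs, start=1):
        if n * p_n > epsilon * p_1:
            threshold = n  # the bound fails at n, so N must be at least n
    return threshold


# Hypothetical academic: p_n halves with each extra paper, capped at 20 papers.
tail_probs = [0.5 ** (n - 1) for n in range(1, 21)]
print(smallest_threshold(tail_probs, 0.1))   # 7
print(smallest_threshold(tail_probs, 0.01))  # 11
```

Because the probabilities vanish beyond the cap, a finite N exists for every positive \(\varepsilon \), which is the sense in which a hard limit on reading implies the assumption.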
As I indicated, these assumptions are not only independently plausible, but are also satisfied by Bayesian academics (who maximize expected utility) under quite general conditions. The most important of these conditions is that reading a paper has a fixed cost c. This cost reflects the opportunity cost of the time spent reading the paper.
The relation between my assumptions and Bayesian rationality is expressed in the following two theorems. The proofs are given in Heesen (2016, Sect. 3).
Theorem 1
If \(c > 0\) and if each replication of an experiment is probabilistically independent and has a positive probability of changing the academic’s future choices, then the way a fully Bayesian rational academic chooses what to read satisfies Always Prefer Higher Epistemic Value.
Theorem 2
If \(c > 0\) and if each replication of an experiment is probabilistically independent, then a community of fully Bayesian rational academics with the same prior probabilities over possible worlds and the same utility functions chooses what to read in a way that satisfies Bounded Reading Probabilities.
Now consider the graph or network formed by viewing each academic as a node, and drawing an arrow (called an arc or directed edge in graph theory) from node i to node \(i^\prime \) whenever academic i reads academic \(i^\prime \)’s paper.
In order to study social stratification in this network, I need a metric to identify superstars. A natural idea suggests itself: rank academics by the number of academics who read their work. In the network, the number of academics who read i’s work is simply the number of arrows ending at i. In graph-theoretical terms, this is the in-degree of node i. This idea is illustrated in Fig. 1.
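The in-degree ranking can be sketched in Python as follows. The four academics and the reading relations between them are hypothetical, as is the function name `in_degrees`. Each arc is written as a (reader, author) pair, matching the direction of the arrows described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical arcs: (reader, author) means the reader reads the author's paper.
arcs: List[Tuple[str, str]] = [
    ("a", "c"), ("b", "c"), ("d", "c"),
    ("c", "b"), ("a", "b"),
    ("d", "a"),
]


def in_degrees(arcs: Iterable[Tuple[str, str]], nodes: Iterable[str]) -> Dict[str, int]:
    """Count, for each node, the number of arcs ending at it."""
    counts = Counter(author for _reader, author in arcs)
    # Include academics whom nobody reads, with in-degree zero.
    return {node: counts.get(node, 0) for node in nodes}


nodes = ["a", "b", "c", "d"]
degrees = in_degrees(arcs, nodes)
ranking = sorted(nodes, key=lambda node: degrees[node], reverse=True)
print(degrees)  # {'a': 1, 'b': 2, 'c': 3, 'd': 0}
print(ranking)  # ['c', 'b', 'a', 'd']
```

In this toy network academic c, with in-degree 3, would count as the most widely read.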
If, as the empirical evidence suggests, science is highly stratified, then one should expect large differences in in-degree among academics. Theorem 3, below, says that this is exactly what happens in my model.
The theorem relates the average in-degrees of academics (denoted, e.g., \(\mathbb {E}\left[ d(A)\right] \) for the average in-degree of an academic with information set A). In particular, it relates the average in-degrees of academics with information sets A and B to those of academics with information sets \(A\sqcup B\) and \(A\sqcap B\). An academic has information set \(A\sqcup B\) if she has performed each experiment as many times as an academic with information set A or an academic with information set B, whichever is higher (whereas an academic with information set \(A\sqcap B\) takes the lower value for each experiment).
For example, if information set A contains twelve replications of experiment 1 and eight replications of experiment 2 and information set B contains ten replications of each experiment then \(A\sqcup B\) contains twelve replications of experiment 1 and ten replications of experiment 2 (and \(A\sqcap B\) contains ten and eight replications, respectively). Or, on the loose interpretation, if the paper represented by information set A makes a greater theoretical advance than the paper represented by information set B, but B contains valuable empirical work as well, then information set \(A\sqcup B\) indicates a paper containing as great a theoretical advance as A, and empirical work as valuable as that in B.
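The two operations amount to taking the maximum and the minimum for each experiment. A minimal Python sketch, assuming information sets are encoded as tuples of replication counts (the names `join` and `meet` are illustrative labels for \(\sqcup \) and \(\sqcap \)):

```python
from typing import Tuple


def join(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Information set with the higher replication count for each experiment."""
    if len(a) != len(b):
        raise ValueError("both information sets must list the same experiments")
    return tuple(max(x, y) for x, y in zip(a, b))


def meet(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Information set with the lower replication count for each experiment."""
    if len(a) != len(b):
        raise ValueError("both information sets must list the same experiments")
    return tuple(min(x, y) for x, y in zip(a, b))


# The numerical example from the text.
A = (12, 8)   # twelve replications of experiment 1, eight of experiment 2
B = (10, 10)  # ten replications of each experiment
print(join(A, B))  # (12, 10)
print(meet(A, B))  # (10, 8)
```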
Theorem 3
(Stratified In-Degrees). Let I be an academic community satisfying Always Prefer Higher Epistemic Value and Bounded Reading Probabilities. If I is large enough, then for any two information sets A and B such that at least one academic in I has information set \(A\sqcup B\)
$$\begin{aligned} \mathbb {E}\left[ d(A\sqcup B)\right] + \mathbb {E}\left[ d(A\sqcap B)\right] \ge \mathbb {E}\left[ d(A)\right] + \mathbb {E}\left[ d(B)\right] . \end{aligned}$$
Moreover, if neither A nor B is identical to \(A\sqcup B\) and \(\mathbb {E}\left[ d(A\sqcup B)\right] > 0\) then the above inequality can be strengthened to
$$\begin{aligned} \mathbb {E}\left[ d(A\sqcup B)\right] > \mathbb {E}\left[ d(A)\right] + \mathbb {E}\left[ d(B)\right] . \end{aligned}$$
What the theorem says is that if the set of academics is sufficiently large, the number of times a given paper is read increases rapidly—on average, faster than linearly—in the epistemic value of the information set. See Heesen (2016, Sect. 2.5) for a proof.
What do I mean by “faster than linearly”? The following corollary makes this more precise. Suppose that (part of) the academic community only performs replications of some subset E of the experiments (where E contains at least two experiments).
Let \(A_E(n)\) denote an information set containing n replications of each of the experiments in E (and nothing else). Let \({\hat{n}}\) be the highest value of n such that at least one academic in I has information set \(A_E(n)\). Say that I is dense in E if for each combination of number of replications of the experiments in E some academic has performed exactly that number of replications. The corollary (proven in Heesen 2016, Sect. 2.6) shows that in dense communities the average number of times a paper is read increases exponentially in n.
Corollary 1
(Exponential In-Degrees). Let I be an academic community satisfying Always Prefer Higher Epistemic Value and Bounded Reading Probabilities. Let E be a subset of the experiments (\(\left| E\right| \ge 2\)) and suppose that I is dense in E. If I is large enough, then for all n (with \(1 \le n \le {\hat{n}}\)),
$$\begin{aligned} \mathbb {E}\left[ d(A_E(n))\right] \ge 2^{(n-1)(|E|-1)}\mathbb {E}\left[ d(A_E(1))\right] . \end{aligned}$$
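To get a feel for how fast this lower bound grows, the factor \(2^{(n-1)(|E|-1)}\) can be tabulated directly. The Python sketch below only evaluates the exponent in the corollary; the function name `lower_bound_factor` and the chosen values of n and \(|E|\) are illustrative.

```python
def lower_bound_factor(n: int, num_experiments: int) -> int:
    """Factor by which E[d(A_E(n))] must at least exceed E[d(A_E(1))]."""
    if n < 1 or num_experiments < 2:
        raise ValueError("the corollary requires n >= 1 and |E| >= 2")
    return 2 ** ((n - 1) * (num_experiments - 1))


for size in (2, 3):
    print(size, [lower_bound_factor(n, size) for n in range(1, 6)])
# 2 [1, 2, 4, 8, 16]
# 3 [1, 4, 16, 64, 256]
```

With two experiments the bound doubles with every additional replication of each experiment. With three experiments it quadruples.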
The theorem and its corollary show that the pattern of stratification in my model reflects the pattern seen in empirical measures of stratification such as citation counts: most papers have few citations, while a rare few have a great number of citations (Price 1965; Redner 1998). In the model this pattern arises from academics’ preference for papers of high epistemic value. Thus this preference can be viewed as a sufficient condition for these patterns to arise.