Introduction

Papers with multiple authors pose a problem to scientometric analysis. Who deserves the credit? There are three common solutions to this (Abbas 2011). The first and perhaps most common approach is to ignore the problem and let all co-authors take full credit. Bad incentives are the result: Authors may add each other to their papers even without a contribution. [Footnote 1] The second solution, “egalitarian weights”, is to share the credit equally between the co-authors (Batista et al. 2006; Ellison 2010; Schreiber 2008). Essentially, the analyst claims to have no information about who contributed most. The third solution, “rank weights”, is to share the credit based on the order of the authors (Hagen 2009; Hodge and Greenberg 1981; Sekercioglu 2008; Zhang 2009). This is entirely ad hoc, and conventions on the order of authors differ greatly between disciplines. None of these rules are satisfactory. In this paper, I present a method to apportion credit in an objective manner using readily available data. [Footnote 2]

The idea is straightforward. Suppose that a mediocre researcher and a star jointly write a paper. The paper is widely cited. Surely, most of the credit should go to the star.

While simple, the idea cannot be implemented without defining the relative stardom of the two authors. Again, I adopt a simple approach. Based on the citation record of the two researchers, I find the probability that a paper by them is cited N times. By definition, the star would have a higher probability of N citations (if N is large) than the mediocre researcher. The relative credit is proportional to the relative probabilities.

In the next section, I formalize this idea and define Pareto weights. “The data” section presents the data used to illustrate the proposal. The “Results” section discusses the results. The “Discussion and conclusion” section concludes.

A method for attributing citations

Consider S scholars who published papers that were cited C_{i,s} > 0 times, where i indexes the papers of scholar s. For convenience, we disregard uncited papers. The Pareto distribution (Pareto 1896) is often used to describe the number of citations (Egghe 1987, 1991, 1998, 2005):

$$ f\left( {C_{i,s} } \right) = \frac{{\mu \alpha_{s} }}{{C_{i,s}^{{1 + \alpha_{s} }} }} $$
(1)

We set μ = 1 so that we allow for any number of citations. [Footnote 3] The maximum likelihood estimate for the Pareto index α is

$$ \alpha_{s} = \frac{{n_{s} }}{{\mathop \sum \nolimits_{i = 1}^{{n_{s} }} \ln C_{i,s} }} $$
(2)
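
The estimate can be sketched in a few lines of Python. This is a minimal illustration, assuming μ = 1 and the reading given in the “Results” section, namely that the Pareto index is the inverse of the average natural logarithm of the citation counts. The function name `pareto_index` and the sample record are hypothetical, not taken from the data used in this paper.

```python
import math


def pareto_index(citations):
    """Maximum likelihood Pareto index with mu = 1: n / sum(ln C_i).

    Uncited papers (C = 0) are disregarded, as in the text.
    """
    cited = [c for c in citations if c > 0]
    log_sum = sum(math.log(c) for c in cited)
    if log_sum <= 0:
        # With only papers cited exactly once, ln C = 0 and the index is undefined.
        raise ValueError("need at least one paper cited more than once")
    return len(cited) / log_sum


# Hypothetical citation record of one scholar (not real data).
record = [0, 1, 4, 20, 150]
alpha = pareto_index(record)
print(f"Pareto index: {alpha:.3f}")
```

A lower index corresponds to a heavier tail, that is, a greater probability of highly cited papers.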

Now consider a paper l that is cited C times and that is co-authored by scholars s and t. Each scholar is allocated a share w of the citations according to

$$ w_{l,s} = \frac{{\alpha_{s} C^{{ - \alpha_{s} - 1}} }}{{\alpha_{s} C^{{ - \alpha_{s} - 1}} + \alpha_{t} C^{{ - \alpha_{t} - 1}} }};w_{l,t} = \frac{{\alpha_{t} C^{{ - \alpha_{t} - 1}} }}{{\alpha_{s} C^{{ - \alpha_{s} - 1}} + \alpha_{t} C^{{ - \alpha_{t} - 1}} }} $$
(3)

Obviously, w_s + w_t = 1, and w_s = w_t = 1/2 if and only if α_s = α_t. According to Eq. 3, scholar s receives the greater credit for the joint publication if the actual number of citations is more in line with her citation record than with that of scholar t. Equation 3 readily generalizes to multiple authors. I refer to w_s as the Pareto weight of author s.
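
As an illustration, the weights of Eq. 3 and their generalization to more than two authors can be sketched as follows. The function names and the two Pareto indices are hypothetical and chosen only to show the mechanics; they are not estimates from the sample.

```python
def pareto_density(c, alpha):
    """Pareto density of Eq. 1 with mu = 1, evaluated at c citations."""
    return alpha * c ** (-alpha - 1.0)


def pareto_weights(c, alphas):
    """Share of a paper's c citations allocated to each co-author.

    alphas maps author name -> Pareto index; works for any number of authors.
    """
    densities = {name: pareto_density(c, a) for name, a in alphas.items()}
    total = sum(densities.values())
    return {name: d / total for name, d in densities.items()}


# Hypothetical indices: a low index means a heavy tail (a "star").
alphas = {"star": 0.25, "newcomer": 0.60}

highly_cited = pareto_weights(500, alphas)
little_cited = pareto_weights(2, alphas)
print(highly_cited)  # the star receives more than half
print(little_cited)  # the newcomer receives more than half
```

With these assumed indices, the star takes the larger share of a paper cited 500 times and the smaller share of a paper cited twice, which is the behaviour the text describes.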

There are two problems. First, the joint publication is part of the citation record that is used to estimate the Pareto index α. Therefore, the joint publication is used to assess itself. This would be avoided if the joint publication were excluded from Eq. 2. For scholars with a large number of cited papers, this does not make much of a difference. Nitpickers are free to use α_{s,{l}}, the Pareto index estimated with paper l excluded.

The second problem is more substantial. Equation 2 uses the full number of citations. Equation 3 allocates only a fraction of the citations to scholar s. That is, Pareto weights change the citation record. The method is internally inconsistent. In order to solve this, redefine Eqs. 2 and 3 as the 0th iteration. In the mth iteration,

$$ \alpha_{s}^{(m)} = \frac{{n_{s} }}{{\mathop \sum \nolimits_{i = 1}^{{n_{s} }} \ln \left( {w_{s}^{(m - 1)} C_{i,s} } \right) }} $$
(4)

and

$$ w_{s}^{(m)} = \frac{{\alpha_{s}^{(m)} C^{{ - \alpha_{s}^{(m)} - 1}} }}{{\mathop \sum \nolimits_{t} \alpha_{t}^{(m)} C^{{ - \alpha_{t}^{(m)} - 1}} }} $$
(5)

The number of iterations should be such that w^{(m)} ≈ w^{(m−1)}.

There is a practical problem with the above proposal. A scholar’s corrected citation record depends on the citation record of everyone she has ever published with, and on everyone they have ever published with, and so on. Equations 4 and 5 can therefore only be approximated.
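
One way to approximate the iteration is to restrict it to a closed group of authors and treat everyone outside the group as absent. The sketch below does this for a small, entirely hypothetical set of papers. It assumes that the Pareto index in each iteration is the number of papers divided by the sum of the log attributed citations, that attributed citations stay large enough for that sum to be positive, and that the weights settle down; none of this is guaranteed in general. All names are illustrative.

```python
import math


def pareto_weights(c, alphas):
    """Eq. 5: shares proportional to alpha * c ** (-alpha - 1)."""
    dens = {s: a * c ** (-a - 1.0) for s, a in alphas.items()}
    total = sum(dens.values())
    return {s: d / total for s, d in dens.items()}


def iterate_weights(papers, tol=1e-6, max_iter=100):
    """papers: list of (citations, tuple_of_authors) within a closed group.

    Returns the last Pareto indices and the per-paper weights.
    """
    authors = sorted({s for _, team in papers for s in team})
    # Starting point: every co-author is credited with the full count,
    # so the first pass reproduces the 0th iteration.
    weights = [{s: 1.0 for s in team} for _, team in papers]
    alphas = {}
    for m in range(max_iter):
        for s in authors:
            logs = [math.log(w[s] * c)
                    for (c, team), w in zip(papers, weights) if s in team]
            total = sum(logs)
            if total <= 0:
                raise ValueError(f"attributed citations too low for {s}")
            alphas[s] = len(logs) / total
        new_weights = [pareto_weights(c, {s: alphas[s] for s in team})
                       for c, team in papers]
        change = max(abs(nw[s] - w[s])
                     for nw, w in zip(new_weights, weights) for s in nw)
        weights = new_weights
        if m > 0 and change < tol:  # w(m) is close to w(m-1)
            break
    return alphas, weights


# Hypothetical closed group of two authors and five papers.
papers = [(100, ("a",)), (50, ("a", "b")), (10, ("b",)),
          (200, ("a", "b")), (5, ("b",))]
alphas, weights = iterate_weights(papers)
print(alphas)
print(weights)
```

Truncating the co-author network in this way is a design choice of the sketch; widening the group brings the approximation closer to the full recursion at the cost of more data collection.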

The data

I illustrate the above proposal with the case of Andrei Shleifer, a professor of economics at Harvard University. Although only 50 years old, Shleifer tops the IDEAS/RePEc life-time achievement ranking of all economists. [Footnote 4] Shleifer won the John Bates Clark Medal and is likely to win the Nobel Prize. He has a limited number of long-term collaborators, which eases data collection.

I collected the publication and citation record of Andrei Shleifer, his 36 collaborators, and 4 of the collaborators of his closest collaborators. I did this at Easter 2011, using Scopus [Footnote 5] as the source of data.

Table 1 lists the names, numbers of (cited) publications, numbers of citations, and the Hirsch (Hirsch 2005) and Pareto indices (Eq. 2). Table 1 also gives the Shleifer-number: 0 for Shleifer, 1 for his coauthors, 2 for his coauthors’ coauthors. [Footnote 6] Table 1 contains a relatively small (41) but very diverse group of scholars. There are scholars who are generally considered to be world class, former post-docs who left academia, and everything in between. This is appropriate for illustrating the proposal of “A method for attributing citations” section.

Table 1 Selected characteristics of the authors in the sample: number of papers, number of cited papers, number of citations, average number of citations (per cited paper), Hirsch index, Pareto index, and Shleifer number

Results

Figure 1 shows the Pareto index as a function of the Hirsch index (bottom panel) and as a function of the average number of citations per publication (top panel). The Pareto index is the inverse of the average of the natural logarithm of the citation number (see Eq. 2), but Fig. 1 shows that the inverse of the log of the average citation number is a reasonable approximation. Figure 1 also shows that there is a relationship between the Pareto and Hirsch indices: a high number of highly-cited papers implies both a low Pareto index and a high Hirsch index. The two indices nonetheless measure different things, because the Hirsch index disregards excess citations while the Pareto index does not.
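
The two quantities compared in the top panel can be computed side by side. The sketch below uses a hypothetical citation record and hypothetical function names. Because the logarithm is concave, the log of the average is never smaller than the average of the logs, so the approximation can never exceed the exact index; how close the two are for real records is an empirical matter.

```python
import math


def pareto_index(citations):
    """Exact index: inverse of the average log citation count."""
    return len(citations) / sum(math.log(c) for c in citations)


def approximate_index(citations):
    """Approximation: inverse of the log of the average citation count."""
    return 1.0 / math.log(sum(citations) / len(citations))


# Hypothetical record of cited papers (not real data).
record = [2, 8, 30, 120]
print(pareto_index(record), approximate_index(record))
```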

Fig. 1 The Pareto index versus the average number of citations per paper (top panel) and the Hirsch index (bottom panel) for the 41 scholars in Table 1

Let us now turn to the attribution of the citations of joint papers to individual authors, focusing on Shleifer, the central author in the sample. The top panel of Fig. 2 shows the histograms of Shleifer’s Pareto weights for his papers with one, two, three and four other authors. Shleifer did not publish in teams of six or more. The Pareto weight for single-authored papers is, by definition, one. Figure 2 shows that the Pareto weights spread around the egalitarian weights (1/n where n is the number of authors). Egalitarian weights are thus a reasonable approximation of Pareto weights. By implication, rank weights (proportional to 1 for the first author, 1/2 for the second, …) are not. This is no surprise given the convention in economics to list authors alphabetically.

Fig. 2 The histograms of the Pareto weights for different numbers of authors (top panel) and the histogram of the ratio of the Pareto weights to the egalitarian weights (bottom panel)

Shleifer receives a more-than-egalitarian weight for some papers, but one may be surprised that he receives a less-than-egalitarian weight for other papers. [Footnote 7] After all, Table 1 shows that he is more senior than any of his coauthors. The bottom panel of Fig. 2 confirms its top panel. The bottom panel shows the histogram of the ratio of the Pareto weights to the egalitarian weights. The histogram is centred around one (so that the egalitarian weights are a reasonable approximation). The distribution ranges from 0.75 to 1.25, that is, the egalitarian weight may be a quarter too high or too low (if one accepts the Pareto weight as the true weight).

Let us consider the two extreme cases of the bottom panel of Fig. 2 in order to develop some intuition about the Pareto weights. The ratio of Pareto to egalitarian weights is highest for the paper by Barberis et al. (1998). It was cited 503 times. This is extraordinary for Barberis (whose papers are cited 119 times on average), not so special for Shleifer (whose papers are cited 213 times on average) and run-of-the-mill for Vishny (whose papers are cited 430 times on average). The egalitarian weights are one-third for each author. The Pareto weights are 15% for Barberis, 41% for Shleifer and 44% for Vishny.

At the other extreme lies a paper by Aghion et al. (2010). It was cited only four times. [Footnote 8] This is exceptional for Shleifer (213 citations on average), not uncommon for Aghion (35 citations on average), common for Cahuc (11 citations on average) and as expected for Algan (5 citations on average). Therefore, the Pareto weights are 0.29 (Algan), 0.28 (Cahuc), 0.24 (Aghion) and 0.18 (Shleifer); the egalitarian weight is 0.25. This highlights another property of Pareto weights: Because a probability density function integrates to one, scholars with a high probability of a large number of citations have a low probability of a small number of citations. Pareto weights thus attribute a large share of the citations to highly-cited papers to highly-cited co-authors, and a small share of the citations to little-cited papers to highly-cited co-authors.
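
The ratios behind these two extreme cases follow directly from the reported weights. The sketch below recomputes them and, for comparison, adds rank weights under the assumption that the kth listed author receives a share proportional to 1/k; that harmonic rule and the function names are illustrative assumptions, while the Pareto weights are the ones quoted above.

```python
def egalitarian_weight(n_authors):
    """Equal share for each of n co-authors."""
    return 1.0 / n_authors


def rank_weights(n_authors):
    """Assumed rank rule: shares proportional to 1, 1/2, 1/3, ... by position."""
    raw = [1.0 / k for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def ratio_to_egalitarian(pareto_weight, n_authors):
    """Ratio plotted in the bottom panel of Fig. 2."""
    return pareto_weight / egalitarian_weight(n_authors)


# Pareto weights as reported in the text.
barberis_1998 = {"Barberis": 0.15, "Shleifer": 0.41, "Vishny": 0.44}
aghion_2010 = {"Algan": 0.29, "Cahuc": 0.28, "Aghion": 0.24, "Shleifer": 0.18}

for paper in (barberis_1998, aghion_2010):
    n = len(paper)
    for author, weight in paper.items():
        print(author, round(ratio_to_egalitarian(weight, n), 2))
    print("rank weights:", [round(r, 2) for r in rank_weights(n)])
```

For Shleifer the ratio is about 1.23 for the 1998 paper and about 0.72 for the 2010 paper, close to the two ends of the range reported for Fig. 2.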

The above results are for the 0th iteration of the Pareto weights (see Eqs. 2 and 3). I computed the 1st iteration for Shleifer and his three core collaborators: La Porta, Lopez-de-Silanes and Vishny. There are seven papers with these four people as co-authors, and four of these papers are cited more than 500 times. There are another four papers by La Porta, Lopez-de-Silanes and Shleifer; nine papers by La Porta, Lopez-de-Silanes, Shleifer and others; seven papers by Shleifer and Vishny; and eleven papers by Shleifer, Vishny and others. All of Vishny’s papers are co-authored by Shleifer, as are 80% of La Porta’s papers and 71% of Lopez-de-Silanes’ papers. 49% of Shleifer’s papers are with some or all of these core collaborators.

Table 2 repeats some of the characteristics from Table 1 and adds new ones for these four scholars. The Pareto weights allocate 34% of citations to Shleifer and Vishny, compared to 28–29% to La Porta and Lopez-de-Silanes. The latter two have lower Pareto indices and thus a greater probability of publishing highly-cited papers. However, Shleifer and Vishny tend to publish with fewer co-authors, and this effect dominates the difference in Pareto indices.

Table 2 Selected characteristics of the four core authors in the sample: number of cited papers (P); average number of authors (A); average number of citations (C(0)) and after egalitarian (C(E)) correction and Pareto correction (C(1), C(2)); and Pareto index for all citations (P(0)) and after egalitarian (P(E)) correction and Pareto correction (P(1), P(2))

This effect is reinforced in the first iteration, in which the Pareto index is calculated for the attributed citations. For each of the four authors, the average number of citations falls and the Pareto index therefore rises (as they receive 100% or less of the citations). However, the Pareto index rises more for La Porta and Lopez-de-Silanes than for Shleifer and Vishny.

Table 2 also shows the attributed citations and Pareto indices for the second iteration. Although the attribution again shifts in favour of Shleifer and Vishny, the differences with the first iteration are minimal. At least for this group of authors, the first iteration appears to be a reasonable approximation. Table 2 further shows that the egalitarian attribution is, at least in this case, a reasonable approximation (but not in all cases as shown in Fig. 2).

Figure 3 highlights the difference between the 0th and 1st iterations for the seven papers co-authored by the four core scholars. Three things stand out. Firstly, there is a change in the order of attribution. In the 0th iteration, the credit went to La Porta first, Vishny second, Lopez-de-Silanes third and Shleifer fourth; in the 1st iteration, Shleifer is first, followed by Vishny, La Porta and Lopez-de-Silanes. Secondly, in both iterations, there is a difference with the egalitarian attribution (0.25) but, noting the vertical scale of Fig. 3 (2.25–2.65), the difference is small. [Footnote 9] Thirdly, attribution varies less with the number of citations in the 1st iteration. This is because differences in the Pareto index matter more for higher citation numbers, and citation numbers are lower when shared between co-authors.

Fig. 3 The Pareto weights assigned to the four scholars in the 0th (dashed lines) and 1st (solid lines) iterations as a function of the number of citations that the papers received

The above results are based on the number of citations per paper. This is a proper measure for the eventual impact of a scholar, but Shleifer is an active researcher and some of his papers were published too recently to amass a large number of citations (see above). Therefore, I repeated the analysis with citations per year, defined as the number of citations divided by the difference between 2012 and the year of publication. Using this metric, 33.81% of citations per year are attributed to Shleifer. This compares to 33.75% of citations. In this case, therefore, citations and citation-rates yield indistinguishable results.
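
The citation rate used here is a simple transformation of the raw count. A minimal sketch, with a hypothetical function name and the two papers discussed earlier as inputs:

```python
def citations_per_year(citations, year_published, reference_year=2012):
    """Citations divided by (reference year minus year of publication)."""
    age = reference_year - year_published
    if age <= 0:
        # A paper published in or after the reference year has no defined rate.
        raise ValueError("paper must be published before the reference year")
    return citations / age


# Citation counts and publication years as quoted in the text.
print(citations_per_year(503, 1998))  # Barberis et al. (1998)
print(citations_per_year(4, 2010))    # Aghion et al. (2010)
```

The rates would then replace the raw counts when estimating the Pareto index and weights; whether the Pareto distribution fits rates as well as counts is a separate question.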

Discussion and conclusion

I propose an objective method to attribute citations to co-authors. The Pareto weight is based on the probability of observing a number of citations given the author’s citation record. Assuming that citation numbers follow a Pareto distribution, there is a closed-form solution to compute the Pareto weight. However, one needs a few iterations and data on the scholar in question as well as on her co-authors and their co-authors. In the examples used in this paper, the Pareto weights attribute up to 25% more or fewer citations to an author than do equal weights. The Pareto weights are very different from rank-based weights.

In future research, it would be good to test the current proposal with other data. A longitudinal study would be particularly interesting. Over time, a scholar’s publication and citation record changes. Using Pareto weights, the attribution of citations changes too. One could, of course, also consider alternative distributional assumptions, particularly when modeling citation-rates rather than citations (Fok and Franses 2007; Franses 2003).