Credit where credit’s due: accounting for co-authorship in citation counts

Tol, Richard S. J.

doi:10.1007/s11192-011-0451-5

Published: 16 July 2011

Scientometrics, Volume 89, pages 291–299 (2011)
Abstract

I propose a new method (Pareto weights) to objectively attribute citations to co-authors. Previous methods either profess ignorance about the seniority of co-authors (egalitarian weights) or are based in an ad hoc way on the order of authors (rank weights). Pareto weights are based on the respective citation records of the co-authors. Pareto weights are proportional to the probability of observing the number of citations obtained. Assuming a Pareto distribution, such weights can be computed with a simple, closed-form equation but require a few iterations and data on a scholar, her co-authors, and her co-authors’ co-authors. The use of Pareto weights is illustrated with a group of prominent economists. In this case, Pareto weights are very different from rank weights. Pareto weights are more similar to egalitarian weights but can deviate up to a quarter in either direction (for reasons that are intuitive).


Introduction

Papers with multiple authors pose a problem to scientometric analysis. Who deserves the credit? There are three common solutions to this (Abbas 2011). The first and perhaps most common approach is to ignore the problem and let all co-authors take full credit. This creates bad incentives: authors may add each other to their papers even without a contribution.^{Footnote 1} The second solution, “egalitarian weights”, is to share the credit equally between the co-authors (Batista et al. 2006; Ellison 2010; Schreiber 2008). Essentially, the analyst claims to have no information about who contributed most. The third solution, “rank weights”, is to share the credit based on the order of the authors (Hagen 2009; Hodge and Greenberg 1981; Sekercioglu 2008; Zhang 2009). This is entirely ad hoc, and conventions on the order of authors differ greatly between disciplines. None of these rules is satisfactory. In this paper, I present a method to apportion credit in an objective manner using readily available data.^{Footnote 2}

The idea is straightforward. Suppose that a mediocre researcher and a star jointly write a paper. The paper is widely cited. Surely, most of the credit should go to the star.

While simple, the idea cannot be implemented without defining the relative stardom of the two authors. Again, I adopt a simple approach. Based on the citation record of the two researchers, I find the probability that a paper by them is cited N times. By definition, the star would have a higher probability of N citations (if N is large) than the mediocre researcher. The relative credit is proportional to the relative probabilities.

In the next section, I formalize this, defining Pareto weights. “The data” section presents the data used to illustrate the proposal. The “Results” section discusses the results. The “Discussion and conclusion” section concludes.

A method for attributing citations

Consider S scholars who published papers that were cited C_{i,s} > 0 times, where i indexes the papers of scholar s. For convenience, we disregard uncited papers. The Pareto distribution (Pareto 1896) is often used to describe the number of citations (Egghe 1987, 1991, 1998, 2005):

$$ f\left( C_{i,s} \right) = \frac{\alpha_{s} \mu^{\alpha_{s}}}{C_{i,s}^{1 + \alpha_{s}}} $$

(1)

We set μ = 1 so that we allow for any number of citations.^{Footnote 3} The maximum likelihood estimate for the Pareto index α_s, with n_s the number of cited papers of scholar s, is

$$ \alpha_{s} = \frac{n_{s}}{\sum\nolimits_{i = 1}^{n_{s}} \ln C_{i,s}} $$

(2)
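As an illustration, the estimate can be sketched in a few lines of Python. The sketch assumes the estimator is the inverse of the average natural logarithm of the citation counts (as stated in the Results section), with μ = 1 and uncited papers dropped. The function name and the sample counts are hypothetical.

```python
import math


def pareto_index(citations):
    """Maximum likelihood Pareto index with mu = 1: n / sum(ln C_i).

    Uncited papers (C = 0) are disregarded, as in the text.
    """
    cited = [c for c in citations if c > 0]
    log_sum = sum(math.log(c) for c in cited)
    if log_sum <= 0:
        # All cited papers have exactly one citation: the index is undefined.
        raise ValueError("need at least one paper with more than one citation")
    return len(cited) / log_sum


# Hypothetical citation record, not taken from the paper's data.
alpha = pareto_index([0, 1, 10, 100])
```

A lower index corresponds to a heavier tail, that is, to a scholar whose papers are more likely to be highly cited.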

Now consider a paper l that is cited C times and that is co-authored by scholars s and t. Each scholar is allocated a share w of the citations according to

$$ w_{l,s} = \frac{{\alpha_{s} C^{{ - \alpha_{s} - 1}} }}{{\alpha_{s} C^{{ - \alpha_{s} - 1}} + \alpha_{t} C^{{ - \alpha_{t} - 1}} }};w_{l,t} = \frac{{\alpha_{t} C^{{ - \alpha_{t} - 1}} }}{{\alpha_{s} C^{{ - \alpha_{s} - 1}} + \alpha_{t} C^{{ - \alpha_{t} - 1}} }} $$

(3)

Obviously, w_s + w_t = 1, and w_s = w_t = 1/2 whenever α_s = α_t. Equation 3 implies that scholar s receives the greater credit for the joint publication if the actual number of citations is more in line with her citation record. Equation 3 readily generalizes to more than two authors. I refer to w_s as the Pareto weight of author s.
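A minimal sketch of Eq. 3, generalized to any number of co-authors, may help fix ideas. The function name and the Pareto indices used below are hypothetical; only the formula is taken from the text.

```python
def pareto_weights(citations, alphas):
    """Pareto weights for one paper: w_s proportional to alpha_s * C^(-alpha_s - 1)."""
    if citations <= 0:
        raise ValueError("the paper must be cited at least once")
    densities = [a * citations ** (-a - 1.0) for a in alphas]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical indices: the first author has the heavier tail (lower alpha).
w_high = pareto_weights(500, [0.2, 0.3])  # highly-cited paper
w_low = pareto_weights(2, [0.2, 0.3])     # little-cited paper
```

With these illustrative indices, the author with the lower index receives more than half of the credit for the highly-cited paper and less than half for the little-cited one.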

There are two problems. First, the joint publication is part of the citation record that is used to estimate the Pareto index α. Therefore, the joint publication is used to assess itself. This would be avoided if the joint publication were excluded from Eq. 2. For scholars with a large number of cited papers, this does not make much of a difference. Nitpickers are free to use α_{s,{l}}.

The second problem is more substantial. Equation 2 uses the full number of citations. Equation 3 allocates only a fraction of the citations to scholar s. That is, Pareto weights change the citation record. The method is internally inconsistent. In order to solve this, redefine Eqs. 2–3 as the 0th iteration. In the mth iteration,

$$ \alpha_{s}^{(m)} = \frac{n_{s}}{\sum\nolimits_{i = 1}^{n_{s}} \ln \left( w_{i,s}^{(m - 1)} C_{i,s} \right)} $$

(4)

and

$$ w_{s}^{(m)} = \frac{{\alpha_{s}^{(m)} C^{{ - \alpha_{s}^{(m)} - 1}} }}{{\mathop \sum \nolimits_{t} \alpha_{t}^{(m)} C^{{ - \alpha_{t}^{(m)} - 1}} }} $$

(5)

The number of iterations should be such that w^(m) ≈ w^(m−1).
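The iteration can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the index is estimated from the logarithm of the attributed citations, only the papers supplied are used, the input layout (a list of citation counts with author names) and the stopping tolerance are hypothetical, and convergence is not guaranteed in general.

```python
import math


def iterate_pareto_weights(papers, max_iter=50, tol=1e-6):
    """papers: list of (citations, [author names]) with citations > 0.

    Returns (alphas, weights), where weights[i][author] is the share of
    paper i attributed to that author.
    """
    # Start from full credit, so the first pass reproduces the 0th iteration.
    weights = [{a: 1.0 for a in authors} for _, authors in papers]
    alphas = {}
    for _ in range(max_iter):
        # Pareto index from the attributed citations (Eq. 4).
        log_sums, counts = {}, {}
        for (cites, authors), w in zip(papers, weights):
            for a in authors:
                log_sums[a] = log_sums.get(a, 0.0) + math.log(w[a] * cites)
                counts[a] = counts.get(a, 0) + 1
        if any(v <= 0 for v in log_sums.values()):
            raise ValueError("attributed citations too low to estimate an index")
        alphas = {a: counts[a] / log_sums[a] for a in counts}
        # Pareto weights from the updated indices (Eq. 5).
        new_weights = []
        for cites, authors in papers:
            dens = {a: alphas[a] * cites ** (-alphas[a] - 1.0) for a in authors}
            total = sum(dens.values())
            new_weights.append({a: d / total for a, d in dens.items()})
        change = max(abs(nw[a] - w[a])
                     for nw, w in zip(new_weights, weights) for a in nw)
        weights = new_weights
        if change < tol:
            break
    return alphas, weights


# Hypothetical record: A has a solo paper, a joint paper with B, and B a solo paper.
alphas, weights = iterate_pareto_weights(
    [(200, ["A"]), (100, ["A", "B"]), (10, ["B"])]
)
```

In this toy record, author A ends up with the lower index and with slightly more than half of the credit for the joint paper.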

There is a practical problem with the above proposal. A scholar’s corrected citation record depends on the citation record of everyone she has ever published with, and on everyone they have ever published with, and so on. Equations 4–5 can therefore only be approximated.

The data

I illustrate the above proposal with the case of Andrei Shleifer, a professor of economics at Harvard University. Although only 50 years old, Shleifer tops the IDEAS/RePEc life-time achievement ranking of all economists.^{Footnote 4} Shleifer won the John Bates Clark Medal and is likely to win the Nobel Prize. He has a limited number of long-term collaborators, which eases data collection.

I collected the publication and citation record of Andrei Shleifer, his 36 collaborators, and 4 of the collaborators of his closest collaborators. I did this at Easter 2011, using Scopus^{Footnote 5} as the source of data.

Table 1 lists the names, numbers of (cited) publications, numbers of citations, and the Hirsch (Hirsch 2005) and Pareto indices (Eq. 2). Table 1 also gives the Shleifer number: 0 for Shleifer, 1 for his coauthors, 2 for his coauthors’ coauthors.^{Footnote 6} Table 1 contains a relatively small (41) but very diverse group of scholars. There are scholars who are generally considered to be world class, former post-docs who left academia, and everything in between. This is appropriate for illustrating the proposal of the “A method for attributing citations” section.

Table 1 Selected characteristics of the authors in the sample: number of papers, number of cited papers, number of citations, average number of citations (per cited paper), Hirsch index, Pareto index, and Shleifer number


Results

Figure 1 shows the Pareto index as a function of the Hirsch index (bottom panel) and as a function of the average number of citations per publication (top panel). The Pareto index is the inverse of the average of the natural logarithm of the citation number (see Eq. 2), but Fig. 1 shows that the inverse of the log of the average citation number is a reasonable approximation. Figure 1 also shows that the Pareto and Hirsch indices are related, since a high number of highly-cited papers implies both a low Pareto index and a high Hirsch index, but that they measure different things: the Hirsch index disregards excess citations while the Pareto index does not.
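The exact index and its approximation can be compared directly. The sketch below uses hypothetical function names and a hypothetical citation record. By Jensen's inequality, the average of the logarithms never exceeds the logarithm of the average, so the approximation never exceeds the exact index; how close the two are depends on the spread of the record.

```python
import math


def pareto_index_exact(citations):
    """Inverse of the average natural logarithm of the citation counts."""
    cited = [c for c in citations if c > 0]
    return len(cited) / sum(math.log(c) for c in cited)


def pareto_index_approx(citations):
    """Inverse of the natural logarithm of the average citation count."""
    cited = [c for c in citations if c > 0]
    mean = sum(cited) / len(cited)
    if mean <= 1:
        raise ValueError("average citation count must exceed one")
    return 1.0 / math.log(mean)


record = [12, 30, 45, 80, 150]  # hypothetical citation counts
exact = pareto_index_exact(record)
approx = pareto_index_approx(record)
```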

Let us now turn to the attribution of the citations of joint papers to individual authors, focusing on Shleifer, the central author in the sample. The top panel of Fig. 2 shows the histograms of Shleifer’s Pareto weights for his papers with one, two, three and four other authors. Shleifer did not publish in teams of six or more. The Pareto weight for single authored papers is, by definition, one. Figure 2 shows that the Pareto weights spread around the egalitarian weights (1/n where n is the number of authors). Egalitarian weights are thus a reasonable approximation of Pareto weights. By implication, rank weights (proportional to 1 for the first author, 1/2 for the second, …) are not. This is no surprise given the convention in economics to list authors alphabetically.
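For comparison, the two benchmark schemes can be written down in a few lines. The sketch assumes that rank weights are proportional to 1, 1/2, 1/3, … in author order and normalized to sum to one; the function names are illustrative.

```python
from fractions import Fraction


def egalitarian_weights(n_authors):
    """Equal share 1/n for each of the n co-authors."""
    return [Fraction(1, n_authors)] * n_authors


def rank_weights(n_authors):
    """Shares proportional to 1, 1/2, ..., 1/n in author order."""
    raw = [Fraction(1, k) for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]


four_author_ranks = rank_weights(4)  # 12/25, 6/25, 4/25, 3/25
```

For a four-author paper, rank weights give 48%, 24%, 16% and 12% in author order, against an egalitarian 25% each.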

Shleifer receives a more-than-egalitarian weight for some papers, but one may be surprised that he receives less-than-egalitarian weight for other papers.^{Footnote 7} After all, Table 1 shows that he is more senior than any of his coauthors. The bottom panel of Fig. 2 confirms its top panel. The bottom panel shows the histogram of the ratio of the Pareto weights to the egalitarian weights. The histogram is centred around one (so that the egalitarian weights are a reasonable approximation). The distribution ranges from 0.75 to 1.25, that is, the egalitarian weight may be a quarter too high or too low (if one accepts the Pareto weight as the true weight).

Let us consider the two extreme cases of the bottom panel of Fig. 2 in order to develop some intuition about the Pareto weights. The ratio of Pareto to egalitarian weights is highest for the paper by Barberis et al. (1998). It was cited 503 times. This is extraordinary for Barberis (whose papers are cited 119 times on average), not so special for Shleifer (whose papers are cited 213 times on average) and run-of-the-mill for Vishny (whose papers are cited 430 times on average). The egalitarian weights are one-third for each author. The Pareto weights are 15% for Barberis, 41% for Shleifer and 44% for Vishny.

At the other extreme lies a paper by Aghion et al. (2010). It was cited only four times.^{Footnote 8} This is exceptional for Shleifer (213 citations on average), not uncommon for Aghion (35 citations on average), common for Cahuc (11 citations on average) and as expected for Algan (5 citations on average). Therefore, the Pareto weights are 0.29 (Algan), 0.28 (Cahuc), 0.24 (Aghion) and 0.18 (Shleifer); the egalitarian weight is 0.25. This highlights another property of Pareto weights: Because a probability density function integrates to one, scholars with a high probability of a large number of citations have a low probability of a small number of citations. Pareto weights thus attribute a large share of the citations of highly-cited papers to highly-cited co-authors, and a small share of the citations of little-cited papers to those same co-authors.
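The property described above can be made explicit with a short derivation from Eq. 3, offered here as a sketch. The threshold C* is notation introduced for this illustration and does not appear in the original. For two co-authors s and t, the common factor C^{-1} cancels in the ratio of the weights:

$$ \frac{w_{l,s}}{w_{l,t}} = \frac{\alpha_{s}}{\alpha_{t}} C^{\alpha_{t} - \alpha_{s}}, \qquad C^{*} = \left( \frac{\alpha_{t}}{\alpha_{s}} \right)^{1/(\alpha_{t} - \alpha_{s})} $$

If α_s < α_t, so that scholar s is the more highly cited of the two, the ratio increases in C and equals one at C = C*. Scholar s therefore receives more than half of the credit when C > C* and less than half when C < C*.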

The above results are for the 0th iteration of the Pareto weights (see Eqs. 2–3). I computed the 1st iteration for Shleifer and his three core collaborators: La Porta, Lopez-de-Silanes and Vishny. There are seven papers with these four people as co-authors, and four of these papers are cited more than 500 times. There are another four papers by La Porta, Lopez-de-Silanes and Shleifer; nine papers by La Porta, Lopez-de-Silanes, Shleifer and others; seven papers by Shleifer and Vishny; and eleven papers by Shleifer, Vishny and others. All of Vishny’s papers are co-authored with Shleifer, as are 80% of La Porta’s papers and 71% of Lopez-de-Silanes’ papers. 49% of Shleifer’s papers are with some or all of these core collaborators.

Table 2 repeats some of the characteristics from Table 1 and adds new ones for these four scholars. The Pareto weights allocate 34% of citations to Shleifer and Vishny, compared to 28–29% to La Porta and Lopez-de-Silanes. The latter two have lower Pareto indices and thus a greater probability of publishing highly-cited papers. However, Shleifer and Vishny tend to publish with fewer co-authors, and this effect dominates the difference in Pareto indices.

Table 2 Selected characteristics of the four core authors in the sample: number of cited papers (P); average number of authors (A); average number of citations (C^(0)) and after egalitarian (C^(E)) correction and Pareto correction (C^(1), C^(2)); and Pareto index for all citations (P^(0)) and after egalitarian (P^(E)) correction and Pareto correction (P^(1), P^(2))

This effect is reinforced in the first iteration, in which the Pareto index is calculated for the attributed citations. The average number of citations falls and the Pareto index rises for each of the four authors (as they receive 100% or less of the citations). However, the Pareto index rises more for La Porta and Lopez-de-Silanes than for Shleifer and Vishny.

Table 2 also shows the attributed citations and Pareto indices for the second iteration. Although the attribution again shifts in favour of Shleifer and Vishny, the differences with the first iteration are minimal. At least for this group of authors, the first iteration appears to be a reasonable approximation. Table 2 further shows that the egalitarian attribution is, at least in this case, a reasonable approximation (but not in all cases as shown in Fig. 2).

Figure 3 highlights the difference between the 0th and 1st iterations for the seven papers co-authored by the four core scholars. Three things stand out. Firstly, there is a change in the order of attribution. In the 0th iteration, the credit went to La Porta first, Vishny second, Lopez-de-Silanes third and Shleifer fourth, whereas in the 1st iteration, Shleifer is first, followed by Vishny, La Porta and Lopez-de-Silanes. Secondly, in both iterations, there is a difference with the egalitarian attribution (0.25) but, noting the vertical scale of Fig. 3 (2.25–2.65), the difference is small.^{Footnote 9} Thirdly, attribution varies less with the number of citations. This is because differences in the Pareto index matter more for higher citation numbers, and citation numbers are lower when shared between co-authors.

The above results are based on the number of citations per paper. This is a proper measure for the eventual impact of a scholar, but Shleifer is an active researcher and some of his papers were published too recently to amass a large number of citations (see above). Therefore, I repeated the analysis with citations per year—specifically, citations divided by 2012 minus the year of publication. Using this metric, 33.81% of citations per year are attributed to Shleifer. This compares to 33.75% of citations. In this case, therefore, citations and citation-rates yield indistinguishable results.
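The citation-rate metric is simple enough to state in code. The sketch below assumes the reference year 2012 given above; the function name is illustrative, and the two calls use the citation counts quoted earlier for Barberis et al. (1998) and Aghion et al. (2010).

```python
def citations_per_year(citations, publication_year, reference_year=2012):
    """Citations divided by (reference year minus year of publication)."""
    age = reference_year - publication_year
    if age <= 0:
        raise ValueError("publication year must precede the reference year")
    return citations / age


rate_barberis = citations_per_year(503, 1998)  # 503 citations over 14 years
rate_aghion = citations_per_year(4, 2010)      # 4 citations over 2 years
```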

Discussion and conclusion

I propose an objective method to attribute citations to co-authors. The Pareto weight is based on the probability of observing a number of citations given the author’s citation record. Assuming that citation numbers follow a Pareto distribution, there is a closed-form solution to compute the Pareto weight. However, one needs a few iterations and data on the scholar in question as well as on her co-authors and their co-authors. In the examples used in this paper, the Pareto weights attribute up to 25% more or fewer citations to an author than do equal weights. The Pareto weights are very different from rank-based weights.

In future research, it would be good to test the current proposal with other data. A longitudinal study would be particularly interesting. Over time, a scholar’s publication and citation record changes. Using Pareto weights, the attribution of citations changes too. One could, of course, also consider alternative distributional assumptions, particularly when modeling citation-rates rather than citations (Fok and Franses 2007; Franses 2003).

Notes

1. Note that collaborative research tends to be cited more often (Levitt and Thelwall 2010).
2. As an alternative, one could rely on survey data (Vinkler 1993).
3. Strictly, the Pareto distribution is defined on real numbers C > μ. This is convenient if citations are shared between co-authors (as done below). For now, one can think of f(C) as F(C + 0.5) − F(C − 0.5).
4. http://ideas.repec.org/top/top.person.all.html.
5. http://www.scopus.com/home.url.
6. The current author’s Shleifer number is 3.
7. Note the difference with the $ \hbar $ index (Hirsch 2010), which always gives full credit to the most senior author and gives either full or no credit to junior authors.
8. Note that this is a recent paper. This issue is further discussed below.
9. There is a large difference with the standard rank attribution (Hagen 2009; Hodge and Greenberg 1981; Sekercioglu 2008). In that case, La Porta would be attributed 48% of the citations, Lopez-de-Silanes 24%, Shleifer 16%, and Vishny 12%.

References

Abbas, A. (2011). Weighted indices for evaluating the quality of research with multiple authorship. Scientometrics, 88(1), 107–131.
Aghion, P., Algan, Y., Cahuc, P., & Shleifer, A. (2010). Regulation and distrust. Quarterly Journal of Economics, 125(3), 1015–1049.
Barberis, N., Shleifer, A., & Vishny, R. (1998). A model of investor sentiment. Journal of Financial Economics, 49(3), 307–343.
Batista, P. D., Campiteli, M. G., Kinouchi, O., & Martinez, A. S. (2006). Is it possible to compare researchers with different scientific interests? Scientometrics, 68(1), 179–189.
Egghe, L. (1987). An exact calculation of Price’s law for the law of Lotka. Scientometrics, 11(1–2), 81–97.
Egghe, L. (1991). The exact place of Zipf’s and Pareto’s law amongst the classical informetric laws. Scientometrics, 20(1), 93–106.
Egghe, L. (1998). Mathematical theories of citation. Scientometrics, 43(1), 57–62.
Egghe, L. (2005). A characterization of the law of Lotka in terms of sampling. Scientometrics, 62(3), 321–328.
Ellison, G. (2010). How does the market use citation data? The Hirsch index in economics. Working Paper 3188, CESifo, Munich.
Fok, D., & Franses, P. H. (2007). Modeling the diffusion of scientific publications. Journal of Econometrics, 139, 376–390.
Franses, P. H. (2003). The diffusion of scientific publications: The case of Econometrica, 1987. Scientometrics, 56(1), 29–42.
Hagen, N. T. (2009). Credit for coauthors. Science, 323(5914), 583.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the USA, 102, 16569–16572.
Hirsch, J. E. (2010). An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics, 85(3), 741–754.
Hodge, S. E., & Greenberg, D. A. (1981). Publication credit. Science, 213, 950.
Levitt, J. M., & Thelwall, M. (2010). Does the higher citation of collaborative research differ from region to region? A case study of Economics. Scientometrics, 85(1), 171–183.
Pareto, V. (1896). Cours d’Economie Politique. Lausanne: F. Rouge.
Schreiber, M. (2008). A modification of the h-index: The hm-index accounts for multi-authored manuscripts. Journal of Informetrics, 2(3), 211–216.
Sekercioglu, C. H. (2008). Quantifying coauthor contributions. Science, 322(5900), 371.
Vinkler, P. (1993). Research contribution, authorship and team cooperativeness. Scientometrics, 26(1), 213–230.
Zhang, C. T. (2009). A proposal for calculating weighted citations based on author rank. EMBO Reports, 10(5), 416–417.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Richard S. J. Tol
Economic and Social Research Institute, Dublin, Ireland
Institute for Environmental Studies, Vrije Universiteit, Amsterdam, The Netherlands
Department of Spatial Economics, Vrije Universiteit, Amsterdam, The Netherlands
Department of Economics, Trinity College, Dublin, Ireland

Corresponding author

Correspondence to Richard S. J. Tol.


About this article

Cite this article

Tol, R.S.J. Credit where credit’s due: accounting for co-authorship in citation counts. Scientometrics 89, 291–299 (2011). https://doi.org/10.1007/s11192-011-0451-5


Received: 09 May 2011
Published: 16 July 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s11192-011-0451-5
