# The productivity of top researchers: a semi-nonparametric approach

## Abstract

Research productivity distributions exhibit heavy tails because a few researchers commonly accumulate the majority of top publications and their corresponding citations. Measures of this productivity are very sensitive to the field analyzed and to the distribution used. In particular, distributions such as the lognormal seem to systematically underestimate the productivity of top researchers. In this article, we propose the use of a log-semi-nonparametric (log-SNP) distribution that nests the lognormal and captures the heavy tail of the productivity distribution by introducing new parameters linked to high-order moments. The application uses scientific production data on 140,971 researchers who have produced 253,634 publications in 18 fields of knowledge (O’Boyle and Aguinis in Pers Psychol 65(1):79–119, 2012) and on finance publications from 330 academic institutions (Borokhovich et al. in J Finance 50(5):1691–1717, 1995). The results show that the log-SNP distribution outperforms the lognormal and provides more accurate measures of the high quantiles of the productivity distribution.

## Notes

1. Different weight functions $w(x)$ can be used; for details, see Abramowitz and Stegun (1972, pp. 774–775). We will consider $P_{0}(x) = 1$.

2. For more details about the Edgeworth and Gram–Charlier series, see Kendall and Stuart (1977, pp. 167–172).

3. Note that, for a given truncation order, the resulting distribution is purely parametric; however, the truncation order can be chosen flexibly to achieve a more accurate approximation to a given distribution. Without loss of generality, we assume that $d_{0} = 1$.

4. Log-SNP’s moments can be directly derived as $$E\left[ {z^{t} } \right] = e^{{\mu t + \frac{1}{2}t^{2} \sigma^{2} }} \left[ {1 + \sum\nolimits_{s = 1}^{n} {d_{s} \left( {\sigma t} \right)^{s} } } \right]$$ (see Ñíguez et al. 2013).

5. The differing sizes of the journals in the JCR categories are a shortcoming of the selection procedure. Nevertheless, it is not clear whether any other, equally arbitrary, selection method would yield better results; in any case, this issue does not affect the advantages of the methodology proposed in this paper.

6. For details about the data treatment, see O’Boyle and Aguinis (2012), p. 86.

7. We used the 2007 JCR for consistency with O’Boyle and Aguinis (2012), who used that year to select the five leading journals in each field of knowledge.

8. The code implementing the maximum likelihood estimation algorithm in R is available upon request.

9. Note that we did not include the $d_{s}$ parameters for odd $s$, because tests showed that they were not significantly different from zero. This result supports the view that the parameter $\sigma$ captures all the relevant features of skewness. This does not contradict the finding that the $d_{s}$ parameters for even $s$ are highly significant, which indicates that productivity distributions have very thick tails and thus require additional parameters to provide accurate measures of the “probability of being a very top researcher” in every field.

10. The quantiles of the log-SNP distribution are obtained from the cdf displayed in Eq. (15) and the Inverse Transform Method (ITM).
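The moment formula in Note 4 and the quantile computation in Note 10 can be checked numerically. The Python sketch below is not the authors' R code mentioned in Note 8; it is a minimal illustration assuming probabilists' Hermite polynomials $H_s$, the SNP density $g(x;\boldsymbol{d}) = \phi(x)\left[1 + \sum_{s} d_s H_s(x)\right]$ with $d_0 = 1$ as in Appendix 2, and, consistently with Note 4, $z = e^{\mu + \sigma x}$. Quantiles are obtained by bisection on the SNP cdf of Appendix 2 and then mapped to $z$, which is an assumed stand-in for Eq. (15). All function names and parameter values are hypothetical.

```python
import math

def hermite(s, x):
    """Probabilists' Hermite polynomial H_s(x), via H_{k+1} = x*H_k - k*H_{k-1}."""
    if s == 0:
        return 1.0
    h_prev, h = 1.0, x  # H_0 and H_1
    for k in range(1, s):
        h_prev, h = h, x * h - k * h_prev
    return h

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def snp_pdf(x, d):
    """g(x; d) = phi(x) * (1 + sum_s d_s H_s(x)); d = [d_1, ..., d_n] (hypothetical layout)."""
    return phi(x) * (1.0 + sum(ds * hermite(s, x) for s, ds in enumerate(d, start=1)))

def snp_cdf(a, d):
    """SNP cdf G(a) = Phi(a) - phi(a) * sum_s d_s H_{s-1}(a)."""
    return Phi(a) - phi(a) * sum(ds * hermite(s - 1, a) for s, ds in enumerate(d, start=1))

def logsnp_moment(t, mu, sigma, d):
    """Closed-form E[z^t] of Note 4, assuming z = exp(mu + sigma * x)."""
    poly = 1.0 + sum(ds * (sigma * t) ** s for s, ds in enumerate(d, start=1))
    return math.exp(mu * t + 0.5 * t * t * sigma * sigma) * poly

def numeric_moment(t, mu, sigma, d, L=12.0, n=4000):
    """Simpson's rule (n even) for E[exp(t * (mu + sigma * x))] under g on [-L, L]."""
    h = 2.0 * L / n
    total = 0.0
    for i in range(n + 1):
        x = -L + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * math.exp(t * (mu + sigma * x)) * snp_pdf(x, d)
    return total * h / 3.0

def logsnp_quantile(p, mu, sigma, d, lo=-20.0, hi=20.0, iters=200):
    """Solve G(x) = p by bisection, then return exp(mu + sigma * x).
    Assumes d yields a nonnegative density, so G is monotone on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if snp_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(mu + sigma * 0.5 * (lo + hi))

# Hypothetical parameters; odd-order d_s set to zero, echoing Note 9.
d = [0.0, 0.05, 0.0, 0.01]
print(logsnp_moment(1, 0.5, 0.8, d), numeric_moment(1, 0.5, 0.8, d))
print(logsnp_quantile(0.99, 0.5, 0.8, []), logsnp_quantile(0.99, 0.5, 0.8, d))
```

Passing `d = []` reduces every function to its lognormal counterpart, so the last line compares a lognormal and a log-SNP upper quantile under the same $\mu$ and $\sigma$.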

## Acknowledgments

We thank Herman Aguinis and Ernest O’Boyle for allowing us to use their database on academic productivity compiled in O’Boyle and Aguinis (2012). We also thank two anonymous referees for their constructive and valuable suggestions. Financial support from the Spanish Ministry of Economics and Competitiveness, through the project ECO2013-44483-P, from FAPA-Uniandes, through the project PR.3.2016.2807, and from Universidad EAFIT is also gratefully acknowledged.

## Appendices

### Appendix 1

This appendix lists the first eight $d_{s}$ parameters in terms of the moments $\mu_{j} = E\left[x^{j}\right]$ of the SNP distribution. For more information, see Del Brio and Perote (2012).

$$d_{1} = \mu_{1} \tag{18}$$

$$d_{2} = \frac{1}{2}\left( \mu_{2} - 1 \right) \tag{19}$$

$$d_{3} = \frac{1}{6}\left( \mu_{3} - 3\mu_{1} \right) \tag{20}$$

$$d_{4} = \frac{1}{24}\left( \mu_{4} - 6\mu_{2} + 3 \right) \tag{21}$$

$$d_{5} = \frac{1}{120}\left( \mu_{5} - 10\mu_{3} + 15\mu_{1} \right) \tag{22}$$

$$d_{6} = \frac{1}{720}\left( \mu_{6} - 15\mu_{4} + 45\mu_{2} - 15 \right) \tag{23}$$

$$d_{7} = \frac{1}{5040}\left( \mu_{7} - 21\mu_{5} + 105\mu_{3} - 105\mu_{1} \right) \tag{24}$$

$$d_{8} = \frac{1}{40320}\left( \mu_{8} - 28\mu_{6} + 210\mu_{4} - 420\mu_{2} + 105 \right) \tag{25}$$
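Each expression in Eqs. (18)–(25) is $d_s = E\left[H_s(x)\right]/s!$ with the probabilists' Hermite polynomial $H_s$ expanded in powers of $x$; this follows from the orthogonality relation $\int H_s(x) H_k(x) \phi(x)\,dx = s!\,\delta_{sk}$. The Python sketch below, with hypothetical function names, builds the Hermite coefficients from the recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$ and maps moments $\mu_1,\dots,\mu_n$ to $d_1,\dots,d_n$ for any $n$, assuming $\mu_j = E\left[x^j\right]$.

```python
import math

def hermite_coeffs(n):
    """Coefficients of H_0..H_n (probabilists'); coeffs[s][j] multiplies x**j."""
    coeffs = [[1.0], [0.0, 1.0]]
    for k in range(1, n):
        nxt = [0.0] + coeffs[k]          # x * H_k
        for j, c in enumerate(coeffs[k - 1]):
            nxt[j] -= k * c              # - k * H_{k-1}
        coeffs.append(nxt)
    return coeffs[: n + 1]

def d_from_moments(mu):
    """mu = [mu_1, ..., mu_n] with mu_j = E[x^j]; returns [d_1, ..., d_n],
    using d_s = E[H_s(x)] / s!."""
    n = len(mu)
    m = [1.0] + list(mu)                 # prepend mu_0 = 1
    coeffs = hermite_coeffs(n)
    return [
        sum(c * m[j] for j, c in enumerate(coeffs[s])) / math.factorial(s)
        for s in range(1, n + 1)
    ]

# Standard normal moments give d_s = 0 for every s (the lognormal case).
print(d_from_moments([0.0, 1.0, 0.0, 3.0, 0.0, 15.0, 0.0, 105.0]))
```

Generating the coefficients by recurrence, rather than typing them in, makes it easy to extend the table beyond $d_8$ if a higher truncation order is used.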

### Appendix 2

This appendix derives the cdf of the SNP distribution.

$$\begin{aligned} G_{x}(a) &= \int_{-\infty}^{a} g(x;\boldsymbol{d})\,dx = \int_{-\infty}^{a} \phi(x)\,dx + \sum_{s=1}^{n} d_{s} \int_{-\infty}^{a} H_{s}(x)\phi(x)\,dx \\ &= \int_{-\infty}^{a} \phi(x)\,dx - \left. \sum_{s=1}^{n} d_{s} H_{s-1}(x)\phi(x) \right|_{-\infty}^{a} \\ &= \int_{-\infty}^{a} \phi(x)\,dx - \phi(a) \sum_{s=1}^{n} d_{s} H_{s-1}(a) \end{aligned}$$

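The second equality uses the antiderivative below, which follows from the Rodrigues formula $H_{s}(x)\phi(x) = (-1)^{s} \frac{d^{s}\phi(x)}{dx^{s}}$; the third uses $\lim_{x \to \pm\infty} H_{s}(x)\phi(x) = 0$ for all $s \ge 0$, so the lower-limit term vanishes: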

$$\begin{aligned} \int H_{s}(x)\phi(x)\,dx &= \int (-1)^{s} \frac{d^{s}\phi(x)}{dx^{s}}\,dx = (-1)^{s} \frac{d^{s-1}\phi(x)}{dx^{s-1}} \\ &= (-1)^{s}(-1)^{s-1} H_{s-1}(x)\phi(x) = -H_{s-1}(x)\phi(x) \end{aligned}$$
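The closed-form cdf can be checked against direct numerical integration of $g(x;\boldsymbol{d}) = \phi(x)\left[1 + \sum_{s} d_s H_s(x)\right]$. The Python sketch below is an illustration assuming probabilists' Hermite polynomials and $d_0 = 1$; the lower integration limit of $-12$ is an assumed stand-in for $-\infty$, and the function names and coefficient values are hypothetical.

```python
import math

def hermite(s, x):
    """Probabilists' Hermite polynomial H_s(x), via H_{k+1} = x*H_k - k*H_{k-1}."""
    if s == 0:
        return 1.0
    h_prev, h = 1.0, x  # H_0 and H_1
    for k in range(1, s):
        h_prev, h = h, x * h - k * h_prev
    return h

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def snp_pdf(x, d):
    """g(x; d) with d = [d_1, ..., d_n] and d_0 = 1."""
    return phi(x) * (1.0 + sum(ds * hermite(s, x) for s, ds in enumerate(d, start=1)))

def snp_cdf(a, d):
    """Closed form: G(a) = Phi(a) - phi(a) * sum_s d_s H_{s-1}(a)."""
    Phi_a = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    return Phi_a - phi(a) * sum(ds * hermite(s - 1, a) for s, ds in enumerate(d, start=1))

def snp_cdf_numeric(a, d, lower=-12.0, n=6000):
    """Composite Simpson's rule (n even) for the integral of g from `lower` to a."""
    h = (a - lower) / n
    total = snp_pdf(lower, d) + snp_pdf(a, d)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * snp_pdf(lower + i * h, d)
    return total * h / 3.0

d = [0.1, 0.05, -0.02, 0.01]  # hypothetical coefficients
for a in (-1.5, 0.0, 0.7, 2.3):
    print(a, snp_cdf(a, d), snp_cdf_numeric(a, d))
```

The identity holds for any coefficients $\boldsymbol{d}$, including values for which $g$ takes negative values; nonnegativity of $g$ matters only when it is used as a density.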

Cortés, L.M., Mora-Valencia, A. & Perote, J. The productivity of top researchers: a semi-nonparametric approach. Scientometrics 109, 891–915 (2016). https://doi.org/10.1007/s11192-016-2072-5