Measuring the quality of academic research is a controversial issue. Hirsch recently proposed a measure, the h index, that has the advantage of condensing into a single summary statistic the information contained in each scientist's citation counts. That seminal paper has prompted a large body of research, focusing on the one hand on correction factors for the h index and on the other on the pros and cons of the measure, with several alternatives proposed. Although the h index has attracted a great deal of interest since its introduction, only a few papers have analyzed its statistical properties and implications. In the present work we propose a statistical approach to derive the distribution of the h index. To this end we work directly on the two basic components of the h index, the number of papers produced and the related vector of citation counts, by introducing convolution models. Our proposal is applied to a homogeneous database of 131 full professors of statistics employed in Italian universities. The results show that while “sufficient” authors are reasonably well detected by a crude bibliometric approach, outstanding ones are underestimated, which motivates the development of a statistically based h index. Our proposal offers such a development and, in particular, confidence intervals for comparing authors as well as quality control thresholds that can be used as target values.
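To make the two components concrete, the following minimal Python sketch computes the h index from a vector of citation counts and then approximates its distribution by Monte Carlo simulation. The distributional choices here (a discrete uniform number of papers, geometric citation counts) and the names `simulate_h_distribution`, `percentile_interval`, `max_papers` and `p_cite` are illustrative assumptions only. They are not the convolution models fitted in the paper.

```python
import math
import random


def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def geometric(rng, p):
    """Draw from a geometric distribution on 0, 1, 2, ... (inverse transform)."""
    u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1]
    return int(math.log(1.0 - u) / math.log(1.0 - p))


def simulate_h_distribution(n_sims=10_000, max_papers=60, p_cite=0.1, seed=0):
    """Monte Carlo approximation of the h index distribution.

    Assumptions (hypothetical, for illustration):
    - number of papers ~ discrete uniform on 1..max_papers
    - citations per paper ~ geometric with success probability p_cite
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        n_papers = rng.randint(1, max_papers)
        citations = [geometric(rng, p_cite) for _ in range(n_papers)]
        h = h_index(citations)
        counts[h] = counts.get(h, 0) + 1
    # Relative frequencies, ordered by h
    return {h: c / n_sims for h, c in sorted(counts.items())}


def percentile_interval(dist, level=0.95):
    """Equal-tailed interval read off a simulated distribution ordered by h."""
    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    cum, lo, hi = 0.0, None, None
    for h, p in dist.items():
        cum += p
        if lo is None and cum >= lo_q:
            lo = h
        if hi is None and cum >= hi_q:
            hi = h
    if hi is None:  # guard against floating-point shortfall in the cumulative sum
        hi = h
    return lo, hi


if __name__ == "__main__":
    dist = simulate_h_distribution()
    print("h index of [10, 8, 5, 4, 3]:", h_index([10, 8, 5, 4, 3]))
    print("95% simulated interval for h:", percentile_interval(dist))
```

Under these assumptions, an author's observed h index could be compared against the simulated interval, which is the kind of comparison the proposed confidence intervals and thresholds are meant to support.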
Keywords: h-index; Discrete extreme value models; Convolution models
The authors thank the referee(s) for their useful comments and suggestions. The authors also acknowledge the financial support of the project MIUR PRIN MISURA, ‘Multivariate models for risk assessment’.