Measuring the quality of academic research is a controversial issue. Recently, Hirsch proposed a measure that has the advantage of summarizing in a single statistic the information contained in the citation counts of each scientist. Since that seminal paper, a large body of research has followed, focusing on the one hand on the development of correction factors for the h index and, on the other hand, on the pros and cons of the measure, with several alternatives proposed. Although the h index has received a great deal of interest from the very beginning, only a few papers have analyzed its statistical properties and implications. In the present work we propose a statistical approach to derive the distribution of the h index. To achieve this objective we work directly on the two basic components of the h index, the number of papers produced and the related vector of citation counts, by introducing convolution models. Our proposal is applied to a homogeneous database of 131 full professors of statistics employed in Italian universities. The results show that while “sufficient” authors are reasonably well detected by a crude bibliometric approach, outstanding ones are underestimated, which motivates the development of a statistically based h index. Our proposal offers such a development and, in particular, confidence intervals for comparing authors as well as quality control thresholds that can be used as target values.
Keywords: h-index, Discrete extreme value models, Convolution models
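To make the two components named in the abstract concrete (the number of papers and the vector of citation counts), the following is a minimal Python sketch. It computes the h index from a citation vector and then simulates an empirical distribution of h by drawing a random number of papers and random citation counts for each. The Poisson distribution for the paper count, the geometric distribution for citations, the parameter values, and all function names are illustrative assumptions, not the models fitted in the paper.

```python
import math
import random


def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def draw_poisson(rng, lam):
    """Knuth's multiplication method (assumed model for the paper count)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def draw_geometric(rng, mean):
    """Geometric on {0, 1, 2, ...} with the given mean (assumed citation model)."""
    success = 1.0 / (1.0 + mean)
    u = rng.random()  # u is in [0, 1), so 1 - u is in (0, 1]
    return int(math.log(1.0 - u) / math.log(1.0 - success))


def simulate_h(n_sims, mean_papers, mean_citations, seed=0):
    """Simulate h for n_sims hypothetical authors under the assumed models."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        n_papers = draw_poisson(rng, mean_papers)
        cites = [draw_geometric(rng, mean_citations) for _ in range(n_papers)]
        results.append(h_index(cites))
    return results


def empirical_interval(values, level=0.95):
    """Central empirical interval taken from the sorted simulated values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


if __name__ == "__main__":
    sims = simulate_h(n_sims=2000, mean_papers=30, mean_citations=8)
    print(empirical_interval(sims))
```

An interval obtained this way only illustrates the idea of attaching uncertainty to an observed h index; how closely it resembles the paper's confidence intervals depends entirely on the assumed distributions.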