Volume 99, Issue 2, pp 299–312

On a statistical h index



The measurement of the quality of academic research is a rather controversial issue. Hirsch recently proposed a measure that has the advantage of summarizing, in a single statistic, the information contained in the citation counts of each scientist. Since that seminal paper, a large body of research has followed, focusing on the one hand on the development of correction factors for the h index and, on the other hand, on the pros and cons of the measure, with several alternatives proposed. Although the h index has received a great deal of interest since its introduction, only a few papers have analyzed its statistical properties and implications. In the present work we propose a statistical approach to derive the distribution of the h index. To achieve this objective we work directly on the two basic components of the h index, the number of papers produced and the related vector of citation counts, by introducing convolution models. Our proposal is applied to a homogeneous database of scientists made up of 131 full professors of statistics employed in Italian universities. The results show that while “sufficient” authors are reasonably well detected by a crude bibliometric approach, outstanding ones are underestimated, which motivates the development of a statistically based h index. Our proposal offers such a development and, in particular, confidence intervals to compare authors as well as quality control thresholds that can be used as target values.
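As a rough illustration of the idea summarized above, the following Python sketch computes the h index from a citation vector and then approximates its distribution by simulation, treating the number of papers and the citations per paper as two random components. The Poisson and geometric distributions, the parameter values, and the function names `h_index`, `simulate_h` and `percentile_interval` are assumptions made purely for illustration; they are not the models fitted in the paper.

```python
import math
import random


def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def _poisson(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def _geometric(rng, mean):
    """Geometric variate on {0, 1, 2, ...} with the given mean (> 0)."""
    success = 1.0 / (1.0 + mean)
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - success))


def simulate_h(n_sims, mean_papers, mean_citations, seed=0):
    """Simulate h under ASSUMED models: Poisson paper counts and
    geometric citation counts (illustrative choices only)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sims):
        n_papers = _poisson(rng, mean_papers)
        citations = [_geometric(rng, mean_citations) for _ in range(n_papers)]
        values.append(h_index(citations))
    return values


def percentile_interval(values, level=0.95):
    """Empirical central interval of the simulated h values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper
```

For example, `percentile_interval(simulate_h(2000, 30, 8))` returns a simulated 95% interval for the h index of a hypothetical author with about 30 papers and 8 citations per paper on average; two authors could then be compared by checking whether their intervals overlap.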


Keywords: h-Index, Discrete extreme value models, Convolution models



The authors thank the referee(s) for their useful comments and suggestions. The authors also acknowledge the financial support of the project MIUR PRIN MISURA, ‘Multivariate models for risk assessment’.


  1. Ball, P. (2005). Index aims for fair ranking of scientists. Nature, 436, 900.
  2. Beirlant, J. & Einmahl, J.H.J. (2010). Asymptotics for the Hirsch index. Scandinavian Journal of Statistics, 37, 355–364.
  3. Bühlmann, H. (1970). Mathematical methods in risk theory, Grundlehrenband 172. Heidelberg: Springer.
  4. Burrell, Q.L. (2007). Hirsch’s h-index: A stochastic model. Journal of Informetrics, 1, 16–25.
  5. Cerchiello, P. & Giudici, P. (2012). On the distribution of functionals of discrete ordinal variables. Statistics & Probability Letters, 82, 2044–2049.
  6. Cruz, M.G. (2002). Modeling, measuring and hedging operational risk. London: Wiley.
  7. Dalla Valle, L. & Giudici, P. (2008). A Bayesian approach to estimate the marginal loss distributions in operational risk management. Computational Statistics & Data Analysis, 52, 3107–3127.
  8. Evert, S. (2004). A simple LNRE model for random character sequences. In Proceedings of the 7èmes Journées Internationales d’Analyse Statistique des Données Textuelles (JADT 2004) (pp. 411–422). Louvain-la-Neuve, Belgium.
  9. Evert, S. & Baroni, M. (2007). zipfR: Word frequency distributions in R. In Proceedings of the 45th annual meeting of the association for computational linguistics, posters and demonstrations session. Prague, Czech Republic.
  10. Frachot, A., Moudoulaud, O. & Roncalli, T. (2007). Loss distribution approach in practice. In M.K. Ong (Ed.), The Basel handbook. A guide for financial practitioners. London: Risk Books.
  11. Gabaix, X. (2009). Power laws in economics and finance. Annual Review of Economics, 1, 255–293.
  12. Glanzel, W. (2006). On the h-index - A mathematical approach to a new measure of publication activity and citation impact. Scientometrics, 67, 315–321.
  13. Harzing, A.W. (2007). Publish or Perish, available from
  14. Hirsch, J.E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569–16572.
  15. Iglesias, J.E. & Pecharroman, C. (2007). Scaling the h-index for different scientific ISI fields. Scientometrics, 73, 303–320.
  16. Izsak, F. (2006). Maximum likelihood estimation for constrained parameters of multinomial distributions - Application to Zipf-Mandelbrot models. Computational Statistics & Data Analysis, 51, 1575–1583.
  17. Mandelbrot, B. (1962). On the theory of word frequencies and on related Markovian models of discourse. In R. Jakobson (Ed.), Structure of language and its mathematical aspects (pp. 190–219). Providence, RI: American Mathematical Society.
  18. Pratelli, L., Baccini, A., Barabesi, L. & Marcheselli, M. (2012). Statistical analysis of the Hirsch index. Scandinavian Journal of Statistics, 39, 681–694.
  19. Siegel, S. & Castellan, N.J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.
  20. Todeschini, R. (2011). The j-index: A new bibliometric index and multivariate comparisons between other common indices. Scientometrics, 87, 621–639.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2013

Authors and Affiliations

  1. University of Pavia, Pavia, Italy
