Abstract
Citation numbers and other quantities derived from bibliographic databases are becoming standard tools for assessing the productivity and impact of research activities. Although these indicators are widely used, their statistical properties are still not well established. This is especially true of bibliometric indicators aimed at evaluating individual scholars, because large-scale data sets are typically difficult to retrieve. Here, we take advantage of a recently introduced large bibliographic data set, Google Scholar Citations, which collects the entire publication record of individual scholars. We analyze the scientific profiles of more than 30,000 researchers and study the relation between the h-index, the number of publications, and the number of citations of individual scientists. While the number of publications of a scientist is only weakly related to his/her h-index, we find that the h-index is strongly correlated with the number of citations that she/he has received, so that the number of citations can effectively be used as a proxy for the h-index. Allowing the h-index to depend on both the number of citations and the number of publications yields only a minor improvement.
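For readers unfamiliar with the three quantities compared in the abstract, the following minimal sketch computes them for a single scholar from a list of per-paper citation counts. It uses the standard definition of the h-index (the largest h such that h papers have at least h citations each). The function names and the example numbers are illustrative assumptions, not taken from the article or from its data set.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


def profile_summary(citations):
    """Hypothetical helper: the three indicators discussed in the abstract."""
    return {
        "publications": len(citations),
        "citations": sum(citations),
        "h_index": h_index(citations),
    }


# Made-up citation counts for one scholar (illustration only).
example_profile = [10, 8, 5, 4, 3, 0]
summary = profile_summary(example_profile)
print(summary)
```

In this made-up profile, four papers have at least four citations each but no five papers have at least five, so the h-index is 4.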
Cite this article
Radicchi, F., Castellano, C. Analysis of bibliometric indicators for individual scholars in a large data set. Scientometrics 97, 627–637 (2013). https://doi.org/10.1007/s11192-013-1027-3
DOI: https://doi.org/10.1007/s11192-013-1027-3