Understanding the Scientific Enterprise: Citation Analysis, Data and Modeling

  • Filippo Radicchi
  • Claudio Castellano
Part of the Computational Social Sciences book series (CSS)


The large amount of information contained in bibliographic databases has recently boosted the use of citations, and of other indicators based on citation counts, as tools for the quantitative assessment of scientific research. Citation counts are often interpreted as proxies for the scientific influence of papers, journals, scholars, and institutions. Given their importance in practical contexts, interest in the study of bibliographic datasets is no longer restricted to specialists in bibliometrics but extends to scholars from very different primary fields of research. As a result, the recent past has witnessed a large number of papers on this topic. The present chapter provides a brief overview of recent progress in the analysis of bibliographic databases. The first part of the chapter focuses on studies devoted to the statistical description of the distributions of citations received by individual publications. The second part summarizes recent research efforts towards modeling citation dynamics and the growth of citation networks.



We are indebted to A. Vespignani and S. Fortunato for the core part of the chapter [84]. F. Radicchi acknowledges support from NSF grant SMA-1446078.


  1. Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Sciences, 16(12), 317–324.
  2. Shockley, W. (1957). On the statistics of individual variations of productivity in research laboratories. Proceedings of the IRE, 45(3), 279–290.
  3. de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
  4. de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
  5. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167–256.
  6. MacRoberts, M. H., & MacRoberts, B. R. (1989). Problems of citation analysis: A critical review. Journal of the American Society for Information Science, 40(5), 342–349.
  7. MacRoberts, M. H., & MacRoberts, B. R. (1996). Problems of citation analysis. Scientometrics, 36(3), 435–444.
  8. Adler, R., Ewing, J., & Taylor, P. (2009). Citation statistics. Statistical Science, 24(1), 1.
  9. Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.
  10. Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.
  11. Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69(1), 131–152.
  12. Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA: The Journal of the American Medical Association, 295(1), 90–93.
  13. Davis, P., & Papanek, G. F. (1984). Faculty ratings of major economics departments by citations. The American Economic Review, 74(1), 225–230.
  14. Kinney, A. L. (2007). National scientific facilities and their science impact on nonbiomedical research. Proceedings of the National Academy of Sciences, 104(46), 17943–17947.
  15. King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.
  16. Bornmann, L., & Daniel, H.-D. (2006). Selecting scientific excellence through committee peer review – a citation analysis of publications previously published to approval or rejection of post-doctoral research fellowship applicants. Scientometrics, 68(3), 427–440.
  17. Bornmann, L., Wallon, G., & Ledin, A. (2008). Does the committee peer review select the best applicants for funding? An investigation of the selection process for two European molecular biology organization programmes. PLoS One, 3(10), e3480.
  18. Web of Science. Available at
  19. CrossRef. Available at
  20. Scopus. Available at
  21. GoogleScholar. Available at
  22. Citeseer. Available at
  23. inSpire. Available at
  24. Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2), 131–134.
  25. Laherrere, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: “fat tails” with characteristic scales. The European Physical Journal B-Condensed Matter and Complex Systems, 2(4), 525–539.
  26. Tsallis, C., & de Albuquerque, M. P. (2000). Are citations of scientific papers a case of nonextensivity? The European Physical Journal B-Condensed Matter and Complex Systems, 13(4), 777–780.
  27. Redner, S. (2005). Citation statistics from more than a century of Physical Review. Physics Today, 58, 49–54.
  28. Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638.
  29. Vázquez, A. (2001). Statistics of citation networks. arXiv preprint cond-mat/0105031.
  30. Lehmann, S., Lautrup, B., & Jackson, A. D. (2003). Citation networks in high energy physics. Physical Review E, 68(2), 026113.
  31. Bommarito, M. J., & Katz, D. M. (2009). Properties of the United States Code citation network. Available at SSRN: or
  32. Eom, Y.-H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS One, 6(9), e24926.
  33. Stringer, M. J., Sales-Pardo, M., & Nunes Amaral, L. A. (2008). Effectiveness of journal ranking schemes as a tool for locating information. PLoS One, 3(2), e1683.
  34. Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105(45), 17268–17272.
  35. Castellano, C., & Radicchi, F. (2009). On the fairness of using relative indicators for comparing citation performance in different disciplines. Archivum immunologiae et therapiae experimentalis, 57(2), 85–90.
  36. Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2010). Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. Journal of the American Society for Information Science and Technology, 61(7), 1377–1385.
  37. Wallace, M. L., Larivière, V., & Gingras, Y. (2009). Modeling a century of citation distributions. Journal of Informetrics, 3(4), 296–303.
  38. Anastasiadis, A. D., de Albuquerque, M. P., de Albuquerque, M. P., & Mussi, D. B. (2010). Tsallis q-exponential describes the distribution of scientific citations – A new characterization of the impact. Scientometrics, 83(1), 205–218.
  39. van Raan, A. F. J. (2001). Two-step competition process leads to quasi power-law income distributions: Application to scientific publication and citation distributions. Physica A: Statistical Mechanics and Its Applications, 298(3), 530–536.
  40. van Raan, A. F. J. (2001). Competition amongst scientists for publication status: Toward a model of scientific publication and citation distributions. Scientometrics, 51(1), 347–357.
  41. Kryssanov, V. V., Kuleshov, E. L., Rinaldo, F. J., & Ogawa, H. (2007). We cite as we communicate: A communication model for the citation process. arXiv preprint cs/0703115.
  42. Waltman, L., van Eck, N. J., & van Raan, A. F. J. (2012). Universality of citation distributions revisited. Journal of the American Society for Information Science and Technology, 63(1), 72–77.
  43. Evans, T. S., Hopkins, N., & Kaube, B. S. (2012). Universality of performance indicators based on citation and reference counts. Scientometrics, 93(2), 473–495.
  44. Radicchi, F., & Castellano, C. (2011). Rescaling citations of publications in physics. Physical Review E, 83(4), 046116.
  45. Bornmann, L., & Daniel, H.-D. (2009). Universality of citation distributions – A validation of Radicchi et al.’s relative indicator cf = c/c0 at the micro level using data from chemistry. Journal of the American Society for Information Science and Technology, 60(8), 1664–1670.
  46. Kaur, J., Radicchi, F., & Menczer, F. (2013). Universality of scholarly impact metrics. Journal of Informetrics, 7(4), 924–932.
  47. Leydesdorff, L., Radicchi, F., Bornmann, L., Castellano, C., & de Nooy, W. (2013). Field-normalized impact factors: A comparison of rescaling versus fractionally counted IFs. Journal of the American Society for Information Science and Technology, 64(11), 2299–2309.
  48. Chatterjee, A., Ghosh, A., & Chakrabarti, B. K. (2014). Universality of citation distributions for academic institutions and journals. arXiv preprint arXiv:1409.8029.
  49. Radicchi, F., & Castellano, C. (2012). A reverse engineering approach to the suppression of citation biases reveals universal properties of citation distributions. PLoS One, 7(3), e33833.
  50. Lawless, J. F. (2011). Statistical models and methods for lifetime data (Vol. 362). New York: Wiley.
  51. Li, Y., Radicchi, F., Castellano, C., & Ruiz-Castillo, J. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics, 7(3), 746–755.
  52. Crespo, J. A., Li, Y., & Ruiz-Castillo, J. (2013). The measurement of the effect on citation inequality of differences in citation practices across scientific fields. PLoS One, 8(3), e58727.
  53. Karrer, B., & Newman, M. E. J. (2009). Random acyclic networks. Physical Review Letters, 102(12), 128701.
  54. Karrer, B., & Newman, M. E. J. (2009). Random graph models for directed acyclic networks. Physical Review E, 80(4), 046110.
  55. Molloy, M., & Reed, B. (1998). The size of the giant component of a random graph with a given degree sequence. Combinatorics, Probability and Computing, 7(03), 295–305.
  56. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., & Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science, 298(5594), 824–827.
  57. Wu, Z.-X., & Holme, P. (2009). Modeling scientific-citation patterns and other triangle-rich acyclic networks. Physical Review E, 80(3), 037101.
  58. Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213, 21–87.
  59. Simon, H. A. (1957). Models of man: Social and rational. New York: Wiley.
  60. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
  61. Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical Review Letters, 85(21), 4629.
  62. Dorogovtsev, S. N., Mendes, J. F. F., & Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical Review Letters, 85(21), 4633.
  63. Newman, M. E. J. (2009). The first-mover advantage in scientific publication. Europhysics Letters, 86(6), 68001.
  64. Jeong, H., Néda, Z., & Barabási, A.-L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61(4), 567.
  65. Golosovsky, M., & Solomon, S. (2012). Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters, 109(9), 098701.
  66. Golosovsky, M., & Solomon, S. (2013). The transition towards immortality: Non-linear autocatalytic growth of citations to scientific papers. Journal of Statistical Physics, 151(1–2), 340–354.
  67. Hajra, K. B., & Sen, P. (2004). Phase transitions in an aging network. Physical Review E, 70(5), 056103.
  68. Hajra, K. B., & Sen, P. (2005). Aging in citation networks. Physica A: Statistical Mechanics and Its Applications, 346(1), 44–48.
  69. Hajra, K. B., & Sen, P. (2006). Modelling aging characteristics in citation networks. Physica A: Statistical Mechanics and Its Applications, 368(2), 575–582.
  70. Wang, M., Yu, G., & Yu, D. (2008). Measuring the preferential attachment mechanism in citation networks. Physica A: Statistical Mechanics and Its Applications, 387(18), 4692–4698.
  71. Dorogovtsev, S. N., & Mendes, J. F. F. (2000). Evolution of networks with aging of sites. Physical Review E, 62(2), 1842.
  72. Dorogovtsev, S. N., & Mendes, J. F. F. (2001). Scaling properties of scale-free evolving networks: Continuous approach. Physical Review E, 63(5), 056125.
  73. Zhu, H., Wang, X., & Zhu, J.-Y. (2003). Effect of aging on network structure. Physical Review E, 68(5), 056121.
  74. Wang, D., Song, C., & Barabási, A.-L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132.
  75. Wang, J., Mei, Y., & Hicks, D. (2014). Comment on “quantifying long-term scientific impact”. Science, 345(6193), 149–149.
  76. Ibáñez, A., Larrañaga, P., & Bielza, C. (2009). Predicting citation count of bioinformatics papers within four years of publication. Bioinformatics, 25(24), 3303–3309.
  77. Livne, A., Adar, E., Teevan, J., & Dumais, S. (2013). Predicting citation counts using text and graph mining. In Proceedings of the iConference 2013 Workshop on Computational Scientometrics: Theory and Applications.
  78. Shibata, N., Kajikawa, Y., & Matsushima, K. (2007). Topological analysis of citation networks to discover the future core articles. Journal of the American Society for Information Science and Technology, 58(6), 872–882.
  79. Sarigöl, E., Pfitzner, R., Scholtes, I., Garas, A., & Schweitzer, F. (2014). Predicting scientific success based on coauthorship networks. EPJ Data Science, 3(1), 1.
  80. Bertsimas, D., Brynjolfsson, E., Reichman, S., & Silberholz, J. M. (2014). Moneyball for academics: Network analysis for predicting research impact. Available at SSRN: or
  81. Acuna, D. E., Allesina, S., & Kording, K. P. (2012). Future impact: Predicting scientific success. Nature, 489(7415), 201–202.
  82. Penner, O., Pan, R. K., Petersen, A. M., Kaski, K., & Fortunato, S. (2013). On the predictability of future impact in science. Scientific Reports, 3, 3052.
  83. De Nicolao, G. (2014, October). Times Higher Education World University Rankings: Science or quackery?
  84. Radicchi, F., Fortunato, S., & Vespignani, A. (2012). Citation networks. In A. Scharnhorst, K. Börner, & P. van den Besselaar (Eds.), Models of science dynamics, understanding complex systems (pp. 233–257). Berlin/Heidelberg: Springer.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington, USA
  2. Istituto dei Sistemi Complessi (ISC-CNR), Roma, Italy
  3. Dipartimento di Fisica, “Sapienza” Università di Roma, Roma, Italy
