Theory of Citing

  • M. V. Simkin
  • V. P. Roychowdhury
Part of the Springer Optimization and Its Applications book series (SOIA, volume 57)


We present empirical data on misprints in citations to 12 high-profile papers. The great majority of misprints are identical to misprints in articles that cited the same paper earlier. The distribution of the number of repetitions of each misprint follows a power law. We develop a stochastic model of the citation process that explains these findings and shows that about 70–90% of scientific citations are copied from the reference lists of other papers. Citation copying can explain not only why some misprints become popular, but also why some papers become highly cited. We show that a model in which a scientist picks a few random papers, cites them, and copies a fraction of their references quantitatively accounts for the empirically observed distribution of citations.
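
The copying mechanism described in the abstract can be illustrated with a toy simulation. This is a simplified sketch, not the authors' exact model. The parameter names `p_read` (the chance that a citer works from the original paper) and `p_err` (the chance that such a citer introduces a new misprint), the rule of copying a uniformly random earlier citation, and all function names are hypothetical assumptions chosen for illustration.

```python
import random
from collections import Counter


def simulate_citations(n_citations, p_read, p_err, seed=0):
    """Hypothetical misprint-copying process (illustrative assumption only).

    With probability p_read a citation is written from the original paper and
    carries a brand-new misprint with probability p_err. Otherwise it is copied
    verbatim from a uniformly random earlier citation, misprint included.
    The very first citation is always original (there is nothing to copy).
    Returns a list where None means "correct" and an int is a misprint id.
    """
    rng = random.Random(seed)
    citations = []
    next_misprint = 0
    for _ in range(n_citations):
        if not citations or rng.random() < p_read:
            # Original citation: may introduce a fresh, distinct misprint.
            if rng.random() < p_err:
                citations.append(next_misprint)
                next_misprint += 1
            else:
                citations.append(None)
        else:
            # Copied citation: inherits whatever its source contains.
            citations.append(rng.choice(citations))
    return citations


def repetition_counts(citations):
    """Number of times each distinct misprint appears."""
    return Counter(c for c in citations if c is not None)


def estimate_read_fraction(citations):
    """Distinct misprints divided by misprinted citations.

    In this toy model every distinct misprint marks one original citation, and
    the misprinted share drifts toward p_err, so the ratio approximates p_read.
    Returns NaN if there are no misprints at all.
    """
    counts = repetition_counts(citations)
    total = sum(counts.values())
    return len(counts) / total if total else float("nan")


if __name__ == "__main__":
    cites = simulate_citations(50_000, p_read=0.25, p_err=0.2, seed=1)
    print("estimated share read:", estimate_read_fraction(cites))
    print("most repeated misprints:", repetition_counts(cites).most_common(5))
```

Because a misprint's chance of being copied grows with the number of copies it already has, early misprints tend to accumulate many repetitions while most stay rare, giving a broad spread of repetition counts. A single run of the ratio estimate scatters noticeably around `p_read`, and the sketch ignores misprints introduced by copiers, which is a further simplifying assumption.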


Keywords (added by machine, not by the authors): Page Number, Fitness Distribution, Citation Distribution, Literature Growth, Citation Process

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Electrical Engineering, University of California, Los Angeles, USA