Scientometrics, Volume 56, Issue 2, pp 247–257

Zipf's law and the diversity of biology newsgroups

  • Mark Kot
  • Emily Silverman
  • Celeste A. Berg

Abstract

Usenet newsgroups provide a popular means of scientific communication. We demonstrate striking order in the diversity of biology newsgroups: submissions to newsgroups obey a form of Zipf's law, a simple power law relating the number of posts by each contributor to that contributor's rank by number of posts. We show that a simple stochastic process, due to Günther et al. (1992, 1996), Levitin and Schapiro (1993), and Schapiro (1994), accounts for this pattern and reproduces many of the properties of newsgroups. This model successfully predicts the relative contribution of each poster from the size of the newsgroup, measured by its number of posters and its total number of posts.
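
As a rough illustration of the rank-frequency pattern described above, the following sketch ranks posters by post count and estimates a power-law exponent by least squares on log-transformed data. It is not the authors' stochastic model, which the abstract does not specify; the function names, the fitting method, and the post counts are all assumptions made for illustration, and the counts are synthetic rather than taken from any newsgroup.

```python
import math


def rank_frequency(posts_by_author):
    """Return post counts sorted from the most to the least prolific poster."""
    return sorted(posts_by_author.values(), reverse=True)


def fit_zipf_exponent(counts):
    """Fit log(count) = intercept - exponent * log(rank) by ordinary least squares.

    Returns (exponent, intercept). Counts must be positive and ordered by rank.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic (hypothetical) data: 30 posters whose counts fall off roughly as 1/rank.
posts_by_author = {f"poster{rank}": round(1200 / rank) for rank in range(1, 31)}

counts = rank_frequency(posts_by_author)
exponent, intercept = fit_zipf_exponent(counts)
print(f"estimated Zipf exponent: {exponent:.3f}")
```

A straight line on a log-log plot of count against rank is the usual visual signature of such a law. A least-squares fit of this kind is only a quick diagnostic and does not replace a formal goodness-of-fit test.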


References

  1. Baayen, R. H. (2001), Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht, Netherlands.
  2. Bar-Ilan, J. (1997), The “mad cow” disease, Usenet newsgroups and bibliometric laws. Scientometrics, 39: 29–55.
  3. David, H. A., Hartley, H. O., Pearson, E. S. (1954), The distribution of the ratio, in a single normal sample, of range to standard deviation. Biometrika, 41: 482–493.
  4. Frontier, S. (1985), Diversity and structure in aquatic ecosystems. Oceanography and Marine Biology: An Annual Review, 23: 253–312.
  5. Günther, R., Levitin, L., Schapiro, B., Wagner, P. (1996), Zipf's law and the effect of ranking on probability distributions. International Journal of Theoretical Physics, 35: 395–417.
  6. Günther, R., Schapiro, B., Wagner, P. (1992), Physical complexity and Zipf's law. International Journal of Theoretical Physics, 31: 525–543.
  7. Hauben, M., Hauben, R. (1997), Netizens: On the History and Impact of Usenet and the Internet. IEEE Computer Society Press, Los Alamitos, California, USA.
  8. Hubbell, S. P. (2001), The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton, New Jersey, USA.
  9. Huberman, B. A., Pirolli, P. L. T., Pitkow, J. E., Lukose, R. M. (1998), Strong regularities in World Wide Web surfing. Science, 280: 95–97.
  10. Kanji, G. K. (1999), 100 Statistical Tests. Sage Publications, London, UK.
  11. Levitin, L. B., Schapiro, B. (1993), Zipf's law and information complexity in an evolutionary system. Proceedings IEEE International Symposium on Information Theory, 76.
  12. Li, W. (1992), Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38: 1842–1845.
  13. Mandelbrot, B. (1953), An information theory of the statistical structure of language. In: W. E. Jackson (Ed.), Communication Theory, Academic Press, New York, New York, USA, pp. 486–502.
  14. Mandelbrot, B. (1961), On the theory of word frequencies and on related Markovian models of discourse. In: R. Jakobson (Ed.), Structure of Language and its Mathematical Aspects, American Mathematical Society, Providence, Rhode Island, USA, pp. 190–219.
  15. Magurran, A. E. (1988), Ecological Diversity and Its Measurement. Princeton University Press, Princeton, New Jersey, USA.
  16. Marsili, M., Zhang, Y.-C. (1998), Interacting individuals leading to Zipf's law. Physical Review Letters, 80: 2741–2744.
  17. Miller, G. A., Newman, E. B., Friedman, E. A. (1957), Some effects of intermittent silence. American Journal of Psychology, 70: 311–313.
  18. Okuyama, K., Takayasu, M., Takayasu, H. (1999), Zipf's law in income distribution of companies. Physica A, 269: 125–131.
  19. Osborne, L. N. (1998), Topic development in USENET newsgroups. Journal of the American Society for Information Science, 49: 1010–1016.
  20. Schapiro, B. (1994), An approach to the physics of complexity. Chaos, Solitons and Fractals, 4: 115–123.
  21. Simon, H. A. (1955), On a class of skew distribution functions. Biometrika, 42: 425–440.
  22. Smith, M. A. (1999), Invisible crowds in cyberspace: mapping the social structure of the Usenet. In: M. A. Smith, P. Kollock (Eds), Communities in Cyberspace, Routledge, London, UK, pp. 195–219.
  23. Tokeshi, M. (1993), Species abundance patterns and community structure. Advances in Ecological Research, 24: 111–186.
  24. Wilson, J. B., Wells, T. C. E., Trueman, I. C., Jones, G., Atkinson, M. D., Crawley, M. J., Dodd, M. E., Silvertown, J. (1996), Are there assembly rules for plant species abundance? An investigation in relation to soil resources and successional trends. Journal of Ecology, 84: 527–538.
  25. Yule, G. U. (1924), A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions B, 213: 21.
  26. Zipf, G. K. (1935), The Psycho-Biology of Language. Houghton Mifflin, Boston, Massachusetts, USA.
  27. Zipf, G. K. (1949), Human Behavior and the Principle of Least Effort. Addison-Wesley Publishing Company, Cambridge, Massachusetts, USA.

Copyright information

© Kluwer Academic Publishers/Akadémiai Kiadó 2003

Authors and Affiliations

  • Mark Kot (1)
  • Emily Silverman (2)
  • Celeste A. Berg (3)

  1. Department of Applied Mathematics, University of Washington, Seattle, USA
  2. School of Natural Resources and Environment, University of Michigan, Ann Arbor, USA
  3. Department of Genome Sciences, University of Washington, Seattle, USA
