Relation Between Rank and Frequency

Statistical Universals of Language

Part of the book series: Mathematics in Mind (MATHMIN)

Abstract

Part II mainly considers the characteristics of a population of linguistic elements, such as words. A word has a frequency in a text, and the vocabulary of the text forms a population, which this part analyzes.

Change history

  • 29 November 2022

    The original version of the chapter “Language as a Complex System” was previously published without the missing reference in footnote 4, page 22, being updated. This change has now been incorporated, and the chapter and the book have been updated accordingly.

Notes

  1.

    Others, such as Ferrer-i-Cancho and Elvevåg (2010), have previously pointed out such differences between some of the simplest random texts and the original text. The population of a monkey text can also be shown to differ from that of a natural language text in ways other than those used in these previous works, as shown later in this chapter and in Chaps. 6 and 13. It is important to know first, however, that monkey texts can be shown analytically to produce a power law.

  2.

    Section 21.1 defines measures for the goodness of fit, which depend on the fitting method. For the maximum-likelihood method, the goodness of fit is primarily measured by the value of the negative log-likelihood, LL. In the case of Moby Dick, LL = 6.692 for η = 1.037. At this point, we can only say that, for Moby Dick, the plot is nearly straight and the fit is therefore good. Section 5.1 will compare these values across different datasets. The rest of the book gives measures indicating the goodness of fit in the footnotes.
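
As a rough illustration of the maximum-likelihood fit mentioned in note 2, the following Python sketch estimates a power-law exponent η from a rank-frequency list by minimizing a negative log-likelihood. Everything specific here is an assumption made for illustration and is not taken from the book: the function names, the model P(r) ∝ r^(-η) truncated at the vocabulary size, the per-token normalization of the negative log-likelihood, and the golden-section search. The book's own definition of LL is given in the part cited in note 2 and may differ, so the numbers this sketch prints are not comparable with the value reported for Moby Dick.

```python
import math
from collections import Counter


def rank_frequencies(words):
    """Return word frequencies in descending order (index 0 holds rank 1)."""
    return sorted(Counter(words).values(), reverse=True)


def mean_neg_log_likelihood(freqs, eta):
    """Per-token negative log-likelihood under P(r) = r**(-eta) / Z.

    Assumption: ranks run from 1 to the vocabulary size, and Z normalizes
    over exactly those ranks (a truncated power law).
    """
    ranks = range(1, len(freqs) + 1)
    log_z = math.log(sum(r ** -eta for r in ranks))
    total = sum(freqs)
    nll = sum(f * (eta * math.log(r) + log_z) for r, f in zip(ranks, freqs))
    return nll / total


def fit_eta(freqs, lo=0.01, hi=5.0, tol=1e-6):
    """Golden-section search for the eta that minimizes the objective.

    The objective is convex in eta (a linear term plus a log-sum-exp),
    so a bracketing search on [lo, hi] is sufficient.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = mean_neg_log_likelihood(freqs, c)
    fd = mean_neg_log_likelihood(freqs, d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = mean_neg_log_likelihood(freqs, c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = mean_neg_log_likelihood(freqs, d)
    return (a + b) / 2


# Synthetic frequencies proportional to 1/rank (not data from the book).
synthetic = [round(10000 / r) for r in range(1, 201)]
eta_hat = fit_eta(synthetic)
print("estimated eta:", eta_hat)
print("mean negative log-likelihood:", mean_neg_log_likelihood(synthetic, eta_hat))
```

For a real text, `rank_frequencies` would be applied to the tokenized word sequence first, and the resulting list passed to `fit_eta`. A lower value of the objective at the fitted η indicates a better fit under this particular model, which is the sense in which a negative log-likelihood can serve as a goodness-of-fit measure.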

References

  • Bell, Timothy C., Cleary, John G., and Witten, Ian H. (1990). Text Compression. Prentice Hall.

  • Clauset, Aaron, Shalizi, Cosma R., and Newman, Mark E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.

  • Conrad, Brian and Mitzenmacher, Michael (2004). Power laws for monkeys typing randomly: The case of unequal probabilities. IEEE Transactions on Information Theory, 50(7), 1403–1414.

  • Ferrer-i-Cancho, Ramon and Elvevåg, Brita (2010). Random texts do not exhibit the real Zipf’s law-like rank distribution. PLoS ONE, 5(3), e9411.

  • Gerlach, Martin and Altmann, Eduardo G. (2013). Stochastic model for the vocabulary growth in natural languages. Physical Review X, 3(2), 021006.

  • Kretzschmar Jr., William A. (2015). Language and Complex Systems. Cambridge University Press.

  • Li, Wentian (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory, 38, 1842–1845.

  • Miller, George A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70(2), 311–314.

  • Petruszewycz, Micheline (1973). L’histoire de la loi d’Estoup-Zipf: documents. Mathématiques et Sciences Humaines, 44, 41–56.

  • Piantadosi, Steven T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.

  • Zipf, George K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press.

Author information

Corresponding author

Correspondence to Kumiko Tanaka-Ishii.

Copyright information

© 2021 The Author(s)

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Relation Between Rank and Frequency. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_4
