Abstract
Part II mainly considers the characteristics of a population of linguistic elements, such as words. A word has a frequency in a text, and the vocabulary of the text forms a population, which this part analyzes.
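As a minimal illustration of the population described above, the following Python sketch counts word frequencies in a text and orders the vocabulary by rank. The tokenizer (lowercased runs of letters and apostrophes), the function name `rank_frequency`, and the sample sentence are assumptions made for this example, not the preprocessing used in the book.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Crude tokenization (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

# Hypothetical sample text, used only to exercise the function.
sample = "the whale and the sea and the ship"
table = rank_frequency(sample)
for rank, word, freq in table:
    print(rank, word, freq)
```

Plotting frequency against rank on doubly logarithmic axes for a long text gives the kind of rank-frequency plot that the chapter title refers to.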
Change history
29 November 2022
The original version of the chapter “Language as a Complex System” was published with a missing reference in footnote 4 on page 22. The reference has now been added, and the chapter and the book have been updated accordingly.
Notes
1. Others, such as Ferrer-i-Cancho and Elvevåg (2010), have previously pointed out such differences between some of the simplest random texts and the original text. The population of a monkey text can also be shown to differ from that of a natural language text in ways other than those used in these previous works, as shown later in this chapter and in Chaps. 6 and 13. It is important to know first, however, that monkey texts can be shown analytically to produce a power law.
2. Section 21.1 defines measures for the goodness of fit, which depend on the fitting method. For the maximum-likelihood method, the goodness of fit is primarily measured by the value of the negative log-likelihood, LL. In the case of Moby Dick, LL = 6.692 for η = 1.037. At this point, we can only say that, for Moby Dick, the plot is nearly straight, and the fit is therefore good. Section 5.1 will compare these values across different datasets. The rest of the book gives measures indicating the goodness of fit in the footnotes.
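The two notes above can be sketched together in Python: generate a monkey text by random typing, rank its word frequencies, and fit a power-law exponent η by maximum likelihood. Everything specific here is an assumption made for illustration: the four-letter alphabet and space probability, the finite-support model P(r) ∝ r^(−η) over ranks 1..V, the per-token normalization of the negative log-likelihood, and the golden-section search. The book's own definitions are given in its Sect. 21.1 and may differ, so the values printed here are not comparable to the LL reported for Moby Dick.

```python
import math
import random
from collections import Counter

def monkey_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Random typing: each keystroke is a space with probability p_space,
    otherwise a uniformly chosen letter. Returns the resulting words."""
    rng = random.Random(seed)
    chars = [" " if rng.random() < p_space else rng.choice(alphabet)
             for _ in range(n_chars)]
    return "".join(chars).split()

def neg_log_likelihood(freqs, eta):
    """Average negative log-likelihood per token of ranked frequencies
    under the assumed model P(r) = r**(-eta) / Z over ranks 1..V."""
    log_z = math.log(sum(r ** -eta for r in range(1, len(freqs) + 1)))
    total = sum(freqs)
    weighted_log_rank = sum(f * math.log(r)
                            for r, f in enumerate(freqs, start=1))
    return eta * weighted_log_rank / total + log_z

def fit_eta(freqs, lo=0.01, hi=5.0, tol=1e-6):
    """Golden-section search for the eta that minimizes the negative
    log-likelihood, which is convex in eta for this model."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if neg_log_likelihood(freqs, c) < neg_log_likelihood(freqs, d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2

words = monkey_text(20000)
# Frequencies in descending order; position in the list is rank - 1.
freqs = sorted(Counter(words).values(), reverse=True)
eta = fit_eta(freqs)
print("vocabulary size:", len(freqs))
print("fitted eta:", round(eta, 3))
print("negative log-likelihood per token:",
      round(neg_log_likelihood(freqs, eta), 3))
```

A lower negative log-likelihood at the fitted η indicates a better fit under this particular model, which is the sense in which the note uses LL as a goodness-of-fit measure.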
References
Bell, Timothy C., Cleary, John G., and Witten, Ian H. (1990). Text Compression. Prentice Hall.
Clauset, Aaron, Shalizi, Cosma R., and Newman, Mark E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.
Conrad, Brian and Mitzenmacher, Michael (2004). Power laws for monkeys typing randomly: The case of unequal probabilities. IEEE Transactions on Information Theory, 50(7), 1403–1414.
Ferrer-i-Cancho, Ramon and Elvevåg, Brita (2010). Random texts do not exhibit the real Zipf’s law-like rank distribution. PLoS ONE, 5(3), e9411.
Gerlach, Martin and Altmann, Eduardo G. (2013). Stochastic model for the vocabulary growth in natural languages. Physical Review X, 3(2), 21006.
Kretzschmar Jr., William A. (2015). Language and Complex Systems. Cambridge University Press.
Li, Wentian (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory, 38, 1842–1845.
Miller, George A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70(2), 311–314.
Petruszewycz, Micheline (1973). L’histoire de la loi d’Estoup-Zipf: documents. Mathématiques et Sciences Humaines, 44, 41–56.
Piantadosi, Steven T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.
Zipf, George K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press.
Copyright information
© 2021 The Author(s)
About this chapter
Cite this chapter
Tanaka-Ishii, K. (2021). Relation Between Rank and Frequency. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_4
DOI: https://doi.org/10.1007/978-3-030-59377-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)