Relation Between Rank and Frequency

Tanaka-Ishii, Kumiko

doi:10.1007/978-3-030-59377-3_4

Kumiko Tanaka-Ishii

Part of the book series: Mathematics in Mind (MATHMIN)

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23

Abstract

Part II mainly considers the characteristics of a population of linguistic elements, such as words. Each word occurs with some frequency in a text, and the text's vocabulary forms a population, which this part analyzes.
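To make the idea of a vocabulary population concrete, the following minimal Python sketch counts how often each word occurs in a text and assigns ranks in decreasing order of frequency, which is the raw material for a rank-frequency plot. The function name, the tokenizer (a lowercase regular expression), and the alphabetical tie-breaking are assumptions for illustration, not the book's preprocessing.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return [(rank, word, frequency), ...] in descending order of frequency.

    Assumptions for illustration: tokens are lowercase runs of ASCII letters
    and apostrophes, and ties are broken alphabetically so ranks are stable.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)  # the vocabulary and each word's frequency
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for row in rank_frequency("the cat saw the dog and the cat ran"):
        print(row)
```

For this toy sentence, "the" (3 occurrences) takes rank 1, "cat" (2) takes rank 2, and the four words seen once fill ranks 3 to 6. On a real text, a different tokenizer can change the population and therefore the ranks.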

Change history

29 November 2022
The original version of the chapter “Language as a Complex System” was previously published with a reference missing from footnote 4, page 22. The missing reference has now been added, and the chapter and the book have been updated accordingly.

Notes

1.
Others, such as Ferrer-i-Cancho and Elvevåg (2010), have previously pointed out such differences between some of the simplest random texts and the original text. The population of monkey text can also be shown to differ from that of natural language text in ways other than those used in these previous works, as shown later in this chapter and in Chaps. 6 and 13. It is important to know first, however, that monkey texts analytically produce a power law.
2.
Section 21.1 defines measures of the goodness of fit, which depend on the fitting method. For the maximum-likelihood method, the goodness of fit is primarily measured by the negative log-likelihood, LL. For Moby Dick, LL = 6.692 for η = 1.037. At this point, we can only say that the plot for Moby Dick is fairly straight, so the fit is good. Section 5.1 will compare these values across different datasets. Throughout the rest of the book, goodness-of-fit measures are given in the footnotes.
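Note 1 states that monkey texts analytically produce a power law. A minimal sketch of why, assuming a Miller-style model (Miller, 1957) that ignores empty words: each keystroke is a space with probability p_space and otherwise one of k equiprobable letters. Every word of length l then has relative frequency proportional to ((1 - p_space)/k)^l, and the k^l words of that length fill a band of consecutive ranks, so log frequency falls linearly in log rank with exponent η = 1 - ln(1 - p_space)/ln k. The function names and the values k = 26, p_space = 0.18 are hypothetical choices for illustration; Conrad and Mitzenmacher (2004) treat the harder case of unequal letter probabilities.

```python
import math


def monkey_rank_frequency(k, p_space, max_len):
    """Analytic (last rank, relative frequency) for each word-length band.

    Assumed model: a keystroke is a space with probability p_space,
    otherwise one of k equiprobable letters. Every word of length l has
    frequency proportional to ((1 - p_space) / k) ** l, and the k ** l such
    words occupy consecutive ranks after all shorter words.
    """
    points = []
    last_rank = 0
    for length in range(1, max_len + 1):
        last_rank += k ** length  # number of distinct words of length <= l
        points.append((last_rank, ((1 - p_space) / k) ** length))
    return points


def loglog_slope(points):
    """Ordinary least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r, _ in points]
    ys = [math.log(f) for _, f in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def predicted_exponent(k, p_space):
    """eta = 1 - ln(1 - p_space) / ln(k), the large-rank slope of the bands."""
    return 1 - math.log(1 - p_space) / math.log(k)


if __name__ == "__main__":
    k, p_space = 26, 0.18
    points = monkey_rank_frequency(k, p_space, max_len=20)
    # Skip the l = 1 band, where the finite-sum correction to the rank is largest.
    print(-loglog_slope(points[1:]), predicted_exponent(k, p_space))
```

With these illustrative values, the fitted slope and the predicted exponent both come out at about 1.06, slightly above 1, because the space probability lowers the frequency of longer words faster than their number grows.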
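Note 2 refers to fitting η by maximum likelihood and reporting the negative log-likelihood LL. Below is a minimal sketch, assuming a Zipf model truncated at the observed vocabulary size N, P(r) = r^(-η) / Σ_{s=1}^{N} s^(-η), and reporting LL as the average negative log-likelihood per token. The book's exact model and normalization of LL are defined in Sect. 21.1 and may differ, so numbers from this code should not be compared directly with LL = 6.692. The function names and the search bracket are hypothetical; Clauset et al. (2009) discuss maximum-likelihood fitting of power laws in depth.

```python
import math


def zipf_nll(eta, counts):
    """Average negative log-likelihood per token under a truncated Zipf model.

    counts[i] is the frequency of the word at rank i + 1 (sorted in
    descending order). Assumed model: P(r) = r**(-eta) / Z with
    Z = sum_{s=1}^{N} s**(-eta), where N = len(counts).
    """
    n_tokens = sum(counts)
    log_z = math.log(sum(s ** -eta for s in range(1, len(counts) + 1)))
    weighted_log_rank = sum(c * math.log(r) for r, c in enumerate(counts, start=1))
    return eta * weighted_log_rank / n_tokens + log_z


def fit_zipf_exponent(counts, lo=0.1, hi=3.0, tol=1e-6):
    """Golden-section search for the eta that minimizes zipf_nll.

    The objective is convex in eta (a linear term plus a log-sum-exp),
    so shrinking the bracket [lo, hi] converges to the minimum if it lies inside.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if zipf_nll(c, counts) < zipf_nll(d, counts):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    eta = (a + b) / 2
    return eta, zipf_nll(eta, counts)


if __name__ == "__main__":
    # Idealized counts that follow r**-1.2 exactly (synthetic, not real data).
    counts = [1000 * r ** -1.2 for r in range(1, 2001)]
    eta, ll = fit_zipf_exponent(counts)
    print(eta, ll)  # eta is very close to 1.2
```

On idealized counts the search recovers the generating exponent; on a real text the fitted η and LL also depend on how many ranks are included, which is one reason goodness-of-fit values need a fixed protocol before they are compared across datasets.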

References

Bell, Timothy C., Cleary, John G., and Witten, Ian H. (1990). Text Compression. Prentice Hall.
Clauset, Aaron, Shalizi, Cosma R., and Newman, Mark E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.
Conrad, Brian and Mitzenmacher, Michael (2004). Power laws for monkeys typing randomly: The case of unequal probabilities. IEEE Transactions on Information Theory, 50(7), 1403–1414.
Ferrer-i-Cancho, Ramon and Elvevåg, Brita (2010). Random texts do not exhibit the real Zipf’s law-like rank distribution. PLoS ONE, 5(3), e9411.
Gerlach, Martin and Altmann, Eduardo G. (2013). Stochastic model for the vocabulary growth in natural languages. Physical Review X, 3(2), 021006.
Kretzschmar Jr., William A. (2015). Language and Complex Systems. Cambridge University Press.
Li, Wentian (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory, 38, 1842–1845.
Miller, George A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70(2), 311–314.
Petruszewycz, Micheline (1973). L’histoire de la loi d’Estoup-Zipf: documents. Mathématiques et Sciences Humaines, 44, 41–56.
Piantadosi, Steven T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.
Zipf, George K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press.

Author information

Authors and Affiliations

Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
Kumiko Tanaka-Ishii

Corresponding author

Correspondence to Kumiko Tanaka-Ishii.

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Relation Between Rank and Frequency. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_4

DOI: https://doi.org/10.1007/978-3-030-59377-3_4
Published: 02 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)