Abstract
This last chapter of Part II further considers the nature of a vocabulary population in terms of two properties that are mathematically related to Zipf’s law: the density function and vocabulary growth. Like Zipf’s law, both approximately follow a power law but are subject to some deviations.
The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23
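The two properties named in the abstract can both be computed directly from a token sequence. The following is a minimal sketch, not the chapter's own procedure: the function names (`vocabulary_growth`, `frequency_density`, `loglog_slope`) and the toy whitespace-tokenized input are assumptions made for illustration, and the slope estimate is a plain least-squares fit in log-log space, which may differ from the fitting method the book describes.

```python
import math
from collections import Counter

def vocabulary_growth(tokens):
    """Return v(m): the number of distinct words among the first m tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth

def frequency_density(tokens):
    """Return {k: number of distinct words that occur exactly k times}."""
    word_counts = Counter(tokens)
    return dict(Counter(word_counts.values()))

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x): a rough exponent estimate."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Toy input (hypothetical); a real study would use a full text such as a novel.
tokens = "a b a c b a d".split()
growth = vocabulary_growth(tokens)    # vocabulary size after each token
density = frequency_density(tokens)   # how many word types share each frequency
```

On a real text, passing the text lengths `range(1, len(growth) + 1)` and `growth` to `loglog_slope` gives a rough exponent for vocabulary growth, and the keys and values of `density` give one for the density function. Both estimates are sensitive to the deviations from a pure power law that the abstract mentions.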
Change history
29 November 2022
The original version of the chapter “Language as a Complex System” was previously published without updating a missing reference in footnote 4, page 22. This change has now been included, and the chapter and the book have been updated accordingly.
Notes
- 1.
More precisely, the density function represents the distribution that defines the information source of the author of Moby Dick, Herman Melville. Such a true function is never known, however, so an estimate is deduced from the word counts; this estimate is denoted P in the main text.
- 2.
LL = 2.092.
- 3.
More precisely, this depends on the value of ζ. For the values of ζ acquired from texts, the result of the integral becomes a power function.
- 4.
The proof is based on a very simple, elegant consideration by Lü et al. (2010).
- 5.
LL = 2.093.
- 6.
LL = 0.432.
- 7.
LL = 3.376, LL = 3.180, and LL = 7.718 for the Wall Street Journal, Thomas, and Chinese characters, respectively.
- 8.
a = 1.3 for the type-token relations shown in this book.
- 9.
For fitting Fig. 6.3, the least-squares method was applied (cf. Sect. 21.1). As defined in Sect. 21.1, the goodness of fit was measured by the residual ε. For Moby Dick, the shuffled text, and the monkey text, the residuals were ε = 105.322, 175.808, and 45.538, respectively. The values are very large because of the ranges of the axes. Comparing the residuals nevertheless shows that the monkey text lies closest to the fitted line.
- 10.
LL = 2.092.
- 11.
References
Baeza-Yates, Ricardo and Navarro, Gonzalo (2000). Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51, 69–82.
Bernhardsson, Sebastian, da Rocha, Luis E. C., and Minnhagen, Petter (2009). The meta book and size-dependent properties of written language. New Journal of Physics, 11(12):123015.
Font-Clos, Francesc and Corral, Álvaro (2015). Log-log convexity of type-token growth in Zipf’s systems. Physical Review Letters, 114:238701.
Guiraud, Pierre (1954). Les Caractères Statistiques du Vocabulaire. Presses Universitaires de France.
Heaps, Harold S. (1978). Information Retrieval: Computational and Theoretical Aspects. Academic Press.
Herdan, Gustav (1964). Quantitative Linguistics. Butterworths.
Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2010). Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE, 5(12):e14139.
Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2013). Deviation of Zipf’s and Heaps’ laws in human languages with limited dictionary sizes. Scientific Reports, 3:1082.
Montemurro, Marcelo A. and Zanette, Damián H. (2002). New perspectives on Zipf’s law: from single texts to large corpora. Glottometrics, 4, 86–98.
Serrano, M. Ángeles, Flammini, Alessandro, and Menczer, Filippo (2009). Modeling statistical properties of written text. PLoS ONE, 4(4):e5372.
van Leijenhorst, D. C. and van der Weide, Theo P. (2005). A formal derivation of Heaps’ law. Information Sciences, 170, 263–272.
Zanette, Damián H. and Montemurro, Marcelo A. (2005). Dynamics of text generation with realistic Zipf’s distribution. Journal of Quantitative Linguistics, 12(1), 29–40.
Copyright information
© 2021 The Author(s)
About this chapter
Cite this chapter
Tanaka-Ishii, K. (2021). Related Statistical Universals. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_6
DOI: https://doi.org/10.1007/978-3-030-59377-3_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)