Related Statistical Universals

Tanaka-Ishii, Kumiko

doi:10.1007/978-3-030-59377-3_6

Part of the book series: Mathematics in Mind (MATHMIN)

A correction to this publication is available online at https://doi.org/10.1007/978-3-030-59377-3_23

Abstract

This last chapter of Part II further considers the nature of a vocabulary population in terms of two related properties that are mathematically related to Zipf’s law: the density function and the vocabulary growth. As with Zipf’s law, both approximately follow a power law but are subject to some deviations.
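
As a minimal sketch of the two quantities named in the abstract, the following Python computes an empirical density function (how many word types occur exactly k times) and a vocabulary growth curve (the number of distinct types among the first m tokens). The function names and the toy token sequence are illustrative assumptions and are not taken from the chapter.

```python
from collections import Counter


def density_function(tokens):
    """Map each frequency k to the number of word types occurring exactly k times."""
    word_counts = Counter(tokens)
    return dict(Counter(word_counts.values()))


def vocabulary_growth(tokens):
    """Return a list whose entry m-1 is the number of distinct types in the first m tokens."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


# Toy input (hypothetical); a real analysis would use a full tokenized text.
tokens = "the whale and the sea and the ship".split()
print(density_function(tokens))
print(vocabulary_growth(tokens))
```

On a real text, plotting either result on log-log axes is the usual way to inspect how closely it follows a power law.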

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23

Change history

29 November 2022
The original version of the chapter “Language as a Complex System” was previously published with a reference missing from footnote 4 on page 22. The reference has now been added, and the chapter and the book have been updated accordingly.

Notes

1. More precisely, the density function represents the distribution that defines the information source of the author of Moby Dick, Herman Melville. Such a true function is never known, however, so an estimate is deduced from the word counts, denoted P in the main text.
2. LL = 2.092.
3. Precisely, this depends on the value of ζ. For the values of ζ acquired from texts, the result of the integral becomes a power function.
4. The proof is based on a very simple, elegant consideration by Lü et al. (2010).
5. LL = 2.093.
6. LL = 0.432.
7. LL = 3.376, LL = 3.180, and LL = 7.718 for the Wall Street Journal, Thomas, and Chinese characters, respectively.
8. a = 1.3 for the type-token relations shown in this book.
9. For fitting Fig. 6.3, the least-squares method was applied (cf. Sect. 21.1). As defined in Sect. 21.1, the goodness of fit was measured by the residual ε. For Moby Dick, the shuffled text, and the monkey text, the errors are ε = 105.322, 175.808, and 45.538, respectively. The values are very large because of the ranges of the axes. The comparison of residuals at least shows that the monkey text lies closest to the fitted line.
10. LL = 2.092.
11. Among these are Baeza-Yates and Navarro (2000); Montemurro and Zanette (2002); van Leijenhorst and van der Weide (2005); Zanette and Montemurro (2005); Serrano et al. (2009); Lü et al. (2010).
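
Note 9 refers to a least-squares fit and a residual. A minimal sketch of one common way to do this is given below: an ordinary least-squares fit of a power function v = c·m^ξ in log-log coordinates, together with the sum of squared residuals in log space. The exact fitting procedure and the definition of ε in Sect. 21.1 are not reproduced here, so the function name, the symbols c and ξ, the residual definition, and the synthetic data are all assumptions for illustration.

```python
import math


def fit_power_law(ms, vs):
    """Fit v = c * m**xi by ordinary least squares on (log m, log v).

    Returns (xi, c, sse), where sse is the sum of squared residuals in
    log space (an assumed goodness-of-fit measure, not the chapter's epsilon).
    """
    xs = [math.log(m) for m in ms]
    ys = [math.log(v) for v in vs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    xi = sxy / sxx                  # slope of the line in log-log space
    log_c = mean_y - xi * mean_x    # intercept of the line in log-log space
    sse = sum((y - (xi * x + log_c)) ** 2 for x, y in zip(xs, ys))
    return xi, math.exp(log_c), sse


# Synthetic data lying exactly on a power function (hypothetical values).
ms = [10, 100, 1000, 10000]
vs = [2 * m ** 0.7 for m in ms]
xi, c, sse = fit_power_law(ms, vs)
print(xi, c, sse)
```

A residual computed on the raw axes rather than in log space would scale with the axis ranges, which is consistent with the remark in note 9 that the reported values are large.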

References

Baeza-Yates, Ricardo and Navarro, Gonzalo (2000). Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51, 69–82.
Bernhardsson, Sebastian, da Rocha, Luis E. C., and Minnhagen, Petter (2009). The meta book and size-dependent properties of written language. New Journal of Physics, 11(12):123015.
Font-Clos, Francesc and Corral, Álvaro (2015). Log-log convexity of type-token growth in Zipf’s systems. Physical Review Letters, 114:238701.
Guiraud, Pierre (1954). Les Caractères Statistiques du Vocabulaire. Universitaires de France Press.
Heaps, Harold S. (1978). Information Retrieval: Computational and Theoretical Aspects. Academic Press.
Herdan, Gustav (1964). Quantitative Linguistics. Butterworths.
Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2010). Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE, 5(12):e14139.
Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2013). Deviation of Zipf’s and Heaps’ laws in human languages with limited dictionary sizes. Scientific Reports, 1082.
Montemurro, Marcelo A. and Zanette, Damian (2002). New perspectives on Zipf’s law: from single texts to large corpora. Glottometrics, 4, 86–98.
Serrano, M. Ángeles, Flammini, Alessandro, and Menczer, Filippo (2009). Modeling statistical properties of written text. PLoS ONE, 4(4):e5372.
van Leijenhorst, D. C. and van der Weide, Theo P. (2005). A formal derivation of Heaps’ law. Information Sciences, 170, 263–272.
Zanette, Damián H. and Montemurro, Marcelo A. (2005). Dynamics of text generation with realistic Zipf’s distribution. Journal of Quantitative Linguistics, 12(1), 29–40.

Author information

Authors and Affiliations

Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
Kumiko Tanaka-Ishii

Corresponding author

Correspondence to Kumiko Tanaka-Ishii.

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Related Statistical Universals. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_6

DOI: https://doi.org/10.1007/978-3-030-59377-3_6
Published: 02 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics (R0)
