Related Statistical Universals

Statistical Universals of Language

Part of the book series: Mathematics in Mind (MATHMIN)

Abstract

This last chapter of Part II further examines the nature of a vocabulary population through two properties that are mathematically related to Zipf’s law: the density function and vocabulary growth. Like Zipf’s law, both approximately follow power laws but show some deviations.
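The quantities named in the abstract can all be estimated from a tokenized text. The following minimal Python sketch is not the chapter's own procedure. It assumes the text is already split into tokens and computes four things:

- the rank-frequency list;
- an empirical density function, taken as the share of word types that occur exactly f times;
- vocabulary growth, taken as the number of distinct types after each token;
- a power-law exponent for each curve, fitted by ordinary least squares on log-log axes.

The function names and the synthetic Zipf-distributed sample (exponent 1.2 over 5000 types) are hypothetical illustrations.

```python
import math
import random
from collections import Counter


def rank_frequency(tokens):
    """Word frequencies in descending order; index 0 corresponds to rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def frequency_density(tokens):
    """Map each frequency f to the proportion of word types occurring exactly f times."""
    counts = Counter(tokens)
    n_types = len(counts)
    spectrum = Counter(counts.values())  # f -> number of types with frequency f
    return {f: m / n_types for f, m in sorted(spectrum.items())}


def vocabulary_growth(tokens):
    """Number of distinct types seen after each token (the type-token relation)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); nonpositive points are skipped."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical synthetic text: tokens drawn i.i.d. from a Zipf-like
    # distribution with exponent 1.2 over 5000 types (not a real corpus).
    random.seed(0)
    vocab = 5000
    weights = [r ** -1.2 for r in range(1, vocab + 1)]
    tokens = random.choices(range(vocab), weights=weights, k=100_000)

    freqs = rank_frequency(tokens)
    zipf_slope = loglog_slope(range(1, len(freqs) + 1), freqs)
    growth = vocabulary_growth(tokens)
    heaps_slope = loglog_slope(range(1, len(growth) + 1), growth)
    print(f"rank-frequency slope ~ {zipf_slope:.2f}, growth slope ~ {heaps_slope:.2f}")
```

To analyze a real text, replace the synthetic `tokens` with its word list. As the abstract notes, empirical curves only nearly follow power laws, so a single fitted slope is at best a rough summary.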

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23

Change history

  • 29 November 2022

    The original version of the chapter “Language as a Complex System” was published with a missing reference in footnote 4, page 22. The reference has now been added, and the chapter and the book have been updated accordingly.

Notes

  1. More precisely, the density function represents the distribution that defines the information source of Herman Melville, the author of Moby Dick. Because this true function is never known, it is estimated from the word counts; this estimate is denoted P in the main text.

  2. LL = 2.092.

  3. More precisely, this depends on the value of ζ. For the values of ζ obtained from texts, the integral evaluates to a power function.

  4. The proof is based on a very simple, elegant argument by Lü et al. (2010).

  5. LL = 2.093.

  6. LL = 0.432.

  7. LL = 3.376, LL = 3.180, and LL = 7.718 for the Wall Street Journal, Thomas, and Chinese characters, respectively.

  8. a = 1.3 for the type-token relations shown in this book.

  9. For fitting Fig. 6.3, the least-squares method was applied (cf. Sect. 21.1). As defined in Sect. 21.1, the goodness of fit was measured by the residual ε, which is 105.322, 175.808, and 45.538 for Moby Dick, the shuffled text, and the monkey text, respectively. These values are large because of the ranges of the axes, but comparing them at least shows that the monkey text fits the line comparatively well.

  10. LL = 2.092.

  11. Among these are Baeza-Yates and Navarro (2000); Montemurro and Zanette (2002); van Leijenhorst and van der Weide (2005); Zanette and Montemurro (2005); Serrano et al. (2009); Lü et al. (2010).
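Note 4 credits Lü et al. (2010) with the argument that connects Zipf’s law to vocabulary growth. The sketch below shows a heuristic of that kind. It is not the chapter's own proof, and the notation is an assumption: α is the rank-frequency exponent, t is the text length in tokens, and N(t) is the number of types. It assumes an ideal power law with α > 1, so that the normalizing sum over ranks converges and the top frequency grows linearly with t. It also ignores finite-vocabulary effects, which Lü et al. (2013) discuss as a source of deviation.

```latex
% Ideal rank-frequency law; for alpha > 1 the top frequency scales with text length t
f(r) \approx f(1)\, r^{-\alpha}, \qquad f(1) \propto t \quad (\alpha > 1)

% The rarest observed type, at rank r = N(t), occurs roughly once:
1 \approx f\bigl(N(t)\bigr) \approx f(1)\, N(t)^{-\alpha}
\;\Longrightarrow\;
N(t) \approx f(1)^{1/\alpha} \propto t^{1/\alpha}

% Density: the number of types with frequency at least f is the rank r(f),
% so the density of frequencies follows from differentiating r(f):
r(f) \approx \left(\frac{f(1)}{f}\right)^{1/\alpha}
\;\Longrightarrow\;
P(f) \propto -\frac{dr}{df} \propto f^{-(1 + 1/\alpha)}
```

For α close to 1, the last line gives the familiar P(f) ∝ f^{-2}.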

References

  • Baeza-Yates, Ricardo and Navarro, Gonzalo (2000). Block addressing indices for approximate text retrieval. Journal of the American Society for Information Science, 51, 69–82.

  • Bernhardsson, Sebastian, da Rocha, Luis E. C., and Minnhagen, Petter (2009). The meta book and size-dependent properties of written language. New Journal of Physics, 11(12):123015.

  • Font-Clos, Francesc and Corral, Álvaro (2015). Log-log convexity of type-token growth in Zipf’s systems. Physical Review Letters, 114:238701.

  • Guiraud, Pierre (1954). Les Caractères statistiques du vocabulaire. Presses Universitaires de France.

  • Heaps, Harold S. (1978). Information Retrieval: Computational and Theoretical Aspects. Academic Press.

  • Herdan, Gustav (1964). Quantitative Linguistics. Butterworths.

  • Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2010). Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE, 5(12):e14139.

  • Lü, Linyuan, Zhang, Zi-Ke, and Zhou, Tao (2013). Deviation of Zipf’s and Heaps’ laws in human languages with limited dictionary sizes. Scientific Reports, 3:1082.

  • Montemurro, Marcelo A. and Zanette, Damián H. (2002). New perspectives on Zipf’s law: from single texts to large corpora. Glottometrics, 4, 86–98.

  • Serrano, M. Ángeles, Flammini, Alessandro, and Menczer, Filippo (2009). Modeling statistical properties of written text. PLoS ONE, 4(4):e5372.

  • van Leijenhorst, D. C. and van der Weide, Theo P. (2005). A formal derivation of Heaps’ law. Information Sciences, 170, 263–272.

  • Zanette, Damián H. and Montemurro, Marcelo A. (2005). Dynamics of text generation with realistic Zipf’s distribution. Journal of Quantitative Linguistics, 12(1), 29–40.

Author information

Correspondence to Kumiko Tanaka-Ishii.

Copyright information

© 2021 The Author(s)

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Related Statistical Universals. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_6
