
Size and Frequency

Statistical Universals of Language

Part of the book series: Mathematics in Mind ((MATHMIN))



Part IV thus far has examined how statistical universals might contribute to the formation of linguistic units such as words and their values. This chapter will continue to examine these units, especially in terms of the length distribution of words and compounds.



  1.

    The procedure requires a dictionary to convert a word to a phoneme sequence. In Chap. 11, a text was transformed to phoneme sequences by using such a dictionary, but words that are not in the dictionary cannot easily be transformed into phoneme sequences.

  2.

    Note also that the range of lengths on the horizontal axis is too small for a logarithmic axis to reveal any useful trend.

  3.

    The corresponding graph for a shuffled text is identical to that for the original natural language text, because shuffling changes only the order of the words, not their lengths or frequencies.

  4.

    This graph, too, is presented on semilog axes, both because of Miller and Mandelbrot’s theoretical analysis and for the same reason mentioned for Fig. 13.1.

  5.

    The corpus includes some long hyphenated chunks whose status as “compounds” is doubtful. Nevertheless, they are included in this analysis because they reflect how hyphens are actually used.
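
The point made in note 3 can be checked with a minimal sketch in Python. The toy sentence and the helper name `length_frequency_pairs` are assumptions introduced here for illustration; they are not the chapter's corpus or procedure. The sketch pairs each distinct word's length with its frequency, then confirms that shuffling the word order leaves those pairs unchanged.

```python
import random
from collections import Counter


def length_frequency_pairs(words):
    """Return sorted (length, frequency) pairs, one per distinct word."""
    counts = Counter(words)
    return sorted((len(word), freq) for word, freq in counts.items())


# Toy text, purely illustrative (hypothetical, not the chapter's data).
text = "the cat saw the dog and the dog saw a bird".split()

# Shuffle a copy of the word sequence with a fixed seed for repeatability.
shuffled = text[:]
random.Random(0).shuffle(shuffled)

original_pairs = length_frequency_pairs(text)
shuffled_pairs = length_frequency_pairs(shuffled)

# Shuffling permutes the tokens but keeps the same multiset of words,
# so the length-frequency pairs are the same.
print(original_pairs == shuffled_pairs)  # prints True
```

Since a word-level shuffle only permutes tokens, any statistic computed from word counts alone, including this one, is unaffected by it.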


  • Bentz, Christian and Ferrer-i-Cancho, Ramon (2016). Zipf’s law of abbreviation as a language universal. In Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics.


  • Kanwal, Jasmeen, Smith, Kenny, Culbertson, Jennifer, and Kirby, Simon (2017). Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication. Cognition, 165, 45–52.


  • Mandelbrot, Benoit B. (1953). An informational theory of the statistical structure of language. In Proceedings of Symposium of Applications of Communication Theory, pages 486–502.


  • Miller, George A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70(2), 311–314.


  • Piantadosi, Steven T., Tily, Harry, and Gibson, Edward (2011). Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9), 3526–3529.


  • Zipf, George K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press.




Copyright information

© 2021 The Author(s)

About this chapter


Cite this chapter

Tanaka-Ishii, K. (2021). Size and Frequency. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham.
