Size and Frequency

Tanaka-Ishii, Kumiko

doi:10.1007/978-3-030-59377-3_13

Part of the book series: Mathematics in Mind (MATHMIN)

Abstract

Part IV thus far has examined how statistical universals might contribute to the formation of linguistic units such as words and their values. This chapter will continue to examine these units, especially in terms of the length distribution of words and compounds.

Notes

1.
The procedure requires a dictionary to convert a word to a phoneme sequence. In Chap. 11, a text was transformed to phoneme sequences by using such a dictionary, but words that are not in the dictionary cannot easily be transformed into phoneme sequences.
2.
Note, too, that the range of lengths on the horizontal axis is too small for a logarithmic axis to reveal any useful trend.
3.
The corresponding graph for a shuffled text is obviously identical to that for the original natural language text.
4.
This graph, too, is presented on semilog axes, both because of Miller and Mandelbrot’s theoretical analysis and for the same reason mentioned for Fig. 13.1.
5.
The corpus includes some long hyphenated chunks whose status as “compounds” is doubtful. Nevertheless, they are included in this analysis because they reflect how hyphens are actually used.
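
Note 3 can be illustrated with a minimal sketch, assuming the graph in question counts tokens by their length: shuffling changes only the order of the words, so the length counts are unchanged. The toy sentence and the helper name `length_histogram` are hypothetical and are not taken from the chapter.

```python
import random
from collections import Counter


def length_histogram(words):
    """Count how many tokens have each length, measured in characters."""
    return Counter(len(w) for w in words)


# Hypothetical toy text, used only for illustration (not the chapter's corpus).
text = "the quick brown fox jumps over the lazy dog and the cat"
words = text.split()

# Shuffle a copy of the token sequence with a fixed seed for repeatability.
shuffled = words[:]
random.Random(0).shuffle(shuffled)

original_hist = length_histogram(words)
shuffled_hist = length_histogram(shuffled)

# Word order differs, but the multiset of lengths is the same.
print(original_hist == shuffled_hist)
```

The comparison holds for any permutation of the tokens, because the histogram depends only on which words occur and how often, not on their positions.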

References

Bentz, Christian and Ferrer-i-Cancho, Ramon (2016). Zipf’s law of abbreviation as a language universal. In Proceedings of the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics.
Kanwal, Jasmeen, Smith, Kenny, Culbertson, Jennifer, and Kirby, Simon (2017). Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication. Cognition, 165, 45–52.
Mandelbrot, Benoit B. (1953). An informational theory of the statistical structure of language. In Proceedings of Symposium of Applications of Communication Theory, pages 486–502.
Miller, George A. (1957). Some effects of intermittent silence. The American Journal of Psychology, 70(2), 311–314.
Piantadosi, Steven T., Tily, Harry, and Gibson, Edward (2011). Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9), 3526–3529.
Zipf, George K. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press.

Author information

Authors and Affiliations

Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Tokyo, Japan
Kumiko Tanaka-Ishii

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Size and Frequency. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_13

DOI: https://doi.org/10.1007/978-3-030-59377-3_13
Published: 02 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
