Psychonomic Bulletin & Review, Volume 21, Issue 5, pp 1112–1130

Zipf’s word frequency law in natural language: A critical review and future directions

Theoretical Review

Abstract

The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf’s law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf’s law and are then used to evaluate many of the theoretical explanations of Zipf’s law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress in understanding why language obeys Zipf’s law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.

Keywords

Language, Zipf’s law, Statistics

Copyright information

© Psychonomic Society, Inc. 2014

Authors and Affiliations

  1. Brain and Cognitive Sciences, University of Rochester, Rochester, USA
