Abstract
The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent from temporary increases in frequency. Our model is illustrated using a 10⁸-word database from an online discussion group and a 10¹¹-word collection of digitized books. The model reveals a strong relation between changes in word dissemination and changes in frequency. Aside from their implications for short-term word frequency dynamics, these observations are potentially important for language evolution, as new words must survive in the short term in order to survive in the long term.
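The abstract relates changes in a word's dissemination to changes in its frequency. The following is a minimal Python sketch of how both quantities could be measured in two time periods and compared; it is not the authors' model. It assumes a corpus of hypothetical (period, user, tokens) records. It also assumes that dissemination is the ratio of the observed number of distinct users of a word to the number expected if that word's tokens were scattered at random across users, a niche-style measure chosen here for illustration only. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical record format (an assumption, not the paper's data layout):
# (period, user, tokens), e.g. (0, "alice", ["the", "cat"]).


def frequency_and_dissemination(posts, period):
    """Return {word: (relative_frequency, dissemination)} for one period.

    ASSUMPTION: dissemination = U_w / U~_w, where U_w is the number of
    distinct users who wrote w and U~_w = sum_i (1 - exp(-f_w * N_i)) is the
    number expected if each of user i's N_i tokens were w independently
    with probability f_w (Poisson approximation).
    """
    user_counts = defaultdict(Counter)
    for t, user, tokens in posts:
        if t == period:
            user_counts[user].update(tokens)

    totals = Counter()
    for counts in user_counts.values():
        totals.update(counts)
    n_total = sum(totals.values())
    sizes = [sum(c.values()) for c in user_counts.values()]

    stats = {}
    for word, count in totals.items():
        f = count / n_total
        observed = sum(1 for c in user_counts.values() if word in c)
        expected = sum(1.0 - math.exp(-f * n) for n in sizes)
        stats[word] = (f, observed / expected)
    return stats


def log_changes(posts, t0, t1):
    """Log-ratio changes in frequency and dissemination for words seen in both periods."""
    before = frequency_and_dissemination(posts, t0)
    after = frequency_and_dissemination(posts, t1)
    common = sorted(set(before) & set(after))
    d_log_f = [math.log(after[w][0] / before[w][0]) for w in common]
    d_log_d = [math.log(after[w][1] / before[w][1]) for w in common]
    return common, d_log_f, d_log_d


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy corpus for illustration only; not data from the paper.
    posts = [
        (0, "alice", ["the", "cat", "sat"]),
        (0, "bob", ["the", "dog"]),
        (1, "alice", ["the", "cat"]),
        (1, "bob", ["the", "cat", "ran"]),
        (1, "carol", ["cat", "dog"]),
    ]
    words, d_log_f, d_log_d = log_changes(posts, 0, 1)
    for w, df, dd in zip(words, d_log_f, d_log_d):
        print(f"{w:>4}  dlog f = {df:+.3f}  dlog D = {dd:+.3f}")
    print("correlation:", round(pearson(d_log_f, d_log_d), 3))
```

Log-ratios are used so that relative changes are comparable across words of very different frequency. Separating persistent from temporary increases would require following words over more than two periods, which this sketch does not attempt.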
Acknowledgements
We thank Janet Pierrehumbert for discussions during preliminary stages of the project. This work was supported by the Northwestern University Institute on Complex Systems (E.G.A.), the Max Planck Institute for the Physics of Complex Systems (E.G.A.), and a Sloan Research Fellowship (A.E.M.).
About this article
Cite this article
Altmann, E.G., Whichard, Z.L. & Motter, A.E. Identifying Trends in Word Frequency Dynamics. J Stat Phys 151, 277–288 (2013). https://doi.org/10.1007/s10955-013-0699-7