Identifying Trends in Word Frequency Dynamics

Journal of Statistical Physics

Abstract

The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent from temporary increases in frequency. Our model is illustrated using a 10^8-word database from an online discussion group and a 10^11-word collection of digitized books. The model reveals a strong relation between changes in word dissemination and changes in frequency. Aside from their implications for short-term word frequency dynamics, these observations are potentially important for language evolution as new words must survive in the short term in order to survive in the long term.

Acknowledgements

We thank Janet Pierrehumbert for discussions during preliminary stages of the project. This work was supported by the Northwestern University Institute on Complex Systems (E.G.A.), the Max Planck Institute for the Physics of Complex Systems (E.G.A.), and a Sloan Research Fellowship (A.E.M.).

Author information

Corresponding author

Correspondence to Adilson E. Motter.

About this article

Cite this article

Altmann, E.G., Whichard, Z.L. & Motter, A.E. Identifying Trends in Word Frequency Dynamics. J Stat Phys 151, 277–288 (2013). https://doi.org/10.1007/s10955-013-0699-7

  • DOI: https://doi.org/10.1007/s10955-013-0699-7
