Journal of Statistical Physics, Volume 151, Issue 1–2, pp 277–288

Identifying Trends in Word Frequency Dynamics

  • Eduardo G. Altmann
  • Zakary L. Whichard
  • Adilson E. Motter

Abstract

The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent from temporary increases in frequency. Our model is illustrated using a 10^8-word database from an online discussion group and a 10^11-word collection of digitized books. The model reveals a strong relation between changes in word dissemination and changes in frequency. Aside from their implications for short-term word frequency dynamics, these observations are potentially important for language evolution as new words must survive in the short term in order to survive in the long term.

Keywords

Word dynamics; Fluctuations; Statistical model; Internet communities

Notes

Acknowledgements

We thank Janet Pierrehumbert for discussions during preliminary stages of the project. This work was supported by the Northwestern University Institute on Complex Systems (E.G.A.), the Max Planck Institute for the Physics of Complex Systems (E.G.A.), and a Sloan Research Fellowship (A.E.M.).

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Eduardo G. Altmann (1)
  • Zakary L. Whichard (2)
  • Adilson E. Motter (3)

  1. Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
  2. Department of Physics and Astronomy, Northwestern University, Evanston, USA
  3. Department of Physics and Astronomy and Northwestern Institute on Complex Systems, Northwestern University, Evanston, USA
