
The Dynamics of Semantic Change: A Corpus-Based Analysis

  • Mohamed Amine Boukhaled
  • Benjamin Fagard
  • Thierry Poibeau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11978)

Abstract

In this contribution, we report on a computational corpus-based study of the semantic evolution of words over time. Although semantic change is complex and does not lend itself easily to analytical treatment, we believe that computational modelling is a crucial tool for studying this phenomenon. The study consists of two parts. In the first, our aim is to capture the systemic change of word meanings in an empirical model that is also predictive, and therefore falsifiable. To illustrate the significance of this kind of empirical model, we conducted an experimental evaluation using the Google Books N-gram corpus. The results show that the model is effective in capturing semantic change and can achieve a high degree of accuracy in predicting words' distributional semantics. In the second part, we examine the degree to which the S-curve model, which is generally used to describe the quantitative properties of linguistic change, applies to lexical semantic change. We use an automatic procedure to extract from the Google Books N-gram corpus the words that have undergone the largest semantic shifts over the past two centuries. We then investigate the significance of the S-curve pattern in the evolution of their frequency. The results suggest that the S-curve pattern does have some generality, especially for frequency rises associated with semantic expansion.
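The abstract does not spell out how the words with the largest shifts are extracted or how the S-curve pattern is tested. As a rough illustration only, the sketch below assumes two common choices: ranking words by the cosine distance between their vectors in two already-aligned diachronic embedding spaces, and scoring an S-curve by fitting a straight line to the logit of min-max-rescaled frequencies. The helper names `cosine_distance`, `rank_by_shift` and `logit_s_curve_fit`, the `eps` clipping parameter and the rescaling step are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_shift(emb_start, emb_end):
    """Hypothetical helper: rank words present in both time slices by the
    cosine distance between their two vectors, largest shift first.

    Assumes emb_start and emb_end (dicts: word -> vector) have already been
    aligned into a common space; the alignment step is not shown here.
    """
    shared = emb_start.keys() & emb_end.keys()
    scores = {w: cosine_distance(emb_start[w], emb_end[w]) for w in shared}
    return sorted(scores, key=scores.get, reverse=True)


def logit_s_curve_fit(years, freqs, eps=1e-3):
    """Hypothetical helper: score how well a frequency series fits an S-curve.

    The series is min-max rescaled to [0, 1], clipped to [eps, 1 - eps] and
    logit-transformed. For a logistic x(t) = 1 / (1 + exp(-k * (t - t0))),
    logit(x) = k * (t - t0) is linear in t, so the r^2 of an ordinary
    least-squares line through (t, logit(x)) scores the S-curve fit.
    Returns (k, t0, r2).
    """
    lo, hi = min(freqs), max(freqs)
    if hi == lo:
        raise ValueError("frequency series is constant")
    xs = [min(max((f - lo) / (hi - lo), eps), 1 - eps) for f in freqs]
    ys = [math.log(x / (1 - x)) for x in xs]

    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in years)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, ys))
    k = s_ty / s_tt                  # slope of logit(x) against t
    b = mean_y - k * mean_t          # intercept, so t0 = -b / k
    ss_res = sum((y - (k * t + b)) ** 2 for t, y in zip(years, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return k, -b / k, 1.0 - ss_res / ss_tot
```

On a synthetic logistic series (for example `years = list(range(1800, 2001))` with frequencies `1 / (1 + math.exp(-0.1 * (t - 1900)))`), the fit recovers a positive slope, a midpoint near 1900 and an r² close to 1; clipping at the extremes keeps it slightly below 1. Real N-gram frequency curves are noisier, so any r² threshold for calling a curve an S-curve is a modelling choice.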

Keywords

Semantic change, Diachronic word embedding, Recurrent neural networks, Computational semantics, S-curve model

Notes

Acknowledgements

This work is supported by the project 2016-147 ANR OPLADYN TAP-DD2016.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mohamed Amine Boukhaled (1)
  • Benjamin Fagard (1)
  • Thierry Poibeau (1)
  1. Laboratoire Langues, Textes, Traitements Informatique, Cognition (LATTICE), CNRS, ENS & Université Paris 3; PSL & USPC, Paris, France
