Dynamic EM in Neologism Evolution

  • Martin Emms
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8206)

Abstract

Research on unsupervised word sense discrimination typically ignores a notable dynamic aspect: the prevalence of a word sense varies over time, to the point that a given word (such as ‘tweet’) can acquire a new usage alongside a pre-existing one (such as ‘a Twitter post’ alongside ‘a bird noise’). This work applies unsupervised methods to text collections within which such neologisms can reasonably be expected to occur. We propose a probabilistic model which conditions words on senses and senses on times, together with an EM method to learn the parameters of the model from data from which sense labels have been deleted. This is contrasted with a static model with no time dependency. We show qualitatively that the learned and the observed time-dependent sense distributions resemble each other closely, and quantitatively that the learned dynamic model achieves a higher tagging accuracy (82.4%) than the learned static model does (76.1%).
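
The abstract does not give the model's equations, so the following is only a minimal sketch of one way such a model could be fitted. It assumes a mixture in which each occurrence of the target word has a time bin t, a hidden sense s drawn from P(s | t), and context words drawn independently from P(w | s). The function name em_dynamic, the bag-of-words context representation, the additive smoothing and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def em_dynamic(docs, times, n_senses, n_times, vocab_size,
               n_iter=50, seed=0, smooth=1e-3):
    """EM for a hypothetical time-conditioned sense mixture.

    docs:  list of contexts, each a list of word ids
    times: list of time-bin ids, one per context
    Returns (pi, theta, gamma) where
      pi[t][s]    estimates P(sense s | time t),
      theta[s][w] estimates P(word w | sense s),
      gamma[d][s] is the posterior over senses for context d.
    """
    rng = random.Random(seed)
    # Random word distributions break the symmetry between senses.
    theta = []
    for _ in range(n_senses):
        row = [rng.random() + 0.5 for _ in range(vocab_size)]
        z = sum(row)
        theta.append([x / z for x in row])
    # Sense priors start uniform in every time bin.
    pi = [[1.0 / n_senses] * n_senses for _ in range(n_times)]
    gamma = []
    for _ in range(n_iter):
        # E-step: posterior over senses for each context, in log space.
        gamma = []
        for words, t in zip(docs, times):
            logp = [math.log(pi[t][s])
                    + sum(math.log(theta[s][w]) for w in words)
                    for s in range(n_senses)]
            m = max(logp)
            p = [math.exp(x - m) for x in logp]
            z = sum(p)
            gamma.append([x / z for x in p])
        # M-step: re-estimate parameters from smoothed expected counts.
        sense_counts = [[smooth] * n_senses for _ in range(n_times)]
        word_counts = [[smooth] * vocab_size for _ in range(n_senses)]
        for words, t, g in zip(docs, times, gamma):
            for s in range(n_senses):
                sense_counts[t][s] += g[s]
                for w in words:
                    word_counts[s][w] += g[s]
        pi = [[c / sum(row) for c in row] for row in sense_counts]
        theta = [[c / sum(row) for c in row] for row in word_counts]
    return pi, theta, gamma


# Toy data (invented): word ids 0,1 stand for 'bird'-like contexts,
# ids 2,3 for 'Twitter'-like contexts that appear only in time bin 1.
toy_docs = [[0, 1]] * 6 + [[0, 1]] * 3 + [[2, 3]] * 5
toy_times = [0] * 6 + [1] * 8
pi, theta, gamma = em_dynamic(toy_docs, toy_times,
                              n_senses=2, n_times=2, vocab_size=4)
```

Under the same assumptions, a static model with no time dependency can be obtained from this sketch by placing every context in a single time bin, for example em_dynamic(toy_docs, [0] * len(toy_docs), 2, 1, 4). Sense indices are arbitrary, so any comparison with labelled data first needs a mapping from learned senses to labelled ones.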

Keywords

neologism, sense, EM



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Martin Emms, Dept. of Computer Science, Trinity College, Dublin, Ireland
