Diachronic Deviation Features in Continuous Space Word Representations

  • Ni Sun
  • Tongfei Chen
  • Liumingjing Xiao
  • Junfeng Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8801)

Abstract

In distributed word representations, each word is represented as a single point in a vector space. This paper extends the idea to a diachronic setting, in which separate word embeddings are trained on corpora from different time periods. These embeddings can be mapped into a single target space via a linear transformation, so that each word is represented in the target space as a distribution rather than a point. The deviation features of this distribution reflect the semantic variation of the word across time periods. Experiments show that groups of words with similar deviation features can indicate the hot topics of different periods, and that the frequency change of these word groups can be used to detect the period in which each topic reached its peak popularity.
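
The mapping step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the linear transformation is estimated or which deviation features are used, so the code below makes its own assumptions: every period shares the same vocabulary in the same row order, the map is fitted by ordinary least squares over all words, and the deviation of a word is the root-mean-square distance of its mapped points from their mean. The names `fit_linear_map` and `deviation_features` are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_linear_map(source, target):
    """Least-squares matrix W minimizing ||source @ W - target|| (assumed estimator)."""
    W = np.linalg.lstsq(source, target, rcond=None)[0]
    return W


def deviation_features(period_embeddings, target_index=0):
    """Map each period's embeddings into the target period's space.

    Returns, per word, the mean mapped vector and a scalar deviation
    (RMS distance of the mapped points from that mean; an assumed feature).
    """
    target = period_embeddings[target_index]
    mapped = [emb @ fit_linear_map(emb, target) for emb in period_embeddings]
    stacked = np.stack(mapped)  # shape: (periods, words, dim)
    mean = stacked.mean(axis=0)
    squared_dist = ((stacked - mean) ** 2).sum(axis=2)  # shape: (periods, words)
    deviation = np.sqrt(squared_dist.mean(axis=0))
    return mean, deviation


# Synthetic example: three "periods" that are rotations of one base space,
# with a single word (row 7) displaced in the last period.
rng = np.random.default_rng(0)
n_words, dim = 200, 5
base = rng.normal(size=(n_words, dim))
periods = [base]
for _ in range(2):
    rotation = np.linalg.qr(rng.normal(size=(dim, dim)))[0]  # random orthogonal matrix
    periods.append(base @ rotation)
shifted_word = 7
periods[2][shifted_word] += 1.0  # simulate a semantic shift for one word

mean, deviation = deviation_features(periods)
print("word with the largest deviation:", int(np.argmax(deviation)))
```

In this toy setting the rotations are undone by the fitted maps, so words that did not move collapse to nearly a single point in the target space, while the displaced word keeps a spread. Real diachronic embeddings would be noisier, and a practical system would likely fit the map on a subset of stable anchor words rather than on the full vocabulary.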

Keywords

Lexical semantics, diachronic corpora, semantic distribution, hot topics

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ni Sun (1, 2)
  • Tongfei Chen (1)
  • Liumingjing Xiao (1)
  • Junfeng Hu (1, 2)

  1. School of Electronics Engineering & Computer Science, Peking University, Beijing, P.R. China
  2. Key Laboratory of Computational Linguistics (Ministry of Education), P.R. China
