Diachronic Deviation Features in Continuous Space Word Representations
In distributed word representation, each word is represented as a unique point in a vector space. This paper extends this representation to a diachronic setting, in which multiple word embeddings are trained on corpora from different time periods. These embeddings can be mapped to a single target space via a linear transformation, so that each word is represented in the target space as a distribution rather than a single point. The deviation features of this distribution reflect the semantic variation of a word across time periods. Experiments show that groups of words with similar deviation features indicate the hot topics of different periods, and that the frequency changes of these word groups can be used to detect the period in which each topic reached its peak popularity.
Keywords: Lexical semantics, diachronic corpora, semantic distribution, hot topics
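The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that every period shares one vocabulary with rows aligned by word, that each linear map is fitted by ordinary least squares using all words as anchors, and that the "deviation feature" of a word is its mean Euclidean distance from the centroid of its mapped points. The function names `fit_linear_map` and `deviation_features` are hypothetical.

```python
import numpy as np


def fit_linear_map(source, target):
    """Return W minimising ||source @ W - target|| in the least-squares sense.

    Assumption: rows of `source` and `target` refer to the same words.
    """
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W


def deviation_features(period_embeddings, target_index=0):
    """Map every period into one target space and measure per-word spread.

    period_embeddings: list of (n_words, dim) arrays, one per time period.
    Returns (centroid, deviation), where centroid has shape (n_words, dim)
    and deviation has shape (n_words,).
    """
    target = period_embeddings[target_index]
    mapped = []
    for emb in period_embeddings:
        W = fit_linear_map(emb, target)
        mapped.append(emb @ W)
    stacked = np.stack(mapped)  # shape: (n_periods, n_words, dim)

    # Each word is now a small cloud of points, one per period.
    centroid = stacked.mean(axis=0)

    # Assumed deviation feature: mean distance of the cloud from its centroid.
    distances = np.linalg.norm(stacked - centroid, axis=2)
    return centroid, distances.mean(axis=0)
```

Under these assumptions, a word whose mapped points stay close together across periods receives a deviation near zero, while a word whose usage shifts in some period receives a larger value. Grouping words by such values would be a separate step that this sketch does not cover.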