Diachronic Deviation Features in Continuous Space Word Representations
- Cite this paper as:
- Sun N., Chen T., Xiao L., Hu J. (2014) Diachronic Deviation Features in Continuous Space Word Representations. In: Sun M., Liu Y., Zhao J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. Lecture Notes in Computer Science, vol 8801. Springer, Cham
In distributed word representation, each word is represented as a unique point in the vector space. This paper extends this representation to a diachronic setting, in which multiple word embeddings are trained on corpora from different time periods. These embeddings can be mapped into a single target space via a linear transformation, so that each word is represented in the target space as a distribution rather than a single point. The deviation features of this distribution reflect the semantic variation of a word across time periods. Experiments show that groups of words with similar deviation features can indicate the hot topics of different periods, and that the change in frequency of these word groups can be used to detect the period in which a topic reached its peak popularity.
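The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every period's embedding matrix shares the same row order over a common vocabulary, that the linear transformation is fitted by ordinary least squares, and that the "deviation feature" of a word is the root-mean-square distance of its mapped vectors from their centroid. The function names `fit_linear_map` and `deviation_features` are hypothetical.

```python
import numpy as np


def fit_linear_map(source, target):
    """Least-squares matrix W minimizing ||source @ W - target||_F.

    source: (vocab, d_src) embeddings of one time period.
    target: (vocab, d_tgt) embeddings of the chosen target space.
    Rows are assumed to be aligned over a shared vocabulary.
    """
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W


def deviation_features(period_embeddings, target):
    """Map every period into the target space and summarize each word.

    Returns (centroid, deviation):
      centroid:  (vocab, d_tgt) mean mapped vector of each word.
      deviation: (vocab,) RMS distance of a word's mapped vectors from
                 its centroid (an assumed stand-in for the paper's
                 deviation features).
    """
    mapped = np.stack(
        [E @ fit_linear_map(E, target) for E in period_embeddings]
    )  # shape: (periods, vocab, d_tgt)
    centroid = mapped.mean(axis=0)
    squared_dist = ((mapped - centroid) ** 2).sum(axis=2)
    deviation = np.sqrt(squared_dist.mean(axis=0))
    return centroid, deviation
```

Under these assumptions, a word whose mapped vectors stay close together across periods receives a small deviation, while a word whose usage shifts receives a larger one; grouping words by such values is one plausible way to approach the word groups mentioned in the abstract.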
Keywords: Lexical semantics, diachronic corpora, semantic distribution, hot topics