On Continuous Space Word Representations as Input of LSTM Language Model
Artificial neural networks have become the state of the art in language modelling, with Long Short-Term Memory (LSTM) networks proving to be an efficient architecture. The continuous skip-gram and continuous bag-of-words (CBOW) algorithms learn high-quality distributed vector representations that capture a large number of syntactic and semantic word relationships. In this paper, we carried out experiments with a combination of these powerful models: continuous word representations trained with the skip-gram, CBOW, or GloVe method, and a word cache expressed as a vector using latent Dirichlet allocation (LDA). These are fed to the input of an LSTM network in place of the 1-of-N coding traditionally used in language models. The proposed models are evaluated on the Penn Treebank and MALACH corpora.
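The input scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabulary, dimensions, and random embedding table are hypothetical stand-ins for pretrained skip-gram/CBOW/GloVe vectors, and the LDA topic mixture is an assumed placeholder for the word-cache vector.

```python
import numpy as np

# Hypothetical toy setup (stand-ins for the paper's real data and models).
vocab = {"the": 0, "cat": 1, "sat": 2}
V = len(vocab)   # vocabulary size
D = 4            # embedding dimension (real systems use hundreds)
K = 3            # number of LDA topics

rng = np.random.default_rng(0)
# Random matrix standing in for pretrained skip-gram/CBOW/GloVe embeddings.
embeddings = rng.standard_normal((V, D))

def one_hot(word):
    """Traditional 1-of-N coding: a sparse V-dimensional indicator vector."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

def continuous_input(word, lda_topics):
    """Proposed input: pretrained word embedding concatenated with an
    LDA-based cache vector summarising the recent word history."""
    return np.concatenate([embeddings[vocab[word]], lda_topics])

# Assumed topic mixture of the recent word cache (sums to 1).
lda_topics = np.array([0.7, 0.2, 0.1])
x = continuous_input("cat", lda_topics)
# The LSTM then receives a dense (D + K)-dimensional vector
# instead of a sparse V-dimensional one-hot vector.
```

The key design point is that the network input becomes dense and low-dimensional, and the LDA cache component injects longer-range topical context that a fixed-length word history alone would miss.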
Keywords: Language modelling · Neural networks · LSTM · Skip-gram · CBOW · GloVe · word2vec · LDA
Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is greatly appreciated.
This research was supported by the Ministry of Culture of the Czech Republic, project No. DF12P01OVV022.