On Continuous Space Word Representations as Input of LSTM Language Model

  • Daniel Soutner
  • Luděk Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9449)

Abstract

Artificial neural networks have become the state of the art in the task of language modelling, among which Long Short-Term Memory (LSTM) networks appear to be an efficient architecture. The continuous skip-gram and the continuous bag-of-words (CBOW) models are algorithms for learning high-quality distributed vector representations that are able to capture a large number of syntactic and semantic word relationships. In this paper, we carried out experiments with a combination of these powerful models: continuous word representations trained with the skip-gram, CBOW, or GloVe method, together with a word cache expressed as a vector using latent Dirichlet allocation (LDA). These are all used at the input of the LSTM network in place of the 1-of-N coding traditionally used in language models. The proposed models are tested on the Penn Treebank and the MALACH corpus.
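To make the input scheme concrete, the sketch below is a minimal stand-in (not the authors' implementation) written in PyTorch with illustrative dimensions: an LSTM language model whose input at each step is a pretrained continuous word embedding concatenated with a topic vector summarising the recent word cache, rather than a 1-of-N (one-hot) vector. In practice the embeddings would be trained with skip-gram, CBOW, or GloVe and the topic vector inferred with an LDA toolkit such as gensim; here both are random placeholders.

import torch
import torch.nn as nn

class EmbeddingLSTMLM(nn.Module):
    """LSTM LM fed with continuous word vectors + a cache topic vector."""

    def __init__(self, pretrained_emb, n_topics, hidden_dim):
        super().__init__()
        vocab_size, emb_dim = pretrained_emb.shape
        # Embeddings trained beforehand (skip-gram/CBOW/GloVe), kept frozen.
        self.emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=True)
        # Input size = word vector plus LDA topic posterior of the cache.
        self.lstm = nn.LSTM(emb_dim + n_topics, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_vecs):
        # word_ids: (batch, seq); topic_vecs: (batch, seq, n_topics),
        # e.g. an LDA topic distribution over a cache of preceding words.
        x = torch.cat([self.emb(word_ids), topic_vecs], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # logits over the next word at each position

# Toy usage with random stand-ins for the trained embeddings and LDA vectors.
vocab, dim, topics = 100, 16, 8
model = EmbeddingLSTMLM(torch.randn(vocab, dim), topics, hidden_dim=32)
logits = model(torch.randint(0, vocab, (2, 5)), torch.rand(2, 5, topics))
print(logits.shape)  # torch.Size([2, 5, 100])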

Keywords

Language modelling · Neural networks · LSTM · Skip-gram · CBOW · GloVe · word2vec · LDA

Notes

Acknowledgements

Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is greatly appreciated.

This research was supported by the Ministry of Culture of the Czech Republic, project No. DF12P01OVV022.

References

  1. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
  2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
  3. Charniak, E., et al.: BLLIP 1987–89 WSJ Corpus Release 1. Linguistic Data Consortium, Philadelphia (2000)
  4. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  5. Mikolov, T., Kombrink, S., Deoras, A., Burget, L., Černocký, J.: RNNLM - Recurrent Neural Network Language Modeling Toolkit (2011)
  6. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR (2013)
  7. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS (2013)
  8. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL HLT (2013)
  9. Mnih, A., Kavukcuoglu, K.: Learning word embeddings efficiently with noise-contrastive estimation. Adv. Neural Inf. Process. Syst. 26, 2265–2273 (2013)
  10. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of Empirical Methods in Natural Language Processing (EMNLP) (2014)
  11. Ramabhadran, B., et al.: USC-SFI MALACH Interviews and Transcripts English LDC2012S05. Linguistic Data Consortium, Philadelphia (2012)
  12. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50 (2010)
  13. Soutner, D., Müller, L.: Application of LSTM neural networks in language modelling. In: Habernal, I. (ed.) TSD 2013. LNCS, vol. 8082, pp. 105–112. Springer, Heidelberg (2013)
  14. Sundermeyer, M., Schlüter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of Interspeech (2012)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic