SubGram: Extending Skip-Gram Word Representation with Substrings

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)

Abstract

Skip-gram (word2vec) is a recent method for creating vector representations of words (“distributed word representations”) using a neural network. The representation has gained popularity in various areas of natural language processing because it appears to capture syntactic and semantic information about words without any explicit supervision.

We propose SubGram, a refinement of the Skip-gram model that also takes word structure into account during training, achieving large gains on the original Skip-gram test set.
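The paper itself contains no code, but the idea of adding word structure to Skip-gram can be illustrated with a small sketch. The Python snippet below is a hypothetical illustration, not the authors' implementation: it decomposes a word into bounded character n-grams, the kind of substring features a SubGram-style model could feed into Skip-gram training. The n-gram range and the boundary markers "^"/"$" are illustrative assumptions.

# Minimal sketch (assumed, not the authors' exact method): represent a word
# by itself plus its bounded character substrings.

def substring_features(word, min_n=2, max_n=4):
    """Return the word together with its character n-grams of length min_n..max_n."""
    marked = f"^{word}$"          # mark word boundaries so prefixes/suffixes are distinct
    features = {word}             # keep the full word as one of the features
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            features.add(marked[i:i + n])
    return sorted(features)

if __name__ == "__main__":
    # Morphologically related forms share many substrings, which is what lets
    # substring-aware embeddings relate them without explicit supervision.
    print(substring_features("taking"))
    print(substring_features("takes"))

In such a scheme, the overlap between the substring sets of related forms (here "taking" and "takes") is what allows the model to capture morphological regularities that plain Skip-gram treats as unrelated word types.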

Keywords

Distributed word representations · Unsupervised learning of morphological relations


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic