Evaluating Memory Efficiency and Robustness of Word Embeddings

  • Johannes Jurgovsky
  • Michael Granitzer
  • Christin Seifert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)

Abstract

Skip-Gram word embeddings, estimated from large text corpora, have been shown to improve many NLP tasks through their high-quality features. However, little is known about their robustness against parameter perturbations and about their efficiency in preserving word similarities under memory constraints. In this paper, we investigate three post-processing methods for word embeddings to study their robustness and memory efficiency. We employ a dimensionality-based, a parameter-based and a resolution-based method to obtain parameter-reduced embeddings, and we provide a concept that connects the three approaches. We evaluate these methods by their relative accuracy loss on six intrinsic evaluation tasks and compare them with regard to the memory efficiency of the reduced embeddings. The evaluation shows that low bit-resolution embeddings offer great potential for memory savings while limiting the risk of accuracy loss. The results indicate that post-processed word embeddings could also provide valuable word features for applications on resource-limited devices.
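A minimal illustrative sketch, not the authors' implementation: the snippet below uses three generic stand-ins (truncated SVD for the dimensionality-based method, magnitude pruning for the parameter-based method, and uniform quantization for the resolution-based method) on a random NumPy matrix that stands in for Skip-Gram embeddings, and checks how well pairwise cosine similarities survive each reduction. All function names, the stand-in data, and the memory estimates are assumptions made for illustration only.

# Illustrative sketch: three generic ways to shrink an embedding matrix and
# check how well pairwise similarities survive. Not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((10_000, 300)).astype(np.float32)  # stand-in embeddings

def reduce_dimensionality(E, k):
    """Dimensionality-based: keep the top-k principal directions (via SVD)."""
    U, S, Vt = np.linalg.svd(E - E.mean(0), full_matrices=False)
    return (U[:, :k] * S[:k]).astype(np.float32)

def prune_parameters(E, keep_ratio):
    """Parameter-based: zero out the smallest-magnitude entries.
    (Memory is only saved if the result is stored in a sparse format.)"""
    threshold = np.quantile(np.abs(E).ravel(), 1.0 - keep_ratio)
    return np.where(np.abs(E) >= threshold, E, 0.0).astype(np.float32)

def quantize_bits(E, bits):
    """Resolution-based: uniform quantization of each entry to `bits` bits.
    Returns the dequantized values; in practice the integer codes are stored."""
    levels = 2 ** bits - 1
    lo, hi = E.min(), E.max()
    codes = np.round((E - lo) / (hi - lo) * levels)
    return (codes / levels * (hi - lo) + lo).astype(np.float32)

def similarity_preservation(E, E_reduced, n_pairs=1000):
    """Correlation between cosine similarities before and after reduction."""
    idx = rng.integers(0, len(E), size=(n_pairs, 2))
    def cos(M, i, j):
        a, b = M[i], M[j]
        return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) *
                                  np.linalg.norm(b, axis=-1) + 1e-9)
    return np.corrcoef(cos(E, idx[:, 0], idx[:, 1]),
                       cos(E_reduced, idx[:, 0], idx[:, 1]))[0, 1]

for name, E_red, bits_per_value in [
    ("dim k=100", reduce_dimensionality(E, 100), 32),
    ("prune 30%", prune_parameters(E, 0.7), 32),
    ("quantize 4-bit", quantize_bits(E, 4), 4),
]:
    mem_mib = E_red.shape[0] * E_red.shape[1] * bits_per_value / 8 / 2**20
    print(f"{name:>15}: ~{mem_mib:6.1f} MiB, similarity corr = "
          f"{similarity_preservation(E, E_red):.3f}")

Under these assumptions, the quantized variant stores roughly an eighth of the bits of the full-precision matrix, which mirrors the abstract's point that low bit-resolution embeddings trade little similarity preservation for large memory savings.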

Keywords

Natural language processing · Word embedding · Memory efficiency · Robustness · Evaluation

Acknowledgments

The presented work was developed within the EEXCESS project funded by the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 600601.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Johannes Jurgovsky (1)
  • Michael Granitzer (1)
  • Christin Seifert (1)
  1. Media Computer Science, Universität Passau, Passau, Germany
