Evaluating Memory Efficiency and Robustness of Word Embeddings
Skip-gram word embeddings, estimated from large text corpora, have been shown to improve many NLP tasks through their high-quality features. However, little is known about their robustness against parameter perturbations or about their efficiency in preserving word similarities under memory constraints. In this paper, we investigate three post-processing methods for word embeddings to study their robustness and memory efficiency. We employ a dimensionality-based, a parameter-based, and a resolution-based method to obtain parameter-reduced embeddings, and we provide a concept that connects the three approaches. We evaluate these methods by their relative accuracy loss on six intrinsic evaluation tasks and compare them with regard to the memory efficiency of the reduced embeddings. The evaluation shows that low bit-resolution embeddings offer great potential for memory savings while limiting the risk of accuracy loss. The results indicate that post-processed word embeddings could also provide valuable word features to applications on resource-limited devices.
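The resolution-based reduction mentioned above can be illustrated by uniform scalar quantization of the embedding matrix. The sketch below is a minimal, hypothetical example (the paper's exact method and parameters may differ): each float32 parameter is mapped to one of 2^bits levels, so an 8-bit code stores the matrix in a quarter of the memory while each value deviates by at most half a quantization step.

```python
import numpy as np

def quantize_embeddings(emb: np.ndarray, bits: int = 8) -> np.ndarray:
    """Uniformly quantize embedding values to 2**bits levels.

    A sketch of resolution-based parameter reduction; storage only needs
    the integer codes plus the scalars `lo` and `scale` to reconstruct.
    """
    lo, hi = float(emb.min()), float(emb.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    codes = np.round((emb - lo) / scale).astype(np.uint8 if bits <= 8 else np.uint16)
    # Dequantize for downstream use (e.g. similarity tasks).
    return codes * scale + lo

# Hypothetical embedding matrix: 1000 words, 300 dimensions.
rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 300)).astype(np.float32)
emb_q = quantize_embeddings(emb, bits=8)
# Reconstruction error is bounded by half a quantization step.
max_err = float(np.abs(emb - emb_q).max())
```

At 8 bits per parameter the codes occupy 25% of the float32 matrix, which is the kind of memory/accuracy trade-off the evaluation in the paper quantifies.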
Keywords: Natural language processing · Word embedding · Memory efficiency · Robustness · Evaluation
The presented work was developed within the EEXCESS project funded by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement number 600601.