A Systematic Literature Review on Word Embeddings

  • Luis Gutiérrez
  • Brian Keith
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 865)

Abstract

This article presents a systematic literature review on word embeddings within the fields of natural language processing and text processing. A search and classification of 140 articles proposing or applying word embeddings was carried out across three different sources. Word embeddings have been widely adopted, with satisfactory results, both in natural language processing tasks in general and in other domains. In this paper, we report the dominance of word embeddings based on neural models (i.e., variants of word2vec) over those generated by matrix factorization. Finally, despite the good performance of word embeddings, some drawbacks and their respective proposed solutions are identified, such as the lack of interpretability of the real values that make up the embedded vectors.
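As context for the review's subject matter, the sketch below illustrates the basic idea behind word embeddings: each word is mapped to a dense vector of real values, and semantic relatedness is measured geometrically, commonly with cosine similarity. The vectors here are invented toy values for illustration only, not the output of word2vec, GloVe, or any model discussed in the review.

```python
import numpy as np

# Toy 4-dimensional embeddings (invented values for illustration only;
# real models such as word2vec or GloVe learn vectors with hundreds of
# dimensions from large corpora).
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.7, 0.1, 0.3]),
    "apple": np.array([0.1, 0.2, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words should score higher than unrelated ones.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)
```

The lack of interpretability noted in the abstract is visible even in this toy example: the individual coordinates of each vector carry no human-readable meaning, only their geometric relationships do.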

Keywords

Bayesian networks · Sentiment analysis · Literature review · Opinion mining

Notes

Acknowledgments

Research partially funded by the National Commission of Scientific and Technological Research (CONICYT) and the Ministry of Education of the Government of Chile. Project REDI170607: “Multidimensional Bayesian classifiers for the interpretation of text and video emotions”.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computing and Systems Engineering, Universidad Católica del Norte, Antofagasta, Chile