Distributed Representations

Chapter in Deep Learning for NLP and Speech Recognition

Abstract

In this chapter, we introduce the notion of word embeddings, which serve as core representations of text in deep learning approaches. We start with the distributional hypothesis and explain how it can be leveraged to form semantic representations of words. We discuss common distributional semantic models, including word2vec and GloVe, together with their variants. We address the shortcomings of embedding models and their extensions to document and concept representations. Finally, we discuss several applications to natural language processing tasks and present a case study focused on language modeling.
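
To make the distributional hypothesis concrete, here is a minimal sketch (our illustration, not taken from the chapter) of a count-based distributional semantic model: it builds a window-based word co-occurrence matrix over a toy corpus, factorizes it with truncated SVD to obtain dense word vectors, and compares words by cosine similarity. The corpus, window size, and vector dimensionality are all illustrative assumptions.

# A toy count-based distributional model: words occurring in similar
# contexts receive similar vectors.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

# Count co-occurrences within a symmetric +/-2 word window.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sentence in tokens:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[sentence[j]]] += 1.0

# Truncated SVD of the count matrix yields dense, low-dimensional vectors.
U, S, _ = np.linalg.svd(counts, full_matrices=False)
k = 3  # illustrative embedding dimensionality
vectors = U[:, :k] * S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "cat" and "dog" appear in similar contexts, so their vectors end up close.
print(cosine(vectors[index["cat"]], vectors[index["dog"]]))

Prediction-based models such as word2vec arrive at comparable vectors by optimizing a classification objective over the same kind of local context windows instead of factorizing explicit counts; GloVe, in turn, fits vectors directly to global co-occurrence statistics like the matrix above.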

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Distributed Representations. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_5

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science (R0)
