Word Embedding for Understanding Natural Language: A Survey

  • Chapter
Guide to Big Data Applications

Part of the book series: Studies in Big Data ((SBD,volume 26))

Abstract

Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a basic procedure in Natural Language Processing (NLP). The extracted features are then organized in a low-dimensional space. Representative word embedding approaches include the Probabilistic Language Model, the Neural Network Language Model, and Sparse Coding. State-of-the-art techniques such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are applied to train these models. Most of this literature learns word embeddings from observed counts and co-occurrence statistics. The increasing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. We then present methods of text representation as preliminaries, together with existing word embedding approaches such as the Neural Network Language Model and the Sparse Coding approach, along with their evaluation metrics. Finally, we summarize the applications of word embedding and discuss its future directions.
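
As a concrete anchor for one of the methods named above, skip-gram with negative sampling (SGNS) has a standard objective in the word2vec literature; it is restated here in its textbook form, not as an equation taken from the chapter itself. For a center word w_I with input vector v_{w_I}, an observed context word w_O with output vector v'_{w_O}, and k negative samples drawn from a noise distribution P_n(w), each word-context pair contributes

    \log \sigma\!\left( {v'_{w_O}}^{\top} v_{w_I} \right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\!\left( -{v'_{w_i}}^{\top} v_{w_I} \right) \right]

where \sigma(x) = 1/(1 + e^{-x}). Maximizing this objective pulls observed word-context pairs together in the embedding space while pushing randomly sampled noise pairs apart, avoiding the cost of normalizing over the entire vocabulary.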

Notes

  1. https://en.wikipedia.org/wiki/Semantic_analysis_(linguistics).

  2. The citation numbers are from http://www.webofscience.com.

Author information

Corresponding author

Correspondence to Tao Yang.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Li, Y., Yang, T. (2018). Word Embedding for Understanding Natural Language: A Survey. In: Srinivasan, S. (eds) Guide to Big Data Applications. Studies in Big Data, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-53817-4_4

  • DOI: https://doi.org/10.1007/978-3-319-53817-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53816-7

  • Online ISBN: 978-3-319-53817-4

  • eBook Packages: Engineering, Engineering (R0)
