Abstract
Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a fundamental procedure in Natural Language Processing (NLP); the extracted features are organized in a low-dimensional vector space. Representative word embedding approaches include the probabilistic language model, the neural network language model, and sparse coding. State-of-the-art techniques such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are applied to train these models. Most of this literature learns word embeddings from observed word counts and co-occurrence statistics. The growing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. We then present methods of text representation as preliminaries, along with existing word embedding approaches such as the neural network language model and sparse coding, and their evaluation metrics. Finally, we summarize the applications of word embedding and discuss future directions.
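To make the count-based route the abstract describes concrete, here is a minimal sketch (our illustration, not code from the chapter) of learning embeddings from co-occurrence statistics via matrix factorization: build a word-word co-occurrence matrix from a toy corpus, then keep the top singular directions of a truncated SVD as low-dimensional word vectors, in the spirit of LSA-style methods. The corpus, window size, and dimensionality k are arbitrary illustrative choices.

```python
from collections import Counter

import numpy as np

# Toy corpus; any tokenized text would do.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count symmetric co-occurrences within a +/-2 word window.
window = 2
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(idx[w], idx[sent[j]])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    C[i, j] = c

# Truncated SVD: keep the top-k singular directions as word vectors.
U, S, _ = np.linalg.svd(C)
k = 3
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word

for word in vocab:
    print(word, np.round(embeddings[idx[word]], 2))
```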
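The prediction-based route the abstract names, skip-gram with negative sampling, can be sketched just as briefly. The example below assumes the gensim library (4.x API, where the dimensionality argument is named vector_size); the hyperparameters are illustrative, not tuned recommendations.

```python
from gensim.models import Word2Vec

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # embedding dimensionality
    window=2,        # context window size
    sg=1,            # 1 = skip-gram (0 would be CBOW)
    negative=5,      # negative samples per positive pair
    min_count=1,     # keep every word in this toy corpus
)

print(model.wv["cat"][:5])           # first few embedding coordinates
print(model.wv.most_similar("cat"))  # nearest neighbors in vector space
```

On a real corpus one would raise min_count and vector_size; the point here is only that the skip-gram and negative-sampling choices discussed in the survey are exposed through the sg and negative arguments.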
Cite this chapter
Li, Y., & Yang, T. (2018). Word embedding for understanding natural language: A survey. In S. Srinivasan (Ed.), Guide to Big Data Applications (Studies in Big Data, vol. 26). Springer, Cham. https://doi.org/10.1007/978-3-319-53817-4_4
Print ISBN: 978-3-319-53816-7
Online ISBN: 978-3-319-53817-4
© 2018 Springer International Publishing AG