Abstract
Word embedding, in which semantic and syntactic features are captured from unlabeled text data, is a fundamental procedure in Natural Language Processing (NLP); the extracted features are organized in a low-dimensional vector space. Representative word embedding approaches include the probabilistic language model, the neural network language model, and sparse coding. State-of-the-art techniques such as skip-gram with negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are applied to train these models. Most of this literature learns word embeddings from observed word counts and co-occurrence statistics. The growing scale of data, the sparsity of data representations, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. We then present methods of text representation as preliminaries, along with existing word embedding approaches such as the neural network language model and sparse coding, and their evaluation metrics. Finally, we summarize the applications of word embedding and discuss future directions.
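To make the count-based route the abstract describes concrete, here is a minimal sketch (our illustration, not code from the chapter) of learning embeddings from co-occurrence statistics via matrix factorization: build a word-word co-occurrence matrix from a toy corpus, then keep the top singular directions of a truncated SVD as low-dimensional word vectors, in the spirit of LSA-style methods. The corpus, window size, and dimensionality k are arbitrary illustrative choices.

```python
from collections import Counter

import numpy as np

# Toy corpus; any tokenized text would do.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count symmetric co-occurrences within a +/-2 word window.
window = 2
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(idx[w], idx[sent[j]])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    C[i, j] = c

# Truncated SVD: keep the top-k singular directions as word vectors.
U, S, _ = np.linalg.svd(C)
k = 3
embeddings = U[:, :k] * S[:k]  # one k-dimensional vector per word

for word in vocab:
    print(word, np.round(embeddings[idx[word]], 2))
```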
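The prediction-based route the abstract names, skip-gram with negative sampling, can be sketched just as briefly. The example below assumes the gensim library (4.x API, where the dimensionality argument is named vector_size); the hyperparameters are illustrative, not tuned recommendations.

```python
from gensim.models import Word2Vec

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # embedding dimensionality
    window=2,        # context window size
    sg=1,            # 1 = skip-gram (0 would be CBOW)
    negative=5,      # negative samples per positive pair
    min_count=1,     # keep every word in this toy corpus
)

print(model.wv["cat"][:5])           # first few embedding coordinates
print(model.wv.most_similar("cat"))  # nearest neighbors in vector space
```

On a real corpus one would raise min_count and vector_size; the point here is only that the skip-gram and negative-sampling choices discussed in the survey are exposed through the sg and negative arguments.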
Cite this chapter
Li, Y., & Yang, T. (2018). Word embedding for understanding natural language: A survey. In S. Srinivasan (Ed.), Guide to Big Data Applications (Studies in Big Data, vol. 26). Springer, Cham. https://doi.org/10.1007/978-3-319-53817-4_4
Print ISBN: 978-3-319-53816-7
Online ISBN: 978-3-319-53817-4
© 2018 Springer International Publishing AG