Improved Learning of Chinese Word Embeddings with Semantic Knowledge

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

While previous studies show that modeling the minimum meaning-bearing units (characters or morphemes) benefits learning vector representations of words, they ignore the semantic dependencies across these units when deriving word vectors. In this work, we propose to improve the learning of Chinese word embeddings by exploiting semantic knowledge. The basic idea is to take the semantic knowledge about words and their component characters into account when designing composition functions. Experiments show that our approach outperforms two strong baselines on word similarity, word analogy, and document classification tasks.

Notes

  1. Following the vLBL model [14], we use tag-specific weight vectors rather than weight matrices, which makes training significantly faster. This has been discussed by Mnih and Teh [15].

  2. http://www.csie.ntu.edu.tw/~cjlin/liblinear/
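Note 1 contrasts tag-specific weight vectors with weight matrices. The sketch below is only an illustration of why the vector form is cheaper, not the paper's actual composition function: it assumes a word vector is composed as a weighted sum of its unit vectors, with one weight per tag. The tag names (`modifier`, `head`), the function names, and the toy values are all hypothetical. A weight vector applied element-wise costs O(d) per unit and is equivalent to a diagonal weight matrix, while a full matrix costs O(d^2).

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def compose_with_vectors(
    units: Sequence[Tuple[str, Vector]],
    tag_weights: Dict[str, Vector],
) -> Vector:
    """Sum of element-wise products w[tag] * x: O(d) work per unit."""
    dim = len(units[0][1])
    out = [0.0] * dim
    for tag, x in units:
        w = tag_weights[tag]
        for k in range(dim):
            out[k] += w[k] * x[k]
    return out


def compose_with_matrices(
    units: Sequence[Tuple[str, Vector]],
    tag_matrices: Dict[str, Matrix],
) -> Vector:
    """Sum of matrix-vector products W[tag] @ x: O(d^2) work per unit."""
    dim = len(units[0][1])
    out = [0.0] * dim
    for tag, x in units:
        W = tag_matrices[tag]
        for r in range(dim):
            out[r] += sum(W[r][c] * x[c] for c in range(dim))
    return out


def diag(w: Vector) -> Matrix:
    """Diagonal matrix whose diagonal is w."""
    n = len(w)
    return [[w[r] if r == c else 0.0 for c in range(n)] for r in range(n)]


# Hypothetical two-unit word with made-up tags and 2-dimensional vectors.
units = [("modifier", [1.0, 2.0]), ("head", [3.0, 4.0])]
weights = {"modifier": [0.5, 0.5], "head": [1.0, 2.0]}

by_vectors = compose_with_vectors(units, weights)
by_matrices = compose_with_matrices(units, {t: diag(w) for t, w in weights.items()})
print(by_vectors, by_matrices)  # both are [3.5, 9.0]
```

The two results agree because an element-wise weight vector is a special case of a weight matrix with all off-diagonal entries fixed at zero; the saving comes from never storing or multiplying those entries.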

References

  1. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. JMLR 3, 1137–1155 (2003)

  2. Botha, J.A., Blunsom, P.: Compositional morphology for word representations and language modelling. In: Proceedings of ICML (2014)

  3. Chen, X., Liu, Z., Sun, M.: A unified model for word sense representation and disambiguation. In: Proceedings of EMNLP (2014)

  4. Chen, X., Xu, L., Liu, Z., Sun, M., Luan, H.: Joint learning of character and word embeddings. In: Proceedings of IJCAI (2015)

  5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. JMLR 12, 2493–2537 (2011)

  6. Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E., Smith, N.A.: Retrofitting word vectors to semantic lexicons. In: Proceedings of NAACL (2015)

  7. Jin, P., Wu, Y.: SemEval-2012 task 4: evaluating Chinese word similarity. In: Proceedings of SemEval (2012)

  8. Li, J., Sun, M.: Scalable term selection for text categorization. In: Proceedings of EMNLP (2007)

  9. Li, Z.: Parsing the internal structure of words: a new paradigm for Chinese word segmentation. In: Proceedings of ACL (2011)

  10. Luong, T., Socher, R., Manning, C.D.: Better word representations with recursive neural networks for morphology. In: Proceedings of CoNLL (2013)

  11. Mei, J., Zhu, Y., Gao, Y., Yin, H.: TongYiCi CiLin. Shanghai Cishu Publisher, Shanghai (1983)

  12. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR (2013)

  13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS (2013)

  14. Mnih, A., Kavukcuoglu, K.: Learning word embeddings efficiently with noise-contrastive estimation. In: Proceedings of NIPS (2013)

  15. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: Proceedings of ICML (2012)

  16. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP (2014)

  17. Socher, R., Lin, C.C., Ng, A.Y., Manning, C.D.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of ICML (2011)

  18. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of EMNLP (2013)

  19. Yu, M., Dredze, M.: Improving lexical embeddings with semantic knowledge. In: Proceedings of ACL (2014)

  20. Zhang, M., Zhang, Y., Che, W., Liu, T.: Chinese parsing exploiting characters. In: Proceedings of ACL (2013)

  21. Zhao, H.: Character-level dependencies in Chinese: usefulness and learning. In: Proceedings of EACL (2009)

Acknowledgments

The authors thank Yang Liu, Xinxiong Chen, Lei Xu, Yu Zhao and Zhiyuan Liu for helpful discussions and three anonymous reviewers for the valuable comments. This research is supported by the Key Project of National Social Science Foundation of China under Grant No. 13&ZD190 and the Project of National Natural Science Foundation of China under Grant No. 61170196.

Author information

Corresponding author

Correspondence to Liner Yang.

Rights and permissions

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, L., Sun, M. (2015). Improved Learning of Chinese Word Embeddings with Semantic Knowledge. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_2

  • DOI: https://doi.org/10.1007/978-3-319-25816-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science (R0)
