Which Embedding Level is Better for Semantic Representation? An Empirical Research on Chinese Phrases

  • Kunyuan Pang
  • Jintao Tang
  • Ting Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11109)


Word embeddings have been used as popular features in various Natural Language Processing (NLP) tasks. To overcome the coverage problem of statistics, compositional models have been proposed: they embed the basic units of a language and compose them into representations of higher-level structures such as idioms, phrases, and named entities. Selecting the right level of basic-unit embedding to represent the semantics of a higher-level unit is therefore crucial. This paper investigates this problem through a Chinese phrase representation task, in which both characters and words are viewed as basic units. We define phrase representation evaluation tasks by utilizing Wikipedia. We propose four intuitive methods for composing basic embeddings into higher-level representations and compare the performance of the two kinds of basic units. Empirical results show that, with all composing methods, word embeddings outperform character embeddings on both tasks, which indicates that the word level is more suitable for composing semantic representations.
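The abstract refers to four intuitive composing methods but does not name them in this excerpt. As an illustrative sketch only (assuming simple pointwise compositions such as sum, average, element-wise max, and concatenation, which are standard baselines in the compositional-semantics literature, not necessarily the paper's exact four), composing basic-unit embeddings into a phrase vector can look like this:

```python
import numpy as np

# Each basic unit (a Chinese character or word) is assumed to have a
# pre-trained embedding vector; a phrase is a sequence of such vectors.
def compose_sum(vecs):
    """Element-wise sum of unit embeddings."""
    return np.sum(vecs, axis=0)

def compose_avg(vecs):
    """Element-wise average; length-normalized variant of the sum."""
    return np.mean(vecs, axis=0)

def compose_max(vecs):
    """Element-wise maximum across unit embeddings."""
    return np.max(vecs, axis=0)

def compose_concat(vecs):
    """Concatenation; phrase dimension grows with phrase length."""
    return np.concatenate(vecs)

# Toy 4-dimensional embeddings for a two-unit phrase.
units = [np.array([1.0, 0.0, 2.0, 1.0]),
         np.array([0.0, 2.0, 2.0, 3.0])]

phrase_vec = compose_sum(units)   # -> array([1., 2., 4., 4.])
```

The first three methods keep the phrase vector in the same space as the unit embeddings, which makes character-level and word-level compositions directly comparable; concatenation instead produces a variable-length representation.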


Keywords: Word embedding · Phrase representation · Composing model



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Computer, National University of Defense Technology, Changsha, Hunan, People’s Republic of China
