Integrating Character Representations into Chinese Word Embedding

  • Xingyuan Chen
  • Peng Jin
  • Diana McCarthy
  • John Carroll
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10085)


In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can reduce the problem of sparse data and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set – with just under four thousand in common usage – but a major problem with using characters is their ambiguity. To address this problem, we disambiguate the characters according to groupings in a semantic hierarchy. Coupling our character embeddings with word embeddings, we observe improved performance on the tasks of finding synonyms and rating word similarity compared with a model using word embeddings alone, especially for low-frequency words.
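The abstract does not say how the character and word embeddings are coupled, so the following is only a minimal sketch under stated assumptions: each character vector is keyed by a (character, semantic group) pair so that an ambiguous character gets one vector per group, and a word's final vector is the plain average of its word vector and the mean of its group-tagged character vectors. The vectors, the group labels "B" and "D", and the names `combined_vector` and `cosine` are all hypothetical and chosen for illustration, not taken from the paper.

```python
import math

# Hypothetical toy word vectors (hand-picked values, not trained embeddings).
word_vecs = {
    "道路": [1.0, 0.0, 0.0],
    "道理": [0.0, 1.0, 0.0],
}

# Assumption: each word is assigned one group from a semantic hierarchy.
# The labels "B" and "D" are invented for this sketch.
word_group = {"道路": "B", "道理": "D"}

# Character vectors keyed by (character, group): the ambiguous character
# "道" has a separate vector for each group it appears under.
char_vecs = {
    ("道", "B"): [0.0, 1.0, 0.0],
    ("路", "B"): [0.0, 0.0, 1.0],
    ("道", "D"): [1.0, 0.0, 0.0],
    ("理", "D"): [1.0, 1.0, 0.0],
}

def combined_vector(word):
    """Average the word vector with the mean of its group-tagged character vectors."""
    group = word_group[word]
    chars = [char_vecs[(c, group)] for c in word]
    char_mean = [sum(col) / len(chars) for col in zip(*chars)]
    return [0.5 * (w + c) for w, c in zip(word_vecs[word], char_mean)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

similarity = cosine(combined_vector("道路"), combined_vector("道理"))
```

Averaging is just one simple way to combine the two sources of information; a trained model could equally learn the combination jointly. The point of the sketch is only that tagging characters by group keeps the two uses of "道" apart.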


Keywords: Word embedding, Chinese character, Cilin





Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Xingyuan Chen (1)
  • Peng Jin (1, corresponding author)
  • Diana McCarthy (2)
  • John Carroll (3)
  1. Key Lab of Internet Natural Language Processing of Sichuan Provincial Education Department, Leshan Normal University, Leshan, China
  2. Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
  3. Department of Informatics, University of Sussex, Brighton, UK
