Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

  • Han He
  • Lei Wu
  • Xiaokun Yang
  • Hua Yan
  • Zhimin Gao
  • Yi Feng
  • George Townsend
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 738)


Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). However, many non-Latin languages have hieroglyphic writing systems with large alphabets of thousands or millions of characters. Each character is composed of even smaller parts, which previous work has often ignored. In this paper, we propose a novel architecture that employs two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character level representations and capture deeper levels of semantic meaning. Among those languages, Chinese is a typical case: every character contains several components called radicals. To build a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a case study. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without an extra Traditional-to-Simplified Chinese conversion. In this highly end-to-end way, word segmentation is significantly simplified compared to previous work. Radical-level embeddings also capture deeper semantic meaning below the character level and improve learning performance. Tying radical and character embeddings together reduces the parameter count while sharing and transferring semantic knowledge between the two levels, which boosts performance considerably. On 3 out of 4 Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to 0.4%. Our results are reproducible; source code and corpora are available on GitHub.
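The idea of stacking a radical-level LSTM under a character-level LSTM can be illustrated with a small forward-pass sketch in pure Python. This is not the authors' implementation: the abstract does not give dimensions, directionality, the output layer, or how radical and character embeddings are tied, so everything below (the names LSTMCell, DualLSTM, RADICALS, the tiny decomposition table, the 8-dimensional vectors, the untrained random weights) is an assumption made for illustration. It only shows how one shared radical embedding table can serve both Simplified and Traditional characters, and how the last radical-level state can act as the input of the character-level LSTM.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell over Python lists (forward pass only, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate (g).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def _gate(self, key, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[key], self.b[key])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        cand = self._gate("g", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in inputs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


# Illustrative decomposition table (an assumption, not the paper's resource).
RADICALS = {
    "妈": ["女", "马"],  # Simplified
    "媽": ["女", "馬"],  # Traditional
    "语": ["讠", "吾"],  # Simplified
    "語": ["言", "吾"],  # Traditional
}


class DualLSTM:
    """Radical-level LSTM feeding a character-level LSTM (hypothetical sketch)."""

    def __init__(self, radical_dim=8, char_dim=8, seed=0):
        rng = random.Random(seed)
        radicals = sorted({r for parts in RADICALS.values() for r in parts})
        # A single radical embedding table shared by both scripts.
        self.radical_emb = {r: [rng.uniform(-0.1, 0.1) for _ in range(radical_dim)]
                            for r in radicals}
        self.radical_lstm = LSTMCell(radical_dim, radical_dim, rng)
        self.char_lstm = LSTMCell(radical_dim, char_dim, rng)

    def char_vector(self, ch):
        # Character representation = last hidden state of the radical-level LSTM.
        seq = [self.radical_emb[r] for r in RADICALS[ch]]
        return self.radical_lstm.run(seq)[-1]

    def encode(self, sentence):
        # One character-level hidden state per input character.
        return self.char_lstm.run([self.char_vector(ch) for ch in sentence])


model = DualLSTM()
states = model.encode("媽語")  # Traditional input, no conversion step
print(len(states), len(states[0]))
```

In a segmentation system, each character-level state would then go to a tagging layer that predicts word boundaries. That layer, training, and any bidirectional variant are omitted here.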


Keywords: AI algorithms and applications; Deep learning; Machine learning algorithms; Natural language processing; Neural networks; Pattern recognition



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Han He (1)
  • Lei Wu (1, corresponding author)
  • Xiaokun Yang (1)
  • Hua Yan (1)
  • Zhimin Gao (2)
  • Yi Feng (3)
  • George Townsend (3)

  1. University of Houston-Clear Lake, Houston, USA
  2. University of Houston, Houston, USA
  3. Algoma University, Sault Ste. Marie, Canada
