
Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

  • Conference paper
Information Technology - New Generations

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 738))

Abstract

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). However, many non-Latin languages use logographic writing systems whose character inventories number in the thousands or even tens of thousands, and each character is itself composed of smaller parts that previous work has largely ignored. In this paper, we propose a novel architecture that employs two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character level representations and capture deeper levels of semantic meaning. To provide a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a case study: Chinese is a typical example of such a language, since every character is composed of several components called radicals. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without an extra Traditional-to-Simplified conversion step, making the segmentation pipeline considerably more end-to-end than in previous work. Radical-level embeddings also capture semantic meaning below the character level and improve system performance. By tying the radical and character embeddings together, the parameter count is reduced while semantic knowledge is shared and transferred between the two levels, substantially boosting performance. On 3 out of 4 Bakeoff 2005 datasets, our method surpasses state-of-the-art results by up to 0.4%. Our results are reproducible; source code and corpora are available on GitHub (https://github.com/hankcs/sub-character-cws).
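To make the dual-LSTM idea concrete, the following is a minimal sketch of how a radical-level BiLSTM can feed a character-level BiLSTM for segmentation tagging. It assumes PyTorch; the layer sizes, the way radical states are pooled into character vectors, and the four-tag (BMES) output are illustrative assumptions, not the paper's exact configuration or the authors' released code.

```python
# Minimal sketch of a dual-LSTM segmenter (assumptions: PyTorch; dimensions,
# radical-to-character pooling, and the BMES tag set are illustrative only).
import torch
import torch.nn as nn

class DualLSTMSegmenter(nn.Module):
    """Lower BiLSTM runs over the radicals of each character to build a
    sub-character representation; upper BiLSTM runs over the resulting
    character vectors and predicts BMES segmentation tags."""

    def __init__(self, n_radicals, n_tags=4, emb_dim=64, hidden=128):
        super().__init__()
        # Shared radical-level embedding: both Simplified and Traditional
        # characters are looked up through their radical decomposition.
        self.radical_emb = nn.Embedding(n_radicals, emb_dim, padding_idx=0)
        self.radical_lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                    bidirectional=True)
        self.char_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                 bidirectional=True)
        self.tag_scores = nn.Linear(2 * hidden, n_tags)

    def forward(self, radical_ids):
        # radical_ids: (batch, n_chars, n_radicals_per_char)
        b, t, r = radical_ids.shape
        rad = self.radical_emb(radical_ids.reshape(b * t, r))       # (b*t, r, emb)
        _, (h, _) = self.radical_lstm(rad)                          # h: (2, b*t, hidden)
        char_repr = torch.cat([h[0], h[1]], dim=-1).reshape(b, t, -1)
        out, _ = self.char_lstm(char_repr)                          # (b, t, 2*hidden)
        return self.tag_scores(out)                                 # per-character tag logits

# Usage: 2 sentences, 5 characters each, up to 3 radicals per character.
model = DualLSTMSegmenter(n_radicals=300)
logits = model(torch.randint(1, 300, (2, 5, 3)))
print(logits.shape)  # torch.Size([2, 5, 4])
```

In this sketch the final hidden states of the radical BiLSTM serve as the character representation; the paper's tying of radical and character embeddings, and any CRF-style decoding over the tag scores, would be added on top of this skeleton.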


Notes

  1. http://tool.httpcn.com/Zi/.

  2. https://github.com/facebookresearch/fastText, with a tiny modification to output n-gram vectors.

  3. http://www.sighan.org/bakeoff2003/score. This script rounds scores to one decimal digit.


Author information

Correspondence to Lei Wu.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

He, H. et al. (2018). Dual Long Short-Term Memory Networks for Sub-Character Representation Learning. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_55


  • DOI: https://doi.org/10.1007/978-3-319-77028-4_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77027-7

  • Online ISBN: 978-3-319-77028-4

  • eBook Packages: Engineering, Engineering (R0)
