Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

He, Han; Wu, Lei; Yang, Xiaokun; Yan, Hua; Gao, Zhimin; Feng, Yi; Townsend, George

doi:10.1007/978-3-319-77028-4_55


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 738)


Abstract

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). However, many non-Latin languages have hieroglyphic writing systems with large alphabets of thousands or millions of characters. Each character is composed of even smaller parts, which previous work has often ignored. Chinese is a typical case: every character contains several components called radicals. In this paper, we propose a novel architecture employing two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character level representations and capture a deeper level of semantic meaning. To substantiate the effectiveness of our neural architecture, we take Chinese Word Segmentation as a case study. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without extra Traditional-to-Simplified Chinese conversion; this highly end-to-end approach significantly simplifies word segmentation compared to previous work. Radical-level embeddings also capture semantic meaning below the character level and improve learning performance. Tying radical and character embeddings together reduces the parameter count while sharing and transferring semantic knowledge between the two levels, which substantially boosts performance. On 3 out of 4 Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to 0.4%. Our results are reproducible; source code and corpora are available on GitHub (https://github.com/hankcs/sub-character-cws).
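The stacking described in the abstract can be pictured with a minimal sketch: one LSTM reads the radicals of each character and its final hidden state serves as that character's vector, and a second LSTM reads the resulting character vectors across the sentence. Everything below is an illustrative assumption rather than the authors' implementation: the radical decompositions in RADICALS, the dimensions, the random untrained weights, and the unidirectional pure-Python LSTM are hypothetical, and the embedding tying and the segmentation output layer are omitted.

```python
import math
import random

# Hypothetical radical decompositions, for illustration only.
RADICALS = {
    "语": ["讠", "五", "口"],  # Simplified form
    "語": ["言", "五", "口"],  # Traditional form, sharing two radicals
    "好": ["女", "子"],
}


def make_lstm(input_dim, hidden_dim, rng):
    """Create random (untrained) weights for the four LSTM gates."""
    rows, cols = 4 * hidden_dim, input_dim + hidden_dim
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W": weights, "b": [0.0] * rows, "hidden_dim": hidden_dim}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(params, x, h, c):
    """One LSTM step: gates are computed from [x; h], then c and h are updated."""
    n = params["hidden_dim"]
    z = x + h  # list concatenation of the input and the previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + b
           for row, b in zip(params["W"], params["b"])]
    i = [sigmoid(v) for v in pre[0:n]]            # input gate
    f = [sigmoid(v) for v in pre[n:2 * n]]        # forget gate
    o = [sigmoid(v) for v in pre[2 * n:3 * n]]    # output gate
    g = [math.tanh(v) for v in pre[3 * n:4 * n]]  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, inputs):
    """Run the LSTM over a sequence and return every hidden state."""
    n = params["hidden_dim"]
    h, c = [0.0] * n, [0.0] * n
    outputs = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


rng = random.Random(0)
RADICAL_DIM, CHAR_DIM, HIDDEN_DIM = 4, 6, 5  # assumed sizes

# One radical vocabulary and embedding table serves both scripts.
radical_vocab = sorted({r for parts in RADICALS.values() for r in parts})
radical_emb = {r: [rng.uniform(-0.1, 0.1) for _ in range(RADICAL_DIM)]
               for r in radical_vocab}

radical_lstm = make_lstm(RADICAL_DIM, CHAR_DIM, rng)  # lower LSTM, over radicals
char_lstm = make_lstm(CHAR_DIM, HIDDEN_DIM, rng)      # upper LSTM, over characters


def char_vector(ch):
    """Character vector = final hidden state of the radical-level LSTM."""
    return run_lstm(radical_lstm, [radical_emb[r] for r in RADICALS[ch]])[-1]


def encode(sentence):
    """Hidden state per character from the character-level LSTM."""
    return run_lstm(char_lstm, [char_vector(ch) for ch in sentence])


states = encode("語好")
print(len(states), len(states[0]))
```

Because 语 and 語 draw on the same radical table, a Traditional character is encoded without first being converted to its Simplified form, which is the property the abstract attributes to the shared radical-level embedding. In a trained segmenter the upper LSTM's states would feed a tagging layer, which this sketch leaves out.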


Notes

1. http://tool.httpcn.com/Zi/.
2. https://github.com/facebookresearch/fastText, with a tiny modification to output n-gram vectors.
3. http://www.sighan.org/bakeoff2003/score. This script rounds a score to one digit.


Author information

Authors and Affiliations

University of Houston-Clear Lake, Houston, TX, USA
Han He, Lei Wu, Xiaokun Yang & Hua Yan
University of Houston, Houston, TX, USA
Zhimin Gao
Algoma University, Sault Ste. Marie, ON, Canada
Yi Feng & George Townsend


Corresponding author

Correspondence to Lei Wu.

Editor information

Editors and Affiliations

Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, Las Vegas, Nevada, USA
Shahram Latifi


About this paper

Cite this paper

He, H. et al. (2018). Dual Long Short-Term Memory Networks for Sub-Character Representation Learning. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_55


DOI: https://doi.org/10.1007/978-3-319-77028-4_55
Published: 13 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77027-7
Online ISBN: 978-3-319-77028-4
eBook Packages: Engineering, Engineering (R0)
