Revisiting Tibetan Word Segmentation with Neural Networks

Duanzhu, Sangjie; Jiacuo, Cizhen; Jia, Cairang

doi:10.1007/978-3-030-81197-6_44

Sangjie Duanzhu^1,2,3,4,
Cizhen Jiacuo^1,2,3,4 &
Cairang Jia^1,2,3,4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12278)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

Tibetan word segmentation (TWS) is a basic and essential task in the Tibetan natural language processing (NLP) workflow. The performance of TWS directly affects many downstream Tibetan NLP tasks, since errors propagate through a multi-stage NLP pipeline. Traditionally, most researchers have tackled TWS with linear statistical approaches, which often require careful, hand-crafted linguistic feature engineering. In this work, we propose a neural network architecture for TWS that stacks a convolutional neural network (CNN), a bidirectional long short-term memory network (Bi-LSTM), and a conditional random field (CRF). Using tagged data for supervised learning and unlabeled data for representation learning, without any feature engineering, our model achieves promising performance on the test set and surpasses our baseline models by a large margin, indicating the effectiveness of the proposed neural model.
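
The abstract casts TWS as supervised sequence labeling with a CRF on top of CNN and Bi-LSTM layers. The sketch below illustrates only the labeling and decoding side, under several assumptions the abstract does not state: syllables (transliterated in Wylie) are the labeling unit, words are encoded with a BMES tag scheme (Begin, Middle, End, Single), and the hand-set emission and transition scores are hypothetical stand-ins for what a trained CNN/Bi-LSTM and CRF would produce. The helper names (`words_to_tags`, `tags_to_words`, `viterbi`) and the toy sentence are illustrative and are not the authors' code or data.

```python
"""Toy sketch: BMES tagging plus Viterbi decoding for a linear-chain CRF layer.

Assumptions (not from the paper): syllable-level BMES tags, Wylie-transliterated
syllables, and hand-set scores standing in for learned CNN/Bi-LSTM/CRF outputs.
"""
from typing import Dict, List, Sequence, Tuple

TAGS = ["B", "M", "E", "S"]  # Begin/Middle/End of a multi-syllable word; Single-syllable word

# Legal BMES transitions; any pair not listed is treated as forbidden (score -inf).
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
TRANSITIONS: Dict[Tuple[str, str], float] = {pair: 0.0 for pair in ALLOWED}


def words_to_tags(words: Sequence[Sequence[str]]) -> List[str]:
    """Turn a gold segmentation (a list of words, each a list of syllables) into BMES tags."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty words
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(syllables: Sequence[str], tags: Sequence[str]) -> List[List[str]]:
    """Group syllables back into words; a word closes at an E or S tag."""
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # keep a trailing unfinished word rather than dropping it
        words.append(current)
    return words


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the tag path maximizing the sum of emission and transition scores."""
    neg_inf = float("-inf")
    # A sentence may only start with B or S.
    score = {tag: emissions[0][tag] if tag in ("B", "S") else neg_inf for tag in TAGS}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for cur in TAGS:
            best_prev, best = TAGS[0], neg_inf
            for prev in TAGS:
                candidate = score[prev] + transitions.get((prev, cur), neg_inf)
                if candidate > best:
                    best_prev, best = prev, candidate
            new_score[cur] = best + emissions[t][cur]
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # A sentence may only end with E or S.
    last = max(("E", "S"), key=lambda tag: score[tag])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical per-syllable scores, standing in for Bi-LSTM outputs.
SYLLABLES = ["bkra", "shis", "bde", "legs"]
EMISSIONS = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 1.5, "E": 1.0, "S": 0.0},
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},
    {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.5},
]

if __name__ == "__main__":
    greedy = [max(e, key=e.get) for e in EMISSIONS]
    best = viterbi(EMISSIONS, TRANSITIONS)
    print(greedy)  # ['B', 'M', 'B', 'E'] (M followed by B is not a legal BMES sequence)
    print(best)    # ['B', 'E', 'B', 'E']
    print(tags_to_words(SYLLABLES, best))  # [['bkra', 'shis'], ['bde', 'legs']]
```

The toy scores are chosen so that picking the best tag independently at each syllable yields the illegal sequence B M B E, whereas Viterbi decoding under the transition constraints recovers B E B E. This illustrates one reason a CRF output layer is commonly placed on top of a Bi-LSTM in segmentation taggers; in a trained model the transition scores would be learned rather than fixed at 0 or forbidden.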

Acknowledgments

This work was supported by the Science and Technology Department of Qinghai Province (grant numbers 2020-ZJ-Y05 and 2020-ZJ-704) and the National Key Research and Development Program of China (grant number 2017YFB1402200).

Author information

Authors and Affiliations

The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining, Qinghai, China
Sangjie Duanzhu, Cizhen Jiacuo & Cairang Jia
Tibetan Information Processing Engineering Technology and Research Center, Qinghai Normal University, Xining, Qinghai, China
Sangjie Duanzhu, Cizhen Jiacuo & Cairang Jia
Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining, China
Sangjie Duanzhu, Cizhen Jiacuo & Cairang Jia
Tibetan Information Processing and Machine Translation Key Laboratory, Xining, Qinghai, China
Sangjie Duanzhu, Cizhen Jiacuo & Cairang Jia

Editor information

Editors and Affiliations

Department of Linguistics and Translation, City University of Hong Kong, Hong Kong SAR, China
Meichun Liu
Department of Linguistics and Translation, City University of Hong Kong, Hong Kong SAR, China
Chunyu Kit
School of Foreign Languages, Peking University, Beijing, China
Qi Su

About this paper

Cite this paper

Duanzhu, S., Jiacuo, C., Jia, C. (2021). Revisiting Tibetan Word Segmentation with Neural Networks. In: Liu, M., Kit, C., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2020. Lecture Notes in Computer Science, vol 12278. Springer, Cham. https://doi.org/10.1007/978-3-030-81197-6_44

DOI: https://doi.org/10.1007/978-3-030-81197-6_44
Published: 26 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81196-9
Online ISBN: 978-3-030-81197-6
eBook Packages: Computer Science, Computer Science (R0)
