A Lexicon-Constrained Character Model for Chinese Morphological Analysis

Meng, Yao; Yu, Hao; Nishino, Fumihito

doi:10.1007/11562214_48

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper proposes a lexicon-constrained character model that combines word and character features to address difficult problems in Chinese morphological analysis. A Chinese character-based model constrained by a lexicon is built to acquire word-building rules. The model assigns a tag to each character in a Chinese sentence, and the word segmentation and part-of-speech tagging results are then generated from the character tags. The proposed method addresses problems such as unknown word identification, data sparseness, and estimation bias within a single integrated framework. Preliminary experiments indicate that the proposed method outperforms the best SIGHAN word segmentation systems in the open track on three of the four test corpora. Additionally, the method can be conveniently integrated with other Chinese morphological analysis systems as a post-processing module, yielding a significant improvement in performance.
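The step from character tags to segmentation and part-of-speech output can be illustrated with a minimal sketch. The abstract does not specify the tag set, so the tag format used here (a position marker B/M/E/S joined to a POS label, such as `B-ns`) and the function name `decode_char_tags` are assumptions made for illustration only, not the authors' scheme. The sketch covers only the decoding of tags into words; it does not model the lexicon constraint or the tagger itself.

```python
from typing import List, Optional, Tuple


def decode_char_tags(chars: str, tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Group characters into (word, pos) pairs from per-character tags.

    Assumed (hypothetical) tag format: "<position>-<pos>", where position is
    B (begin), M (middle), E (end) or S (single-character word).
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")

    words: List[Tuple[str, Optional[str]]] = []
    current = ""                     # characters of the word being built
    current_pos: Optional[str] = None

    for ch, tag in zip(chars, tags):
        position, _, pos = tag.partition("-")
        if position in ("B", "S") and current:
            # A new word starts, so close any unfinished word first.
            words.append((current, current_pos))
            current = ""
        if not current:
            # The POS of a word is taken from its first character's tag.
            current_pos = pos
        current += ch
        if position in ("E", "S"):
            # This character ends the word.
            words.append((current, current_pos))
            current = ""

    if current:
        # Flush a word left open at the end of the sentence.
        words.append((current, current_pos))
    return words


if __name__ == "__main__":
    sentence = "我爱北京"
    tags = ["S-r", "S-v", "B-ns", "E-ns"]
    for word, pos in decode_char_tags(sentence, tags):
        print(f"{word}/{pos}")
```

Because word boundaries and POS labels are both read off the same tag sequence, segmentation and tagging are decided jointly in this kind of scheme, which is the sense in which a character-tagging approach treats them in one framework.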

Author information

Authors and Affiliations

Fujitsu R&D Center Co., Ltd., Room B1003, Eagle Run Plaza, No. 26 Xiaoyun Road, Chaoyang District, Beijing, 100016, P. R. China
Yao Meng, Hao Yu & Fumihito Nishino

Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Meng, Y., Yu, H., Nishino, F. (2005). A Lexicon-Constrained Character Model for Chinese Morphological Analysis. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_48

DOI: https://doi.org/10.1007/11562214_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)
