
A Lexicon-Constrained Character Model for Chinese Morphological Analysis

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)


Abstract

This paper proposes a lexicon-constrained character model that combines word and character features to address difficult problems in Chinese morphological analysis. A character-based model, constrained by a lexicon, is built to acquire word-building rules. The model assigns a tag to each character in a Chinese sentence, and the word segmentation and part-of-speech tagging results are then generated from the character tags. The method handles unknown word identification, data sparseness, and estimation bias within a single unified framework. Preliminary experiments indicate that it outperforms the best SIGHAN word segmentation systems in the open track on three of the four test corpora. In addition, the method can be conveniently integrated with any other Chinese morphological analysis system as a post-processing module, leading to a significant improvement in performance.
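The step from character tags to segmented, part-of-speech-tagged words can be illustrated with a small Python sketch. The abstract does not specify the tag inventory, so the "position-POS" format used here (B, M, E, S for word-initial, word-internal, word-final, and single-character word, followed by a POS label) is an assumption borrowed from common character-tagging practice, and the function name `decode_character_tags` is hypothetical. This is not the authors' implementation, and it does not model the lexicon constraint itself.

```python
from typing import List, Tuple


def decode_character_tags(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (word, pos) pairs from per-character tags.

    Assumed (hypothetical) tag format: "<position>-<pos>", where position is
    B (word-initial), M (word-internal), E (word-final), or
    S (single-character word). The paper's real tag set may differ.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[Tuple[str, str]] = []
    buffer: List[str] = []   # characters of the word being built
    buffer_pos = ""          # POS label taken from the word's first character

    for ch, tag in zip(chars, tags):
        position, _, pos = tag.partition("-")

        # A new word starts while the previous one was never closed:
        # flush the unfinished word so that no character is lost.
        if position in ("B", "S") and buffer:
            words.append(("".join(buffer), buffer_pos))
            buffer = []

        if not buffer:
            buffer_pos = pos
        buffer.append(ch)

        # E and S both mark the last character of a word.
        if position in ("E", "S"):
            words.append(("".join(buffer), buffer_pos))
            buffer = []

    # Flush a trailing word that never received an E tag.
    if buffer:
        words.append(("".join(buffer), buffer_pos))
    return words


if __name__ == "__main__":
    sentence = list("我喜欢北京")
    predicted = ["S-r", "B-v", "E-v", "B-ns", "E-ns"]  # made-up tagger output
    print(decode_character_tags(sentence, predicted))
```

Because the word's POS is read from its first character, an inconsistent tag sequence still yields a complete segmentation. A real system would more likely enforce tag consistency during decoding than repair it afterwards.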



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meng, Y., Yu, H., Nishino, F. (2005). A Lexicon-Constrained Character Model for Chinese Morphological Analysis. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_48


  • DOI: https://doi.org/10.1007/11562214_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
