An Integrated Approach to Chinese Word Segmentation and Part-of-Speech Tagging

  • Maosong Sun
  • Dongliang Xu
  • Benjamin K. Tsou
  • Huaming Lu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)

Abstract

This paper discusses and compares various schemes for integrating Chinese word segmentation and part-of-speech tagging, classified as either true integration or pseudo-integration. A true-integration approach, named ‘the divide-and-conquer integration’, is presented. Experiments on a manually word-segmented and part-of-speech tagged corpus of about 5.8 million words show that this true integration achieves an F-measure of 98.61% in word segmentation, 95.18% in part-of-speech tagging, and 93.86% in word segmentation and part-of-speech tagging combined, outperforming all other kinds of combinations to some extent. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
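The abstract does not state how its F-measures are scored, so the following is only a minimal sketch of one common convention, not the authors' evaluation procedure. It assumes a word counts as correct when its character span matches the gold standard, and, for the combined score, when its tag matches as well. The names `spans` and `f_measure` and the tag labels are hypothetical, chosen for illustration.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tag names are illustrative


def spans(tokens: List[Token], with_tags: bool) -> Set[tuple]:
    """Map a tagged segmentation to a set of character spans.

    Each word becomes (start, end), or (start, end, tag) when with_tags is True.
    """
    result = set()
    start = 0
    for word, tag in tokens:
        end = start + len(word)
        result.add((start, end, tag) if with_tags else (start, end))
        start = end
    return result


def f_measure(gold: List[Token], pred: List[Token], with_tags: bool = False) -> float:
    """Harmonic mean of precision and recall over matching spans."""
    gold_spans = spans(gold, with_tags)
    pred_spans = spans(pred, with_tags)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Assumed toy sentence: the prediction mis-segments the middle of the
# sentence and mis-tags the last word.
gold = [("他", "r"), ("研究", "v"), ("生命", "n"), ("起源", "n")]
pred = [("他", "r"), ("研究生", "n"), ("命", "n"), ("起源", "v")]

seg_f = f_measure(gold, pred)                    # segmentation only
joint_f = f_measure(gold, pred, with_tags=True)  # segmentation and tagging combined
```

Under this convention the combined score can never exceed the segmentation score, because a word with the wrong boundaries cannot contribute a correct tag. That matches the ordering of the figures reported above.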

Keywords

Chinese word segmentation; part-of-speech tagging; integration; smoothing

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Maosong Sun (1)
  • Dongliang Xu (1)
  • Benjamin K. Tsou (2)
  • Huaming Lu (3)
  1. National Lab. of Intelligent Tech. & Systems, Tsinghua University, Beijing, China
  2. Language Information Sciences Research Centre, City University of Hong Kong
  3. Beijing Information Science and Technology University, Beijing, China
