An Integrated Approach to Chinese Word Segmentation and Part-of-Speech Tagging

  • Maosong Sun
  • Dongliang Xu
  • Benjamin K. Tsou
  • Huaming Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)

Abstract

This paper discusses and compares various schemes for integrating Chinese word segmentation and part-of-speech tagging, within a framework that distinguishes true integration from pseudo-integration. A true-integration approach, called divide-and-conquer integration, is presented. Experiments on a manually word-segmented and part-of-speech-tagged corpus of about 5.8 million words show that this approach achieves an F-measure of 98.61% for word segmentation, 95.18% for part-of-speech tagging, and 93.86% for combined word segmentation and part-of-speech tagging, outperforming all the other combinations to varying degrees. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
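The abstract reports three F-measures without defining them. The sketch below uses the span-based scoring common in Chinese segmentation evaluation, which is an assumption here. A predicted word counts as correctly segmented only if its character span matches a gold word exactly. It counts as jointly correct only if its tag matches as well. Under this scoring, if tagging quality is measured as accuracy over correctly segmented words, then the joint F-measure equals the segmentation F-measure times that accuracy. The reported figures are consistent with this reading, since 0.9861 × 0.9518 ≈ 0.9386. The paper's exact definitions may still differ. The function names and the example tags are hypothetical.

```python
def spans(words):
    """Map a list of words to (start, end) character offsets in the sentence."""
    result, pos = [], 0
    for w in words:
        result.append((pos, pos + len(w)))
        pos += len(w)
    return result


def f_measure(correct, n_pred, n_gold):
    """Harmonic mean of precision (correct/n_pred) and recall (correct/n_gold)."""
    p, r = correct / n_pred, correct / n_gold
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(gold, pred):
    """Score one sentence; gold and pred are lists of (word, tag) pairs.

    Returns (segmentation F, tag accuracy on correctly segmented words, joint F).
    Hypothetical helper illustrating span-based scoring, not the paper's code.
    """
    if "".join(w for w, _ in gold) != "".join(w for w, _ in pred):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_tags = dict(zip(spans([w for w, _ in gold]), (t for _, t in gold)))
    pred_tags = dict(zip(spans([w for w, _ in pred]), (t for _, t in pred)))
    seg_correct = gold_tags.keys() & pred_tags.keys()  # spans matching exactly
    joint_correct = sum(1 for s in seg_correct if gold_tags[s] == pred_tags[s])
    seg_f = f_measure(len(seg_correct), len(pred_tags), len(gold_tags))
    joint_f = f_measure(joint_correct, len(pred_tags), len(gold_tags))
    tag_acc = joint_correct / len(seg_correct) if seg_correct else 0.0
    return seg_f, tag_acc, joint_f


# Illustrative sentence with made-up tags: the system merges 中文分词 into one
# word and mistags 研究.
gold = [("我们", "r"), ("研究", "v"), ("中文", "nz"), ("分词", "vn")]
pred = [("我们", "r"), ("研究", "n"), ("中文分词", "n")]
seg_f, tag_acc, joint_f = evaluate(gold, pred)
print(seg_f, tag_acc, joint_f)
```

In this example, two of the three predicted spans match the gold standard, so segmentation F = 4/7. Only one of those two spans also has the right tag, so tag accuracy = 1/2 and joint F = 2/7 = 4/7 × 1/2.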

Keywords

Chinese word segmentation; part-of-speech tagging; integration; smoothing



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Maosong Sun (1)
  • Dongliang Xu (1)
  • Benjamin K. Tsou (2)
  • Huaming Lu (3)

  1. National Lab. of Intelligent Tech. & Systems, Tsinghua University, Beijing, China
  2. Language Information Sciences Research Centre, City University of Hong Kong
  3. Beijing Information Science and Technology University, Beijing, China
