Recursive Part-of-Speech Tagging Using Word Structures

  • Samuel W. K. Chan
  • Mickey W. C. Chong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)


This research takes advantage of word structure to produce a good estimate of the part-of-speech tags of Chinese compound words before they are fed into a tagger. The approach relies on a set of features drawn from Chinese morphemes, together with a set of collocation markers that provide hints about the syntactic categories of compound words. A recursive inferential mechanism is devised to alleviate the ripple effect that changes to a word's neighbors cause during tagging. The approach is justified using a compound-word database containing more than 53,500 words. Experimental results on 500,000 words show that the approach outperforms its counterparts.


Keywords

Part-of-speech tagging; Tree-based classifier; Chinese morphemes





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Samuel W. K. Chan (1)
  • Mickey W. C. Chong (1)
  1. Dept. of Decision Sciences, The Chinese University of Hong Kong, Hong Kong
