Predicting Part-of-Speech Tags and Morpho-Syntactic Relations Using Similarity-Based Technique

  • Samuel W. K. Chan
  • Mickey M. C. Chong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7978)


This paper describes a similarity-based technique that produces a good estimate of the part-of-speech tags and morpho-syntactic relations of Chinese compound words before the words are fed into a tagger. The technique relies on a set of features drawn from Chinese morphemes, together with a set of collocation markers that provide hints about the syntactic categories of the compound words. It is trained on a compound-word database of more than 53,500 disyllabic words. Experimental results show that the tagger equipped with the technique outperforms its counterpart.
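
The general idea of similarity-based tag prediction can be illustrated with a minimal sketch. It is not the authors' method: the toy lexicon, the feature set (first and second morpheme only), the Jaccard similarity, and the names `LEXICON`, `features`, `jaccard`, and `predict_tag` are all assumptions made for illustration. Collocation markers are left out; they could be added as further features in the same way.

```python
from collections import Counter

# Hypothetical toy lexicon of disyllabic compounds and coarse tags.
# Illustrative only; it is not taken from the paper's database.
LEXICON = {
    "火車": "N",
    "汽車": "N",
    "電車": "N",
    "開車": "V",
    "停車": "V",
    "開門": "V",
}


def features(word):
    """Position-aware morpheme features of a two-character compound."""
    return {("first", word[0]), ("second", word[1])}


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_tag(word, lexicon=LEXICON):
    """Guess a tag by similarity-weighted voting over known compounds.

    Returns None when no known compound shares a feature with `word`.
    """
    target = features(word)
    votes = Counter()
    for known, tag in lexicon.items():
        if known == word:
            continue  # do not let a word vote for itself
        votes[tag] += jaccard(target, features(known))
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for w in ("開窗", "貨車", "山水"):
        print(w, predict_tag(w))
```

In this sketch an unseen compound inherits the tag that its most similar known compounds carry, weighted by how many morpheme features they share. A real system would need a far richer feature set and a tie-breaking policy.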


Part-of-speech tagging; Chinese morphemes; Chinese word structures; Machine learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Samuel W. K. Chan (1)
  • Mickey M. C. Chong (1)
  1. Dept. of Decision Sciences, The Chinese University of Hong Kong, Hong Kong
