Shallow Case Role Annotation Using Two-Stage Feature-Enhanced String Matching

  • Samuel W. K. Chan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3878)


A two-stage annotation method for identifying case roles in Chinese sentences is proposed. The approach uses a feature-enhanced string matching technique that exploits the large number of sentence patterns in a Treebank. The first stage is a coarse-grained syntactic parse, which is complemented in the second stage by an analysis of semantic dissimilarities. The approach goes beyond shallow parsing to a deeper level of case role identification while preserving robustness and without becoming bogged down in a complete linguistic analysis. The ideas described have been implemented, and an evaluation on 5,000 Chinese sentences is presented to demonstrate the significance of the approach.
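
The abstract does not give the matching procedure itself, so the following is only a minimal sketch of what matching an input sentence against treebank patterns by edit distance could look like. It assumes sentences are represented as sequences of part-of-speech tags and uses the standard dynamic-programming edit distance with a feature-sensitive substitution cost. The function names, the cost values, the tag strings, and the toy pattern list are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def substitution_cost(a: str, b: str) -> float:
    """Hypothetical feature-aware cost: identical tags cost 0, tags that
    share their first letter (assumed here to mean the same coarse
    category) cost 0.5, and anything else costs 1."""
    if a == b:
        return 0.0
    if a[:1] == b[:1]:
        return 0.5
    return 1.0


def weighted_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float] = substitution_cost,
    indel_cost: float = 1.0,
) -> float:
    """Dynamic-programming edit distance over tag sequences."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty sequence costs one deletion per tag.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    # Building a prefix from the empty sequence costs one insertion per tag.
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # delete source[i-1]
                dist[i][j - 1] + indel_cost,  # insert target[j-1]
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


def closest_pattern(
    sentence_tags: Sequence[str], patterns: List[List[str]]
) -> List[str]:
    """Return the stored pattern with the smallest distance to the input."""
    return min(patterns, key=lambda p: weighted_edit_distance(sentence_tags, p))


# Toy data: illustrative tag sequences standing in for treebank patterns.
patterns = [["Nh", "VC", "Na"], ["Nh", "VA"], ["Na", "VH"]]
query = ["Nb", "VC", "Na"]
best = closest_pattern(query, patterns)
```

In a pipeline of this shape, the annotation attached to the retrieved pattern could then be projected onto the input sentence; how the paper's second stage scores semantic dissimilarity is not stated in the abstract and is not modelled here.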


Keywords: Edit Distance, Edit Operation, Semantic Class, Countable Noun, Input Sentence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Samuel W. K. Chan, Dept. of Decision Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
