
Sinica Treebank

Design criteria, representational issues and implementation


Abstract

This paper describes the design criteria and annotation guidelines of the Sinica Treebank. The three design criteria are: Maximal Resource Sharing, Minimal Structural Complexity, and Optimal Semantic Information. One of the important design decisions guided by these criteria is the encoding of thematic role information. We discuss the representational and methodological issues based on our design criteria.




Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Chen, KJ. et al. (2003). Sinica Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_13


  • DOI: https://doi.org/10.1007/978-94-010-0201-1_13

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1335-5

  • Online ISBN: 978-94-010-0201-1

  • eBook Packages: Springer Book Archive
