Matching Pattern Acquisition Approach for Ancient Chinese Treebank Construction
A Matching Pattern (MP) is a sequence of words or part-of-speech (POS) tags sampled from clauses, and MP acquisition is an effective approach to ancient Chinese treebank construction. The approach exploits two typical characteristics of ancient Chinese, short clauses and strong patterns, and divides the syntactic annotation process of treebank construction into three stages: (1) obtaining weighted MPs, each with a syntactic skeleton; (2) applying these MPs to match clauses; and (3) generating the syntactic structure of each matched clause according to the syntactic skeleton of its MP. In our experiments, the syntactic skeletons are constructed on the basis of the Sentence-based Grammar. The MP-based parsing procedures are implemented on both clause and fragment units. Experiments on corpora extracted from Yili and Zuozhuan show that an integrated algorithm, involving both clause and fragment units, achieves coverage/precision of 99.07%/82.76% and 97.25%/77.77% on the two corpora, respectively.
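The three stages can be illustrated with a minimal sketch at the clause level. It is not the authors' implementation: it assumes that an MP is a pure POS sequence, that its weight is its frequency in a small annotated sample, and that a syntactic skeleton is one role label per position. The function names, the POS tags (`n`, `v`), and the role labels (`subj`, `pred`, `obj`) are all hypothetical, and fragment-level matching is omitted.

```python
from collections import Counter


def acquire_patterns(annotated_clauses):
    """Stage 1 (assumed form): count (POS sequence, skeleton) pairs.

    Each annotated clause is a list of (word, pos, role) triples.
    The count of a pair serves as its weight.
    """
    counts = Counter()
    for clause in annotated_clauses:
        tags = tuple(pos for _, pos, _ in clause)
        skeleton = tuple(role for _, _, role in clause)
        counts[(tags, skeleton)] += 1
    return counts


def match_clause(clause, patterns):
    """Stage 2: pick the highest-weight skeleton whose POS sequence
    equals that of the clause. Returns None if no MP matches."""
    pos_seq = tuple(pos for _, pos in clause)
    candidates = [
        (weight, skeleton)
        for (tags, skeleton), weight in patterns.items()
        if tags == pos_seq
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


def parse_clause(clause, patterns):
    """Stage 3: project the matched skeleton onto the clause's words."""
    skeleton = match_clause(clause, patterns)
    if skeleton is None:
        return None
    return [(word, role) for (word, _), role in zip(clause, skeleton)]


def coverage(clauses, patterns):
    """Share of clauses for which some MP matched."""
    parsed = [parse_clause(c, patterns) for c in clauses]
    return sum(p is not None for p in parsed) / len(clauses)
```

In this sketch coverage is the share of clauses matched by any MP. Precision would additionally require gold-standard structures to compare against, which the sketch does not model.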
Keywords: Matching Pattern, Ancient Chinese Treebank, Treebank Construction, Sentence-based Grammar