Abstract
Treebanks are an important resource for natural language processing. Most existing treebanks are monolingual, yet bilingual treebanks are an important basis for syntactic models in machine translation. This paper describes the preliminary construction of a bilingual phrase structure treebank intended for machine translation. Its tagging system adopts the POS and syntactic tagsets of the Penn English Treebank and the Penn Chinese Treebank. The Chinese-English sentence pairs in the treebank, drawn from machine translation evaluation data, were pre-processed, POS-tagged, and annotated with phrase structure, and all processed data were then proofread. An analysis of the phrase structures modified during proofreading shows that the usages of Chinese function words play an important role in Chinese phrase structure grammar.
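As an illustration only, one entry of such a bilingual treebank might be represented as a pair of Penn-style bracketed trees. The sketch below is not the authors' tooling or data format: the `TreePair` class, the helper functions, and the example sentence pair are all hypothetical, and only the tag labels follow the Penn English and Penn Chinese Treebank conventions.

```python
from dataclasses import dataclass


def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples.

    Each child is either another (label, children) tuple or a token string.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1                      # consume the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def pos_tags(tree):
    """Collect (word, POS) pairs from preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(pos_tags(child))
    return pairs


@dataclass
class TreePair:
    """Hypothetical container for one aligned Chinese-English sentence pair."""
    chinese: tuple
    english: tuple


# Invented example pair, not taken from the treebank described in the paper.
pair = TreePair(
    chinese=parse_bracketed("(IP (NP (PN 我)) (VP (VV 喜欢) (NP (NN 书))))"),
    english=parse_bracketed("(S (NP (PRP I)) (VP (VBP like) (NP (NNS books))))"),
)

print(pos_tags(pair.chinese))
print(pos_tags(pair.english))
```

Keeping both sides in the same bracketed notation means one reader can serve both languages, with only the tag inventories differing.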
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, K., Zan, H., Han, Y., Mu, L. (2014). Preliminary Study on the Construction of Bilingual Phrase Structure Treebank. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_40
DOI: https://doi.org/10.1007/978-3-319-14331-6_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14330-9
Online ISBN: 978-3-319-14331-6
eBook Packages: Computer Science, Computer Science (R0)