Development of Traditional Mongolian Dependency Treebank

Su, Xiangdong; Gao, Guanglai; Yan, Xueliang

doi:10.1007/978-3-642-41491-6_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8202)

Abstract

This paper describes the development of the Traditional Mongolian Dependency Treebank (TMDT), which aims to facilitate dependency analysis of Traditional Mongolian. The annotation scheme is designed according to Traditional Mongolian grammar and with its usability for syntactic analysis in mind. The treebank is annotated with both morphological and analytical information. At the morphological level, a semi-automatic strategy is adopted: the part-of-speech (POS) tag and the stem of each word in a sentence are first produced by automatic tools (a tagger and a stem extractor, respectively) and then manually corrected. At the analytical level, the dependencies in each sentence are annotated entirely by hand, following the constituent structure and the annotation scheme. The treebank lays the foundation for dependency parsing of Traditional Mongolian and can be extended to a multi-dependency treebank.
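The two annotation layers described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `Token` fields, the placeholder words, and the tag and relation labels are hypothetical and are not taken from the TMDT annotation scheme or file format. The check shown is the usual well-formedness condition for a dependency tree (exactly one root, every word reaches the root, no cycles), which a manually annotated analytical layer would be expected to satisfy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # All field names are hypothetical; they are not TMDT's actual format.
    index: int      # 1-based position of the word in the sentence
    form: str       # surface word (placeholder strings below)
    stem: str       # morphological layer: extracted stem
    pos: str        # morphological layer: POS tag
    head: int       # analytical layer: index of the governing word, 0 = root
    relation: str   # analytical layer: dependency label


def is_valid_tree(sentence: List[Token]) -> bool:
    """Return True if the head links form a single-rooted, acyclic tree."""
    heads = {t.index: t.head for t in sentence}
    # Indices must be exactly 1..n with no gaps or duplicates.
    if sorted(heads) != list(range(1, len(sentence) + 1)):
        return False
    # Exactly one word may attach to the artificial root (head 0).
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    # Following head links from any word must reach the root without looping.
    for start in heads:
        seen = set()
        node = start
        while node != 0:
            if node in seen or node not in heads:
                return False
            seen.add(node)
            node = heads[node]
    return True


# Placeholder sentence: three words, the last one is the root.
example = [
    Token(1, "w1", "s1", "TAG_A", 2, "rel_x"),
    Token(2, "w2", "s2", "TAG_B", 3, "rel_y"),
    Token(3, "w3", "s3", "TAG_C", 0, "root"),
]

# Same words, but words 1 and 2 point at each other (a cycle).
cyclic = [
    Token(1, "w1", "s1", "TAG_A", 2, "rel_x"),
    Token(2, "w2", "s2", "TAG_B", 1, "rel_y"),
    Token(3, "w3", "s3", "TAG_C", 0, "root"),
]

print(is_valid_tree(example))
print(is_valid_tree(cyclic))
```

A check of this kind could be run after manual dependency annotation to catch structural slips before the data is used for parser training.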

Author information

Authors and Affiliations

College of Computer Science, Inner Mongolia University, Huhhot, 010021, China
Xiangdong Su, Guanglai Gao & Xueliang Yan

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Maosong Sun
Horizon Doctoral Training Centre, School of Computer Science, University of Nottingham, NG8 1BB, Nottingham, UK
Min Zhang
Google Inc., Mountain View, CA, USA
Dekang Lin
Baidu Inc., Beijing, China
Haifeng Wang

About this paper

Cite this paper

Su, X., Gao, G., Yan, X. (2013). Development of Traditional Mongolian Dependency Treebank. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2013, CCL 2013. Lecture Notes in Computer Science (LNAI), vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_23

DOI: https://doi.org/10.1007/978-3-642-41491-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41490-9
Online ISBN: 978-3-642-41491-6
eBook Packages: Computer Science, Computer Science (R0)
