Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language furthers the UD initiative and provides an important dataset for research and natural language processing in that language. In this paper, we describe how we created a UD treebank for Latvian, and how we obtained both the basic and the enhanced UD representations from the data in the Latvian Treebank, which is annotated according to a hybrid dependency-constituency grammar model. The hybrid model was inspired by Lucien Tesnière’s dependency grammar theory and its notion of a syntactic nucleus. While the basic UD representation is already a de facto standard in NLP, the enhanced UD representation is just emerging, and the treebank described here is among the first to provide both representations.
Keywords: Latvian Treebank, Universal Dependencies, Enhanced dependencies
This work has received financial support from the European Regional Development Fund under the grant agreements No. 22.214.171.124/16/A/219 and No. 126.96.36.199/VIAA/1/16/188.
We want to thank Ingus Jānis Pretkalniņš for constructive criticism of the manuscript and the anonymous reviewers for their insightful comments.