Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian

  • Željko Agić
  • Danijela Merkler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)

Abstract

This paper presents a new syntactic formalism for dependency parsing of Croatian and its implementation in the SETimes Dependency Treebank of Croatian (the Setimes.Hr Treebank). Its new syntactic tagset is designed to improve dependency parsing accuracy, with special emphasis on the main syntactic categories such as predicates, subjects and objects. The formalism is compared with two versions of the Croatian Dependency Treebank (HOBS): one with explicit encoding of subordinate syntactic conjunctions and one without. Both manual annotation quality and dependency parsing accuracy were evaluated. Inter-annotator agreement improved: Cohen's kappa coefficient for label attachment, κ(LA), peaked at 0.92, exceeding the two HOBS versions by 0.036 and 0.081 points. With a standard graph-based dependency parser, overall parsing accuracy reached a labeled attachment score (LAS) of 77.49, which is 2.99 and 5.78 points above the two HOBS versions.
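The two evaluation measures named in the abstract, the labeled attachment score and Cohen's kappa, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that each token is represented as a (head index, dependency label) pair and that kappa is computed over those pairs as categories; the function names and the toy labels (Sb, Pred, Obj, Atr) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for two aligned lists of (head, label) pairs.

    UAS counts tokens whose head matches; LAS counts tokens whose
    head and label both match.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


def cohen_kappa(items_a, items_b):
    """Cohen's kappa for two annotators over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own category distribution.
    """
    if not items_a or len(items_a) != len(items_b):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(items_a)
    observed = sum(a == b for a, b in zip(items_a, items_b)) / n
    freq_a, freq_b = Counter(items_a), Counter(items_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): four tokens as (head index, label) pairs.
gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
predicted = [(2, "Sb"), (0, "Pred"), (2, "Atr"), (2, "Atr")]

uas, las = attachment_scores(gold, predicted)
kappa = cohen_kappa(gold, predicted)
```

On the toy data, three of four heads match and two of four (head, label) pairs match, so UAS is 75.0 and LAS is 50.0. Treating the (head, label) pair as the kappa category is only one possible reading of "label attachment"; the paper itself may define κ(LA) differently.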

Keywords

dependency treebank, dependency parsing, Croatian language



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Željko Agić (1)
  • Danijela Merkler (1)
  1. Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
