Three Syntactic Formalisms for Data-Driven Dependency Parsing of Croatian

  • Željko Agić
  • Danijela Merkler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)

Abstract

A new syntactic formalism for dependency parsing of Croatian is presented, together with its implementation in the SETimes Dependency Treebank of Croatian (the Setimes.Hr Treebank). Its new syntactic tagset is targeted towards improving dependency parsing accuracy, with special emphasis on the main syntactic categories such as predicates, subjects and objects. The formalism is compared with two versions of the Croatian Dependency Treebank (HOBS): one with explicit encoding of subordinate syntactic conjunctions and one without. Both manual annotation quality and dependency parsing accuracy were inspected. Inter-annotator agreement improved: Cohen’s kappa coefficient for label attachment, κ(LA), peaked at 0.92, exceeding the two HOBS versions by 0.036 and 0.081 points. Using a standard graph-based dependency parser, overall parsing accuracy reached a labeled attachment score (LAS) of 77.49, which is 2.99 and 5.78 points above the two HOBS versions.
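
For readers unfamiliar with the two metrics quoted above, the following is a minimal Python sketch of how Cohen's kappa and attachment scores are commonly computed. It is an illustration only, not the authors' evaluation code: the function names, the toy sentence and the labels (`Sb`, `Pred`, `Obj`, `Atr`) are hypothetical, and the exact procedure behind κ(LA) in the paper may differ.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent for lists of (head, label) pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two equally long, non-empty analyses")
    n = len(gold)
    # UAS counts correct heads; LAS requires head and label to match.
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * uas / n, 100.0 * las / n


# Hypothetical four-token sentence: (head index, dependency label) per token.
gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
pred = [(2, "Sb"), (0, "Pred"), (2, "Atr"), (2, "Atr")]

print(attachment_scores(gold, pred))
print(cohen_kappa([g[1] for g in gold], [p[1] for p in pred]))
```

In this toy case three of four heads and two of four head-label pairs match, so LAS is lower than the unlabeled score; kappa additionally discounts the agreement expected by chance.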

Keywords

dependency treebank, dependency parsing, Croatian language

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Željko Agić (1)
  • Danijela Merkler (1)
  1. Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
