Article

Research on Language and Computation, Volume 6, Issue 2, pp 113-137

On Detecting Errors in Dependency Treebanks

  • Adriane Boyd (Department of Linguistics, The Ohio State University)
  • Markus Dickinson (Department of Linguistics, Indiana University)
  • W. Detmar Meurers (Seminar für Sprachwissenschaft, Universität Tübingen)

Abstract

Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on dependency treebanks. Gold standard dependency treebanks have been created for some languages, most notably Czech, and annotation efforts for other languages are under way. At the same time, general techniques for detecting errors in dependency annotation have not yet been developed. We address this gap by exploring how a technique proposed for detecting errors in constituency-based syntactic annotation can be adapted to systematically detect errors in dependency annotation. Building on an analysis of key properties and differences between constituency and dependency annotation, we discuss results for dependency treebanks for Swedish, Czech, and German. Complementing the focus on detecting errors in dependency treebanks to improve these gold standard resources, the discussion of dependency error detection for different languages and annotation schemes also raises questions of standardization for some aspects of dependency annotation, in particular regarding the locality of annotation, the assumption of a single head for each dependency relation, and phenomena such as coordination.

Keywords

  • Corpus annotation
  • Dependency grammar
  • Error detection
  • Prague Dependency Treebank
  • Talbanken Dependency Treebank
  • Tiger Dependency Bank