Language Resources and Evaluation, Volume 48, Issue 4, pp 601–637

HamleDT: Harmonized multi-language dependency treebank

  • Daniel Zeman
  • Ondřej Dušek
  • David Mareček
  • Martin Popel
  • Loganathan Ramasamy
  • Jan Štěpánek
  • Zdeněk Žabokrtský
  • Jan Hajič
Original Paper

DOI: 10.1007/s10579-014-9275-2

Cite this article as:
Zeman, D., Dušek, O., Mareček, D. et al. Lang Resources & Evaluation (2014) 48: 601. doi:10.1007/s10579-014-9275-2

Abstract

We present HamleDT—a HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. In the present article, we provide a thorough investigation and discussion of a number of phenomena that are comparable across languages, though their annotation in treebanks often differs. We claim that transformation procedures can be designed to automatically identify most such phenomena and convert them to a unified annotation style. This unification is beneficial both to comparative corpus linguistics and to machine learning of syntactic parsing.

Keywords

Dependency treebank, Annotation scheme, Harmonization

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Daniel Zeman (1)
  • Ondřej Dušek (1)
  • David Mareček (1)
  • Martin Popel (1)
  • Loganathan Ramasamy (1)
  • Jan Štěpánek (1)
  • Zdeněk Žabokrtský (1)
  • Jan Hajič (1)
  1. Faculty of Mathematics and Physics, ÚFAL, Charles University in Prague, Prague, Czech Republic
