The PROIEL treebank family: a standard for early attestations of Indo-European languages

  • Hanne Eckhoff
  • Kristin Bech
  • Gerlof Bouma
  • Kristine Eide
  • Dag Haug
  • Odd Einar Haugen
  • Marius Jøhndal
Original Paper

DOI: 10.1007/s10579-017-9388-5

Cite this article as:
Eckhoff, H., Bech, K., Bouma, G. et al. Lang Resources & Evaluation (2017). doi:10.1007/s10579-017-9388-5

Abstract

This article describes a family of dependency treebanks of early attestations of Indo-European languages, originating in the parallel treebank built by the members of the project Pragmatic Resources in Old Indo-European Languages (PROIEL). The treebanks all share a set of open-source software tools, including a web annotation interface, and a set of annotation schemes and guidelines developed especially for the project languages. The treebanks use an enriched dependency grammar scheme complemented by detailed morphological tags. Together, these have proved sufficient to give detailed descriptions of these richly inflected languages and have been easy to adapt to new languages. We describe the tools and annotation schemes and discuss some challenges posed by the various languages that have been annotated. We also discuss problems with tokenisation, sentence division and lemmatisation, which are commonly encountered in ancient and mediaeval texts, as well as challenges associated with low levels of standardisation and ongoing morphological and syntactic change.

Keywords

Treebank, Dependency grammar, Indo-European, Greek, Latin, Romance, Germanic, Slavic, Armenian

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. UiT The Arctic University of Norway, Tromsø, Norway
  2. University of Oslo, Oslo, Norway
  3. University of Gothenburg, Gothenburg, Sweden
  4. University of Bergen, Bergen, Norway
