Language Resources and Evaluation, Volume 44, Issue 4, pp 315–345

AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

Article

DOI: 10.1007/s10579-009-9108-x

Cite this article as:
Recasens, M. & Martí, M.A. Lang Resources & Evaluation (2010) 44: 315. doi:10.1007/s10579-009-9108-x

Abstract

This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400k words each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.

Keywords

Coreference; Anaphora; Corpus annotation; Annotation scheme; Reliability study

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Centre de Llenguatge i Computació (CLiC), University of Barcelona, Barcelona, Spain
