AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan


DOI: 10.1007/s10579-009-9108-x

Cite this article as:
Recasens, M. & Martí, M.A. Lang Resources & Evaluation (2010) 44: 315. doi:10.1007/s10579-009-9108-x


This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400k words each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur in real data.
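To make the phrase "above chance" concrete, the following is a minimal sketch of a chance-corrected agreement coefficient. It assumes Cohen's kappa for two annotators purely as an illustration; the abstract does not say which coefficient the article uses. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Illustrative only: the coefficient actually used in the article
    may differ.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: probability of a match if each annotator labelled
    # at random according to their own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): link types assigned by two annotators to 8 links.
annotator_a = ["identity", "identity", "predicative", "deixis",
               "identity", "predicative", "identity", "deixis"]
annotator_b = ["identity", "identity", "predicative", "deixis",
               "predicative", "predicative", "identity", "identity"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 2))  # 0.6
```

In this toy case the annotators agree on 75% of the links, but 37.5% agreement would be expected by chance, so the corrected score is 0.6. A figure such as 85–89% above chance would be read on the same kind of scale.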


Keywords: Coreference; Anaphora; Corpus annotation; Annotation scheme; Reliability study

Authors and Affiliations

  1. Centre de Llenguatge i Computació (CLiC), University of Barcelona, Barcelona, Spain

