Language Resources and Evaluation, Volume 47, Issue 3, pp 661–694

Coreference resolution: an empirical study based on SemEval-2010 shared Task 1


DOI: 10.1007/s10579-012-9194-z

Cite this article as:
Màrquez, L., Recasens, M. & Sapena, E. Lang Resources & Evaluation (2013) 47: 661. doi:10.1007/s10579-012-9194-z


This paper presents an empirical evaluation of coreference resolution that covers several interrelated dimensions. The main goal is to complete the comparative analysis from the SemEval-2010 task on Coreference Resolution in Multiple Languages. To do so, the study restricts the number of languages and systems involved, but extends and deepens the analysis of the system outputs, including a more qualitative discussion. The paper compares three automatic coreference resolution systems for three languages (English, Catalan and Spanish) in four evaluation settings, and using four evaluation measures. Given that our main goal is not to provide a comparison between resolution algorithms, these are merely used as tools to shed light on the different conditions under which coreference resolution is evaluated. Although the dimensions are strongly interdependent, making it very difficult to extract general principles, the study reveals a series of interesting issues in relation to coreference resolution: the portability of systems across languages, the influence of the type and quality of input annotations, and the behavior of the scoring measures.


Coreference resolution and evaluation; NLP system analysis; Machine learning based NLP tools; SemEval-2010 (Task 1); Discourse entities

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Departament de Llenguatges i Sistemes Informàtics, TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
  2. Departament de Lingüística, CLiC Research Center, Universitat de Barcelona, Barcelona, Spain