Journal of Psycholinguistic Research, Volume 47, Issue 2, pp 325–342

Coreferential Relations in Basque: The Annotation Process

  • Klara Ceberio
  • Itziar Aduriz
  • Arantza Díaz de Ilarraza
  • Ines Garcia-Azkoaga


In this paper we present the coreferential tagging of part of the EPEC Corpus of Basque. Although coreference is a pragmatic linguistic phenomenon that depends heavily on the situational context, it also shows language-specific patterns that vary with the features of each language. Because Basque is not an Indo-European language, its grammar differs considerably from that of the languages spoken in the surrounding areas. We explain these features and the decisions made in each case. After describing the criteria defined for coreferential tagging in Basque, we explain the annotation process. Our annotation builds on a morphologically and syntactically annotated corpus, which provides a manageable environment in which the specific structures that form part of a reference chain can be identified more easily. Part of the corpus was tagged by two annotators who marked up the same text independently, and a third annotator acted as judge, resolving cases of disagreement. This whole process has been automated as a result of previous studies carried out in this field. The automatic detection of mentions (Soraluze et al., in: Proceedings of Konvens, 2012) has provided us with a better working environment and made it possible to build a first significant corpus for the later computational treatment of automatic coreference resolution.
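The double-annotation step with a judge described above can be illustrated with a minimal sketch. It is not the authors' tooling: the data format (a mapping from mention identifier to chain identifier), the function names `find_disagreements` and `adjudicate`, and the assumption that both annotators use comparable chain identifiers are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical format: mention id -> chain id (None = not part of any chain).
# Assumes both annotators use the same chain identifiers.
Annotation = Dict[str, Optional[str]]


def find_disagreements(a: Annotation, b: Annotation) -> List[str]:
    """Return the mention ids on which the two annotators differ."""
    mentions = sorted(set(a) | set(b))
    return [m for m in mentions if a.get(m) != b.get(m)]


def adjudicate(a: Annotation, b: Annotation, judge: Annotation) -> Annotation:
    """Keep labels both annotators agree on; use the judge's label otherwise."""
    disputed = set(find_disagreements(a, b))
    merged: Annotation = {}
    for m in sorted(set(a) | set(b)):
        if m in disputed:
            if m not in judge:
                raise KeyError(f"judge has not resolved mention {m!r}")
            merged[m] = judge[m]
        else:
            merged[m] = a.get(m)
    return merged


# Toy data: the annotators disagree only on mention "m2".
annotator_1 = {"m1": "c1", "m2": "c1", "m3": None}
annotator_2 = {"m1": "c1", "m2": "c2", "m3": None}
judge = {"m2": "c1"}

print(find_disagreements(annotator_1, annotator_2))
print(adjudicate(annotator_1, annotator_2, judge))
```

In this sketch the judge is consulted only for disputed mentions, which mirrors the role described in the abstract; a real annotation project would also need to align chain identifiers between annotators before comparing them.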


Coreference · Coreferential relations · Coreferential tagging


Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.


  1. Aduriz, I., Ceberio, K., & Díaz de Ilarraza, A. (2005). Euskarazko anafora pronominala: Ikuspuntu konputazionala eta corpus baten garapena. Gogoa, 5(1), 91–116.
  2. Aduriz, I., Aranzabe, M. J., Arriola, J. M., Atutxa, A., Díaz de Ilarraza, A., Ezeiza, N., et al. (2006a). Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. In A. Wilson, P. Rayson, & D. Archer (Eds.), Corpus linguistics around the world. Book series: Language and computers (Vol. 56, pp. 1–15). Netherlands: Rodopi.
  3. Aduriz, I., Aranzabe, M. J., Arriola, J. M., & Díaz de Ilarraza, A. (2006b). Sintaxi partziala. In B. Fernández & I. Laka (Eds.), Andolin gogoan: Essays in honour of professor Eguzkitza (pp. 31–49). Bilbo: UPV/EHU Publishing Services.
  4. Alegria, I., Artola, X., Sarasola, K., & Urkia, M. (1996). Automatic morphological analysis of Basque. Literary & Linguistic Computing, 11(4), 193–203.
  5. Alegria, I., Ezeiza, N., & Fernandez, I. (2006). Named entities translation based on comparable corpora multi-word-expressions in a multilingual context. In Proceedings of workshop on EACL06 (pp. 1–8). Trento (Italy).
  6. Arriola, J. M., Aduriz, I., Aldezabal, I., Aranzabe, M. J., Ceberio, K., Estarrona, A., Iruskieta, M., Lersundi, M., Pociello, E., Uria, L., & Urizar, R. (2013). Reusing the CG-2 grammar for processing Basque complex postpositions. In A. D. Iñaki Alegria & J. Villena (Eds.). Actas del XXIX Congreso de la Sociedad Española del Procesamiento del Lenguaje Natural (SEPLN 2013) (pp. 20–27). Madrid (España).
  7. Borthen, K. (2004). Predicative NPs and the annotation of reference chains. In Proceedings of Coling2004 (pp. 1175–1178). Geneva, Switzerland.
  8. Botley, S., & McEnery, T. (Eds.). (2000). Corpus-based and computational approaches to discourse anaphora. Amsterdam: John Benjamins.
  9. Ceberio, K., Aduriz, I., Díaz de Ilarraza, A., & Garcia-Azkoaga, I. (2008). Erreferentziakidetasunaren azterketa eta anotazioa euskarazko corpus batean. In X. Artiagoitia & J. A. Lakarra (Eds.), Gramatika Jaietan. Patxi Goenagaren omenez, ASJU (Vol. 51, pp. 153–172). Bilbo: UPV/EHU & Gipuzkoako Foru Aldundia.
  10. Cornish, F. (1999). Anaphora, discourse and understanding: Evidence from English and French. Oxford: Clarendon.
  11. Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S., & Weischedel, R. (2004). The automatic content extraction (ACE) program – Tasks, data, and evaluation. In Proceedings of LREC 2004 (pp. 837–840), Lisbon.
  12. Euskaltzaindia. (1985). Euskal Gramatika: Lehen urratsak-I. Bilbo: Euskaltzaindia.
  13. Euskaltzaindia. (2002). Euskal Gramatika Laburra: Perpaus Bakuna. Bilbo: Euskaltzaindia (2nd ed.).
  14. Garcia-Azkoaga, I. M. (2003). Kohesio anaforikoa hiru testu generotan. Adinaren araberako azterketa. Bilbo: EHU-UPV Publishing Services.
  15. Hualde, J. I., & Ortiz de Urbina, J. (Eds.). (2003). A grammar of Basque. Berlin, New York: Mouton de Gruyter.
  16. Kleiber, G. (1994). Anaphores et pronoms. Louvain-la-Neuve: Duculot.
  17. Laka, I. (1996). A brief grammar of Euskara, the Basque language. EHU/UPV Publishing Services: Leioa (Spain). Retrieved December, 2016, from
  18. McCarthy, J. F., & Lehnert, W. G. (1995). Using decision trees for coreference resolution. In Proceedings of the 14th international joint conference on Artificial intelligence (Vol. 2, pp. 1050–1055). San Francisco, CA, USA.
  19. Mitkov, R. (2002). Anaphora resolution. London: Longman.
  20. Moirand, S. (1990). Une grammaire des textes et des dialogues. Paris: Hachette.
  21. Müller, C., & Strube, M. (2006). Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy. New resources, new tools, new methods (English Corpus Linguistics, Vol. 3, pp. 197–214). Frankfurt: Peter Lang.
  22. Nicolov, N., Salvetti, F., & Ivanova, S. (2008). Sentiment analysis: Does coreference matter? In AISB 2008 convention communication, interaction and social intelligence (pp. 37–40).
  23. Nilsson Björkenstam, K. (2013). SUC-CORE: A balanced corpus annotated with noun phrase coreference. Northern European Journal of Language Technology (NEJLT), 3, 19–39.
  24. Ortiz de Urbina, J. (1989). Parameters in the grammar of Basque: A GB approach to Basque syntax. Dordrecht: Foris.
  25. Peral, J., Palomar, M., & Ferrández, A. (1999). Coreference-oriented interlingual slot structure & machine translation. In Proceedings of the workshop on coreference and its applications (CorefApp 1999) (pp. 69–76). Stroudsburg, PA, USA.
  26. Poon, H., Christensen, J., Domingos, P., Etzioni, O., Hoffmann, R., Kiddon, C., Lin, T., Ling, X., Ritter, A., Schoenmackers, S., Soderland, S., Weld, D., Wu, F., & Zhang, C. (2010). Machine reading at the University of Washington. In Proceedings of the NAACL HLT 2010 first international workshop on formalisms and methodology for learning by reading (FAM-LbR 2010) (pp. 87–95). Stroudsburg, PA, USA.
  27. Pradhan, S. S., Ramshaw, L., Weischedel, R., MacBride, J., & Micciulla, L. (2007). Unrestricted coreference: Identifying entities and events in OntoNotes. In Proceedings of ICSC 2007 (pp. 446–453). Irvine, California.
  28. Recasens, M. (2010). Coreference: Theory, annotation, resolution and evaluation. Ph.D. thesis, University of Barcelona, Spain.
  29. Rodriguez, K. (2010). Resources for linguistically motivated multilingual anaphora resolution. Ph.D. thesis, University of Trento, Italy.
  30. Saeed, J. I. (2009). Semantics (3rd ed.). New York: Wiley.
  31. Stede, M. (2011). Discourse processing. San Rafael, California: Morgan & Claypool Publishers.
  32. Steinberger, J., Poesio, M., Kabadjov, M. A., & Ježek, K. (2007). Two uses of anaphora resolution in summarization. Information Processing and Management, 43(6), 1663–1680.
  33. Soraluze, A., Arregi, O., Arregi, X., Ceberio, K., & Díaz de Ilarraza, A. (2012). Mention detection: First steps in the development of a Basque coreference resolution system. Proceedings of Konvens, 2012, 128–136.
  34. Stoyanov, V., Gilbert, N., Cardie, C., & Riloff, E. (2009). Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP (pp. 656–664). Suntec, Singapore.
  35. Vicedo, J. L., & Ferrández, A. (2006). Coreference in Q&A. In Advances in open domain question answering (of text, speech and language technology) (Vol. 32, pp. 71–96). Berlin/New York: Springer.
  36. Zabala, I. (1996). Testu-lotura: Lotura tematikoa eta erreferentzia-sareak testu teknikoetan. In Igone Zabala (Ed.), Testu-loturarako baliabideak: Euskara Teknikoa (pp. 15–44). Bilbo: EHU-UPV Publishing Services.
  37. Zabala, I., & Odriozola, J. C. (2004). Los complejos posposicionales en vasco. In G. E. Perez, Zabala I. Igone, & L. Gràcia Sole (Eds.), Las Fronteras de la Composición (pp. 281–315). Donostia: University of Deusto.
  38. Zhekova, D., & Kübler, S. (2010). UBIU: A language-independent system for coreference resolution. In Proceedings of the 5th international workshop on semantic evaluation (SemEval 2010) (pp. 96–99). Stroudsburg, PA, USA.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. IXA Group, Faculty of Informatics, UPV-EHU, Donostia, Spain
  2. IXA Group, Department of Catalan Philology and General Linguistics, Universitat de Barcelona, Barcelona, Spain
  3. Department of Basque Language and Communication, UPV-EHU, Vitoria-Gasteiz, Spain
