Coreference Annotation Schema for an Inflectional Language

Ogrodniczuk, Maciej; Zawisławska, Magdalena; Głowińska, Katarzyna; Savary, Agata

doi:10.1007/978-3-642-37247-6_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Creating a coreference corpus for an inflectional, free-word-order language is challenging because of specific syntactic features largely ignored by existing annotation guidelines, such as the absence of definite/indefinite articles (making quasi-anaphoricity very common), frequent use of zero subjects, and discrepancies between syntactic and semantic heads. This paper reports on the experience gained in preparing such a resource within an ongoing project (CORE) that aims to create tools for coreference resolution.

Starting with a clarification of the relation between noun groups and mentions, continuing through the definition of the annotation scope and strategies, and ending with actual decisions for borderline cases, we present the process of building what is, to the best of our knowledge, the first corpus of general coreference for Polish.

The work reported here was carried out within the Computer-based methods for coreference resolution in Polish texts (CORE) project financed by the Polish National Science Centre (contract number 6505/B/T02/2011/40).

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, Poland
Maciej Ogrodniczuk
Institute of Polish Language, Warsaw University, Poland
Magdalena Zawisławska
Lingventa, Poland
Katarzyna Głowińska
Laboratoire d’informatique, François Rabelais University Tours, France
Agata Savary

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Ogrodniczuk, M., Zawisławska, M., Głowińska, K., Savary, A. (2013). Coreference Annotation Schema for an Inflectional Language. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_32

DOI: https://doi.org/10.1007/978-3-642-37247-6_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)