Interesting Linguistic Features in Coreference Annotation of an Inflectional Language

  • Maciej Ogrodniczuk
  • Katarzyna Głowińska
  • Mateusz Kopeć
  • Agata Savary
  • Magdalena Zawisławska
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8202)

Abstract

This paper reports on linguistic features and annotation decisions that we find vital for annotating and resolving coreference in highly inflectional languages. The results were collected during the preparation of a corpus of general direct nominal coreference in Polish. Starting from the notion of a mention, its boundaries, and potential vs. actual referentiality, we discuss the problems of complete and near-identity, zero subjects, and dominant expressions. We also present interesting linguistic cases that influence coreference resolution, such as the difference between semantic and syntactic heads and the phenomenon of coreference chains made of indefinite pronouns.

References

  1. Osenova, P., Simov, K.: BTB-TR05: BulTreeBank Stylebook. BulTreeBank Version 1.0. Technical Report BTB-TR05, Linguistic Modelling Laboratory, Bulgarian Academy of Sciences, Sofia, Bulgaria (2004)
  2. Nedoluzhko, A., Mírovský, J., Ocelák, R., Pergler, J.: Extended Coreferential Relations and Bridging Anaphora in the Prague Dependency Treebank. In: Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2009), Goa, India, pp. 1–16. AU-KBC Research Centre, Anna University, Chennai (2009)
  3. Recasens, M., Martí, M.A.: AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation 44(4), 315–345 (2010)
  4. Korzen, I., Buch-Kromann, M.: Anaphoric relations in the Copenhagen Dependency Treebanks. In: Proceedings of DGfS Workshop, Göttingen, Germany, pp. 83–98 (2011)
  5. Poesio, M., Artstein, R.: Anaphoric Annotation in the ARRAU Corpus. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, pp. 1170–1174. European Language Resources Association (2008)
  6. Recasens, M.: Coreference: Theory, Annotation, Resolution and Evaluation. PhD thesis, Department of Linguistics, University of Barcelona, Barcelona, Spain (2010)
  7. Linguistic Data Consortium: ACE (Automatic Content Extraction) Spanish Annotation Guidelines for Entities (2006), http://projects.ldc.upenn.edu/ace/docs/Spanish-Entities-Guidelines_v1.6.pdf (accessed on February 18, 2013)
  8. Hinrichs, E.W., Kübler, S., Naumann, K.: A Unified Representation for Morphological, Syntactic, Semantic, and Referential Annotations. In: Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, Ann Arbor, Michigan, USA, pp. 13–20 (2005)
  9. Magnini, B., Pianta, E., Girardi, C., Negri, M., Romano, L., Speranza, M., Lenzi, V.B., Sprugnoli, R.: I-CAB: the Italian Content Annotation Bank. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genova, Italy, pp. 963–968. European Language Resources Association (2006)
  10. Iida, R., Komachi, M., Inui, K., Matsumoto, Y.: Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations. In: Proceedings of the Linguistic Annotation Workshop (LAW 2007), pp. 132–139. Association for Computational Linguistics, Stroudsburg (2007)
  11. Pradhan, S.S., Ramshaw, L., Weischedel, R., MacBride, J., Micciulla, L.: Unrestricted Coreference: Identifying Entities and Events in OntoNotes. In: Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), pp. 446–453. IEEE Computer Society, Washington, DC (2007)
  12. Weischedel, R., Pradhan, S., Ramshaw, L., Kaufman, J., Franchini, M., El-Bachouti, M.: OntoNotes Release 4.0 (2010), http://www.bbn.com/NLP/OntoNotes (accessed on February 18, 2013)
  13. Hendrickx, I., Bouma, G., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.M., Van Der Vloet, J., Verschelde, J.L.: A Coreference Corpus and Resolution System for Dutch. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, pp. 144–149. European Language Resources Association (2008)
  14. Recasens, M., Hovy, E., Martí, M.A.: Identity, non-identity, and near-identity: Addressing the complexity of coreference. Lingua 121(6) (2011)
  15. Recasens, M., Hovy, E., Martí, M.A.: A Typology of Near-Identity Relations for Coreference (NIDENT). In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, pp. 149–156. European Language Resources Association (2010)
  16. Cohen, J.: A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20(1), 37–46 (1960)
  17. Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications through limited response questioning. Public Opinion Quarterly 18, 303–308 (1954)
  18. Krippendorff, K.H.: Content Analysis: An Introduction to Its Methodology, 2nd edn. Sage Publications, Inc. (December 2003)
  19. Passonneau, R.J.: Computing reliability for coreference annotation. In: LREC. European Language Resources Association (2004)
  20. Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Interesting Linguistic Features in Coreference Annotation of an Inflectional Language. In: Sun, M., Liu, T., Sun, L., Zhang, M., Sun, M., Lin, D., Wang, H. (eds.) CCL and NLP-NABD 2013. LNCS (LNAI), vol. 8202, pp. 97–108. Springer, Heidelberg (2013)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Maciej Ogrodniczuk (1)
  • Katarzyna Głowińska (2)
  • Mateusz Kopeć (1)
  • Agata Savary (3)
  • Magdalena Zawisławska (4)

  1. Institute of Computer Science, Polish Academy of Sciences, Poland
  2. Lingventa, Poland
  3. Laboratoire d’informatique, François Rabelais University Tours, France
  4. Institute of Polish Language, Warsaw University, Poland
