CoRTE: A Corpus of Recognizing Textual Entailment Data Annotated for Coreference and Bridging Relations

Afifah Waseem

Conference paper. Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107).


This paper presents CoRTE, an English corpus annotated with coreference and bridging relations, built from the dataset of the main recognizing textual entailment (RTE) task. Our annotation scheme extends existing schemes by introducing subcategories, and each coreference and bridging relation is assigned a category. CoRTE is a useful resource for researchers working on coreference and bridging resolution, as well as on the RTE task, which has applications in many NLP domains. CoRTE thus makes contextual information readily available to NLP systems developed for domains requiring textual inference and discourse understanding. The paper describes the annotation scheme with examples. We have annotated 340 text-hypothesis pairs, comprising 24,742 tokens and 8,072 markables.


Keywords: Coreference · Bridging relations · Annotated corpus



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

Department of Computer Science, University of Oxford, Oxford, UK
