Coreference in English OntoNotes: Properties and Genre Differences

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11697)

Abstract

The OntoNotes corpus is widely used for training and testing coreference resolution systems, but little attention has so far been paid to the differences between the genres of language that make up the corpus. We are primarily interested in the contrast between spoken and written language, and we therefore conducted in-depth analyses of various reference-related properties of the OntoNotes sub-corpora, which reveal several statistically significant differences. We compare these differences to predictions made in the linguistics literature and draw some conclusions for potential genre-specific implementations of coreference resolution.

Keywords

  • OntoNotes
  • Coreference
  • Genre
  • Spoken
  • Written

  • DOI: 10.1007/978-3-030-27947-9_15
  • Chapter length: 14 pages

Notes

  1. For the purposes of this paper, we use the term genre in a broad sense of text variety, and text in the sense of “any passage (of language), spoken or written, of whatever length, that does form a unified whole” [11].

  2. We calculated this by averaging the MUC, BCUBED (B³) and CEAF F1 scores in Table 4 of [1], as explained in http://conll.cemantix.org/2011/faq.html, and comparing the result with the CoNLL score in Table 3 of [6].

  3. The closest previous mention of the same referent.

  4. Performance scores are calculated with the CoNLL scorer, as explained in http://conll.cemantix.org/2011/faq.html.

  5. ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html.
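Notes 2 and 4 rely on the CoNLL score. As a minimal sketch, assuming the three component F1 values have already been computed (for instance by the official reference scorer), the combined score is simply their unweighted mean; in the CoNLL shared tasks the CEAF component is usually the entity-based variant (CEAFe). The function name `conll_score` is hypothetical, and the values in the usage line are made up for illustration, not taken from [1] or [6].

```python
def conll_score(muc_f1: float, b3_f1: float, ceaf_f1: float) -> float:
    """Return the CoNLL score: the unweighted mean of MUC, B-cubed and CEAF F1.

    Hypothetical helper; the three inputs are assumed to come from an existing
    scorer, all on the same scale (e.g. percentages).
    """
    for name, value in (("MUC", muc_f1), ("B-cubed", b3_f1), ("CEAF", ceaf_f1)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} F1 must be in [0, 100], got {value}")
    return (muc_f1 + b3_f1 + ceaf_f1) / 3.0


if __name__ == "__main__":
    # Illustrative values only, not results reported in any paper.
    print(conll_score(70.0, 60.0, 56.0))  # 62.0
```

Comparing two systems, as in Note 2, then amounts to computing this mean for each system from its reported component scores and taking the difference.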
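Note 3 defines the antecedent of a mention as the closest previous mention of the same referent. A minimal sketch of that definition, assuming mentions of a single document are given as hypothetical `(start_token, end_token, chain_id)` triples (a simplification, not the actual OntoNotes or CoNLL file format), scans the mentions in document order and remembers the most recent mention of each coreference chain:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mention representation: (start_token, end_token, chain_id).
# This is a simplification for illustration, not the OntoNotes/CoNLL file format.
Mention = Tuple[int, int, int]


def closest_antecedents(mentions: List[Mention]) -> List[Tuple[Mention, Optional[Mention]]]:
    """Pair every mention with the closest previous mention of the same referent.

    Mentions are processed in document order (by start token, then end token);
    the first mention of each chain is paired with None.
    """
    last_seen: Dict[int, Mention] = {}  # chain_id -> most recent mention so far
    pairs: List[Tuple[Mention, Optional[Mention]]] = []
    for mention in sorted(mentions, key=lambda m: (m[0], m[1])):
        chain_id = mention[2]
        pairs.append((mention, last_seen.get(chain_id)))
        last_seen[chain_id] = mention
    return pairs


if __name__ == "__main__":
    example = [(0, 1, 1), (5, 5, 1), (8, 9, 2), (12, 12, 1)]
    for mention, antecedent in closest_antecedents(example):
        print(mention, "->", antecedent)
```

Because only the latest mention per chain is kept in a dictionary, a single pass over the sorted mentions is enough.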

References

  1. Aktaş, B., Scheffler, T., Stede, M.: Anaphora resolution for Twitter conversations: an exploratory study. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 1–10. Association for Computational Linguistics, June 2018

  2. Amoia, M., Kunz, K., Lapshinova-Koltunski, E.: Coreference in spoken vs. written texts: a corpus-based analysis. In: Calzolari, N., et al. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey. European Language Resources Association (ELRA), May 2012

  3. BBN Technologies: Co-reference Guidelines for English OntoNotes Version 7.0 (2007)

  4. Biber, D.: Using computer-based text corpora to analyze the referential strategies of spoken and written texts. In: Svartvik, J. (ed.) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991, pp. 213–252. Mouton, Berlin (1992)

  5. Biber, D., Finegan, E., Johansson, S., Conrad, S., Leech, G.: Longman Grammar of Spoken and Written English, 1st edn. Longman, Harlow (1999)

  6. Clark, K., Manning, C.D.: Entity-centric coreference resolution with model stacking. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 1405–1415. Association for Computational Linguistics, July 2015

  7. Durrett, G., Klein, D.: Easy victories and uphill battles in coreference resolution. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1971–1982. Association for Computational Linguistics, October 2013

  8. Engell, S.: Coreference in English and German: A Theoretical Framework and Its Application in a Study of Court Decisions. Logos Verlag, Berlin (2016)

  9. Fox, B.A.: Discourse Structure and Anaphora: Written and Conversational English. Cambridge University Press, Cambridge (1987)

  10. Gardner, M., et al.: AllenNLP: a deep semantic natural language processing platform. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS) (2017)

  11. Halliday, M., Hasan, R.: Cohesion in English. Longman, London (1976)

  12. Hardmeier, C., Bevacqua, L., Loáiciga, S., Rohde, H.: Forms of anaphoric reference to organisational named entities: hoping to widen appeal, they diversified. In: Proceedings of the Seventh Named Entities Workshop, Melbourne, Australia, pp. 36–40. Association for Computational Linguistics, July 2018

  13. Kunz, K., Lapshinova-Koltunski, E.: Cross-linguistic analysis of discourse variation across registers. Cross-linguistic studies at the interface between lexis and grammar. Nord. J. Eng. Stud. 14, 258–288 (2015)

  14. Kunz, K., Lapshinova-Koltunski, E., Martínez, J.M.: Beyond identity coreference: contrasting indicators of textual coherence in English and German. In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), pp. 23–31. Association for Computational Linguistics (2016)

  15. Lapshinova-Koltunski, E.: Exploration of inter- and intralingual variation of discourse phenomena. In: Proceedings of the Second Workshop on Discourse in Machine Translation, DiscoMT@EMNLP 2015, Lisbon, Portugal, pp. 158–167, 17 September 2015

  16. Neumann, S., Fest, J.: Cohesive devices across registers and varieties: the role of medium in English. In: Schubert, C., Sanchez-Stockhammer, C. (eds.) Variational Text Linguistics: Revisiting Register in English. Topics in English Linguistics, vol. 90, pp. 195–220. De Gruyter, Berlin/Boston (2016)

  17. Pradhan, S., et al.: Towards robust linguistic analysis using OntoNotes. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp. 143–152. Association for Computational Linguistics (2013)

  18. Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In: Joint Conference on EMNLP and CoNLL - Shared Task, pp. 1–40. Association for Computational Linguistics (2012)

  19. Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Rodríguez, K.J., Poesio, M.: ARRAU: linguistically-motivated annotation of anaphoric descriptions. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, 23–28 May 2016

  20. Uryupina, O., Poesio, M.: Domain-specific vs. uniform modeling for coreference resolution. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, pp. 187–191. European Language Resources Association (ELRA), May 2012

  21. Weischedel, R., et al.: OntoNotes Release 5.0 LDC2013T19. Web Download. Linguistic Data Consortium, Philadelphia, PA (2013)

  22. Zeldes, A.: A predictive model for notional anaphora in English. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 34–43. Association for Computational Linguistics, June 2018

Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 317633480, SFB 1287, Project A03. We thank the anonymous reviewers for their comments.

Author information

Correspondence to Berfin Aktaş.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Aktaş, B., Scheffler, T., Stede, M. (2019). Coreference in English OntoNotes: Properties and Genre Differences. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_15

  • DOI: https://doi.org/10.1007/978-3-030-27947-9_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)