Coreference in English OntoNotes: Properties and Genre Differences

Aktaş, Berfin; Scheffler, Tatjana; Stede, Manfred

doi:10.1007/978-3-030-27947-9_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)

Included in the following conference series: International Conference on Text, Speech, and Dialogue

Abstract

The OntoNotes corpus is widely used for training and testing coreference resolution systems, but little attention has so far been paid to the differences between the genres of language that make up the corpus. We are primarily interested in the contrast between spoken and written language, and we therefore conducted in-depth analyses of various reference-related properties of the OntoNotes sub-corpora, which yield several statistically significant differences. We compare these findings to predictions made in the linguistics literature and draw some conclusions for potential genre-specific implementations of coreference resolution.

Notes

1. For the purposes of this paper, we use the term genre in a broad sense of text variety, and text in the sense of “any passage (of language), spoken or written, of whatever length, that does form a unified whole” [11].
2. We calculated this by taking the average of the MUC, BCUBED and CEAF F1 scores in Table 4 in [1], as explained in http://conll.cemantix.org/2011/faq.html, and comparing it with the CoNLL value in Table 3 in [6].
3. The closest previous mention of the same referent.
4. The performance rates are calculated with the CoNLL scorer, as explained in http://conll.cemantix.org/2011/faq.html.
5. ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html.
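
The averaging described in Notes 2 and 4 can be sketched as follows. This is a minimal illustration only: the function name `conll_score` and the input values are hypothetical and are not taken from the tables in [1] or [6]; the sketch assumes the three F1 scores are already available and weighted equally.

```python
from statistics import mean


def conll_score(muc_f1: float, bcubed_f1: float, ceaf_f1: float) -> float:
    """Return the unweighted mean of the MUC, BCUBED and CEAF F1 scores."""
    return mean([muc_f1, bcubed_f1, ceaf_f1])


# Hypothetical F1 values, chosen only to make the arithmetic easy to follow.
example = conll_score(60.0, 50.0, 40.0)
print(f"{example:.2f}")
```

Computing the value this way makes it possible to compare a system that reports only the three individual metrics with one that reports only the combined CoNLL value.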

References

Aktaş, B., Scheffler, T., Stede, M.: Anaphora resolution for Twitter conversations: an exploratory study. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 1–10. Association for Computational Linguistics, June 2018
Amoia, M., Kunz, K., Lapshinova-Koltunski, E.: Coreference in spoken vs. written texts: a corpus-based analysis. In: Calzolari, N., et al. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey. European Language Resources Association (ELRA), May 2012
BBN Technologies: Co-reference Guidelines for English OntoNotes Version 7.0 (2007)
Biber, D.: Using computer-based text corpora to analyze the referential strategies of spoken and written texts. In: Svartvik, J. (ed.) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991, pp. 213–252. Mouton, Berlin (1992)
Biber, D., Finegan, E., Johansson, S., Conrad, S., Leech, G.: Longman Grammar of Spoken and Written English, 1st edn. Longman, Harlow (1999)
Clark, K., Manning, C.D.: Entity-centric coreference resolution with model stacking. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 1405–1415. Association for Computational Linguistics, July 2015
Durrett, G., Klein, D.: Easy victories and uphill battles in coreference resolution. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1971–1982. Association for Computational Linguistics, October 2013
Engell, S.: Coreference in English and German: A Theoretical Framework and Its Application in a Study of Court Decisions. Logos Verlag, Berlin (2016)
Fox, B.A.: Discourse Structure and Anaphora: Written and Conversational English. Cambridge University Press, Cambridge (1987)
Gardner, M., et al.: AllenNLP: a deep semantic natural language processing platform. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS) (2017)
Halliday, M., Hasan, R.: Cohesion in English. Longman, London (1976)
Hardmeier, C., Bevacqua, L., Loáiciga, S., Rohde, H.: Forms of anaphoric reference to organisational named entities: hoping to widen appeal, they diversified. In: Proceedings of the Seventh Named Entities Workshop, Melbourne, Australia, pp. 36–40. Association for Computational Linguistics, July 2018
Kunz, K., Lapshinova-Koltunski, E.: Cross-linguistic analysis of discourse variation across registers. Cross-linguistic studies at the interface between lexis and grammar. Nord. J. Eng. Stud. 14, 258–288 (2015)
Kunz, K., Lapshinova-Koltunski, E., Martínez, J.M.: Beyond identity coreference: contrasting indicators of textual coherence in English and German. In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), pp. 23–31. Association for Computational Linguistics (2016)
Lapshinova-Koltunski, E.: Exploration of inter- and intralingual variation of discourse phenomena. In: Proceedings of the Second Workshop on Discourse in Machine Translation, DiscoMT@EMNLP 2015, Lisbon, Portugal, pp. 158–167, 17 September 2015
Neumann, S., Fest, J.: Cohesive devices across registers and varieties: the role of medium in English. In: Schubert, C., Sanchez-Stockhammer, C. (eds.) Variational Text Linguistics: Revisiting Register in English, Topics in English Linguistics, Berlin, Boston, vol. 90, pp. 195–220. De Gruyter (2016)
Pradhan, S., et al.: Towards robust linguistic analysis using OntoNotes. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp. 143–152. Association for Computational Linguistics (2013)
Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In: Joint Conference on EMNLP and CoNLL - Shared Task, pp. 1–40. Association for Computational Linguistics (2012)
Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Rodríguez, K.J., Poesio, M.: ARRAU: linguistically-motivated annotation of anaphoric descriptions. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, 23–28 May 2016
Uryupina, O., Poesio, M.: Domain-specific vs. uniform modeling for coreference resolution. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, pp. 187–191. European Language Resources Association (ELRA), May 2012
Weischedel, R., et al.: OntoNotes Release 5.0 LDC2013T19. Web Download. Linguistic Data Consortium, Philadelphia, PA (2013)
Zeldes, A.: A predictive model for notional anaphora in English. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 34–43. Association for Computational Linguistics, June 2018

Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 317633480, SFB 1287, Project A03. We thank the anonymous reviewers for their comments.

Author information

Authors and Affiliations

SFB1287, Research Focus Cognitive Sciences, University of Potsdam, Potsdam, Germany
Berfin Aktaş, Tatjana Scheffler & Manfred Stede

Corresponding author

Correspondence to Berfin Aktaş.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein

About this paper

Cite this paper

Aktaş, B., Scheffler, T., Stede, M. (2019). Coreference in English OntoNotes: Properties and Genre Differences. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_15

DOI: https://doi.org/10.1007/978-3-030-27947-9_15
Published: 06 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science, Computer Science (R0)
