Abstract
The OntoNotes corpus is widely used for training and testing coreference resolution systems, but little attention has so far been paid to the differences among the genres that make up the corpus. We are primarily interested in the contrast between spoken and written language, and we therefore conducted in-depth analyses of various reference-related properties of the OntoNotes sub-corpora, which yield several statistically significant differences. We compare these findings to predictions made in the linguistics literature and draw some conclusions for potential genre-specific implementations of coreference resolution.
Notes
- 1. For the purposes of this paper, we use the term genre in a broad sense of text variety, and text in the sense of “any passage (of language), spoken or written, of whatever length, that does form a unified whole” [11].
- 2. We calculated this by taking the average of the MUC, BCUBED and CEAF F1 scores in Table 4 in [1], as explained in http://conll.cemantix.org/2011/faq.html, and comparing it with the CoNLL value in Table 3 in [6].
- 3. The closest previous mention of the same referent.
- 4. The performance rates are calculated with the CoNLL scorer as explained in http://conll.cemantix.org/2011/faq.html.
- 5.
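The averaging described in note 2 can be sketched as follows. This is a minimal illustration, assuming the score is the unweighted mean of the three F1 values; the function name `conll_score` and the input numbers are hypothetical placeholders, not values taken from [1] or [6].

```python
from statistics import mean


def conll_score(muc_f1: float, b_cubed_f1: float, ceaf_f1: float) -> float:
    """Return the unweighted mean of the MUC, BCUBED and CEAF F1 scores.

    Assumption: all three inputs are F1 values on the same scale
    (e.g. percentages), as reported by a coreference scorer.
    """
    return mean([muc_f1, b_cubed_f1, ceaf_f1])


# Placeholder inputs for illustration only; these are not reported results.
example = conll_score(60.0, 50.0, 40.0)
print(example)
```

Averaging per-metric F1 values in this way makes a result reported as three separate scores comparable with one reported as a single combined value.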
References
Aktaş, B., Scheffler, T., Stede, M.: Anaphora resolution for Twitter conversations: an exploratory study. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 1–10. Association for Computational Linguistics, June 2018
Amoia, M., Kunz, K., Lapshinova-Koltunski, E.: Coreference in spoken vs. written texts: a corpus-based analysis. In: Calzolari, N., et al. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey. European Language Resources Association (ELRA), May 2012
BBN Technologies: Co-reference Guidelines for English OntoNotes Version 7.0 (2007)
Biber, D.: Using computer-based text corpora to analyze the referential strategies of spoken and written texts. In: Svartvik, J. (ed.) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991, pp. 213–252. Berlin, Mouton (1992)
Biber, D., Finegan, E., Johansson, S., Conrad, S., Leech, G.: Longman Grammar of Spoken and Written English, 1st edn. Longman, Harlow (1999)
Clark, K., Manning, C.D.: Entity-centric coreference resolution with model stacking. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 1405–1415. Association for Computational Linguistics, July 2015
Durrett, G., Klein, D.: Easy victories and uphill battles in coreference resolution. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1971–1982. Association for Computational Linguistics, October 2013
Engell, S.: Coreference in English and German: A Theoretical Framework and Its Application in a Study of Court Decisions. Logos Verlag, Berlin (2016)
Fox, B.A.: Discourse Structure and Anaphora: Written and Conversational English. Cambridge University Press, Cambridge (1987)
Gardner, M., et al.: AllenNLP: a deep semantic natural language processing platform. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS) (2017)
Halliday, M., Hasan, R.: Cohesion in English. Longman, London (1976)
Hardmeier, C., Bevacqua, L., Loáiciga, S., Rohde, H.: Forms of anaphoric reference to organisational named entities: hoping to widen appeal, they diversified. In: Proceedings of the Seventh Named Entities Workshop, Melbourne, Australia, pp. 36–40. Association for Computational Linguistics, July 2018
Kunz, K., Lapshinova-Koltunski, E.: Cross-linguistic analysis of discourse variation across registers. Cross-linguistic studies at the interface between lexis and grammar. Nord. J. Eng. Stud. 14, 258–288 (2015)
Kunz, K., Lapshinova-Koltunski, E., Martínez, J.M.: Beyond identity coreference: contrasting indicators of textual coherence in English and German. In: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), pp. 23–31. Association for Computational Linguistics (2016)
Lapshinova-Koltunski, E.: Exploration of inter- and intralingual variation of discourse phenomena. In: Proceedings of the Second Workshop on Discourse in Machine Translation, DiscoMT@EMNLP 2015, Lisbon, Portugal, pp. 158–167, 17 September 2015
Neumann, S., Fest, J.: Cohesive devices across registers and varieties: the role of medium in English. In: Schubert, C., Sanchez-Stockhammer, C. (eds.) Variational Text Linguistics: Revisiting Register in English, Topics in English Linguistics, Berlin, Boston, vol. 90, pp. 195–220. De Gruyter (2016)
Pradhan, S., et al.: Towards robust linguistic analysis using OntoNotes. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp. 143–152. Association for Computational Linguistics (2013)
Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., Zhang, Y.: CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In: Joint Conference on EMNLP and CoNLL - Shared Task, pp. 1–40. Association for Computational Linguistics (2012)
Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Rodríguez, K.J., Poesio, M.: ARRAU: linguistically-motivated annotation of anaphoric descriptions. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, 23–28 May 2016
Uryupina, O., Poesio, M.: Domain-specific vs. uniform modeling for coreference resolution. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, pp. 187–191. European Language Resources Association (ELRA), May 2012
Weischedel, R., et al.: OntoNotes release 5.0 LDC2013T19. Web Download. Linguistic Data Consortium, Philadelphia, PA (2013)
Zeldes, A.: A predictive model for notional anaphora in English. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, Louisiana, pp. 34–43. Association for Computational Linguistics, June 2018
Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Projektnummer 317633480, SFB 1287, Project A03. We thank the anonymous reviewers for their comments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Aktaş, B., Scheffler, T., Stede, M. (2019). Coreference in English OntoNotes: Properties and Genre Differences. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_15
DOI: https://doi.org/10.1007/978-3-030-27947-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27946-2
Online ISBN: 978-3-030-27947-9
eBook Packages: Computer Science, Computer Science (R0)