Abstract
Event coreference resolution is a task in which different text fragments that refer to the same real-world event are automatically linked together. This task can be performed not only within a single document but also across different documents and can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size to the most widely used English event coreference corpora. As data for event coreference is notoriously sparse, we took additional steps to maximize the number of coreference links in our corpus. Due to the complex nature of event coreference resolution, many algorithms consist of pipeline architectures which rely on a series of upstream tasks such as event detection, event argument identification and argument coreference. We tackle the task of event argument coreference to both illustrate the potential of our compiled corpus and to lay the groundwork for a Dutch event coreference resolution system in the future. Results show that existing NLP algorithms can be easily retrofitted to contribute to the subtasks of an event coreference resolution pipeline system.
Notes
These articles were collected as part of the NewsDNA project (https://www.ugent.be/mict/en/research/newsdna).
More details on the annotation of implicit sentiment for events can be found at https://github.com/Cyvhee/ImplicitSentimentAnnotations.
https://github.com/andreasvc/dutchcoref, v0.1, 22/03/21.
Appendix: IAA scores
See Table 10.
Cite this article
De Langhe, L., De Clercq, O. & Hoste, V. Constructing a cross-document event coreference corpus for Dutch. Lang Resources & Evaluation 57, 819–848 (2023). https://doi.org/10.1007/s10579-022-09597-1
DOI: https://doi.org/10.1007/s10579-022-09597-1