Constructing a cross-document event coreference corpus for Dutch

De Langhe, Loic; De Clercq, Orphée; Hoste, Veronique

doi:10.1007/s10579-022-09597-1

Original Paper
Published: 04 June 2022

Language Resources and Evaluation, volume 57, pages 819–848 (2023)

Abstract

Event coreference resolution is a task in which different text fragments that refer to the same real-world event are automatically linked together. This task can be performed not only within a single document but also across different documents, and it can serve as a basis for many useful Natural Language Processing applications. Resources for this type of research, however, are extremely limited. We compiled the first large-scale dataset for cross-document event coreference resolution in Dutch, comparable in size to the most widely used English event coreference corpora. As data for event coreference is notoriously sparse, we took additional steps to maximize the number of coreference links in our corpus. Due to the complex nature of event coreference resolution, many algorithms consist of pipeline architectures that rely on a series of upstream tasks such as event detection, event argument identification, and argument coreference. We tackle the task of event argument coreference both to illustrate the potential of our compiled corpus and to lay the groundwork for a future Dutch event coreference resolution system. Results show that existing NLP algorithms can be easily retrofitted to contribute to the subtasks of an event coreference resolution pipeline system.
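As an illustration of what "linking" mentions within and across documents amounts to, the following minimal sketch groups event mentions into coreference chains from a set of pairwise links. It is not the method used in the article: the mention identifiers, the `build_chains` function, and the union-find grouping are assumptions chosen only to show how pairwise links induce clusters that can span several documents.

```python
def build_chains(mentions, links):
    """Group mentions into coreference chains given pairwise links.

    mentions: list of hashable ids, here hypothetical (doc_id, mention_id) tuples.
    links: iterable of (mention, mention) pairs judged to be coreferent.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two chains

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    # Sort for a deterministic result; unlinked mentions form singleton chains.
    return sorted(sorted(chain) for chain in chains.values())


# Hypothetical mentions from three news articles.
mentions = [("doc1", "m1"), ("doc1", "m2"), ("doc2", "m1"), ("doc3", "m1")]
links = [
    (("doc1", "m1"), ("doc2", "m1")),  # cross-document link
    (("doc2", "m1"), ("doc3", "m1")),  # cross-document link
]
chains = build_chains(mentions, links)
```

Because links are transitive under this grouping, the mentions in `doc1` and `doc3` end up in the same chain even though no direct link between them was given.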

Notes

These articles were collected as part of the NewsDNA project (https://www.ugent.be/mict/en/research/newsdna).
More details on the annotation of implicit sentiment for events can be found at https://github.com/Cyvhee/ImplicitSentimentAnnotations.
https://github.com/andreasvc/dutchcoref, v0.1, 22/03/21.

Author information

Authors and Affiliations

LT3, Language and Translation Technology Team, Ghent University, Groot-Brittanniëlaan 45, 9000, Ghent, Belgium
Loic De Langhe, Orphée De Clercq & Veronique Hoste

Corresponding author

Correspondence to Loic De Langhe.

Appendix: IAA scores

See Table 10.

Table 10 Averaged Cohen’s Kappa statistic for all annotator pairings
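The statistic named in the caption can be sketched as follows: Cohen's kappa is computed for each pair of annotators and the pairwise values are then averaged. This is a minimal illustration, not the authors' evaluation script; the function names, the label set, and the toy annotations are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over all annotator pairings."""
    pairs = list(combinations(sorted(annotations), 2))
    return sum(cohen_kappa(annotations[p], annotations[q]) for p, q in pairs) / len(pairs)


# Hypothetical labels: "e" = event, "n" = no event, for four tokens.
annotations = {
    "A": ["e", "e", "n", "n"],
    "B": ["e", "n", "n", "n"],
    "C": ["e", "e", "n", "n"],
}
average_kappa = mean_pairwise_kappa(annotations)
```

In this toy example annotators A and C agree perfectly (kappa 1.0), while each disagrees with B on one token (kappa 0.5), so the averaged value is 2/3.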

About this article

Cite this article

De Langhe, L., De Clercq, O. & Hoste, V. Constructing a cross-document event coreference corpus for Dutch. Lang Resources & Evaluation 57, 819–848 (2023). https://doi.org/10.1007/s10579-022-09597-1

Accepted: 16 May 2022
Published: 04 June 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10579-022-09597-1
