Semantic and Syntactic Features for Dutch Coreference Resolution

Hendrickx, Iris; Hoste, Veronique; Daelemans, Walter

doi:10.1007/978-3-540-78135-6_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We investigate the effect of encoding additional semantic and syntactic information sources in a classification-based machine learning approach to the task of coreference resolution for Dutch. We experiment both with a memory-based learning approach and a maximum entropy modeling method.
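The classification-based setup can be illustrated with a minimal sketch: each candidate anaphor-antecedent pair becomes a feature vector, and a memory-based (k-nearest-neighbour) learner labels it by comparison with stored training pairs. The feature names, training instances, and value of k below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions at which two instances disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_classify(memory, instance, k=3):
    """Label an instance by majority vote among its k nearest stored examples."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature order:
# (same_head, gender_agree, number_agree, distance_bucket)
memory = [
    (("yes", "yes", "yes", "near"), "coref"),
    (("yes", "yes", "yes", "far"), "coref"),
    (("no", "yes", "yes", "near"), "coref"),
    (("no", "no", "yes", "near"), "not_coref"),
    (("no", "no", "no", "far"), "not_coref"),
    (("no", "yes", "no", "far"), "not_coref"),
]

pair = ("yes", "yes", "yes", "near")
print(knn_classify(memory, pair))  # prints "coref"
```

A maximum entropy model would consume the same pair-wise feature vectors; only the learner changes.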

As an alternative to using external lexical resources, such as the low-coverage Dutch EuroWordNet, we evaluate the effect of automatically generated semantic clusters as an information source. We compare these clusters, which group together semantically similar nouns, to two EuroWordNet-based semantic features that encode synonym and hypernym relations between nouns.
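A rough sketch of how the two kinds of semantic information could be turned into pair features is given below. The cluster identifiers, the synonym set, and the hypernym table are toy stand-ins invented for this example; they do not reproduce the automatically generated clusters or EuroWordNet, and the feature names are assumptions.

```python
# Toy stand-ins for the real resources (all values are hypothetical).
CLUSTERS = {"auto": 12, "wagen": 12, "fiets": 12, "minister": 40, "premier": 40}
SYNONYMS = {frozenset(("auto", "wagen"))}
HYPERNYMS = {"auto": "voertuig", "wagen": "voertuig",
             "fiets": "voertuig", "voertuig": "object"}

def same_cluster(a, b):
    """True if both nouns are known and share a cluster identifier."""
    ca, cb = CLUSTERS.get(a), CLUSTERS.get(b)
    return ca is not None and ca == cb

def is_synonym(a, b):
    """True if the pair is listed as synonymous in the lexical resource."""
    return frozenset((a, b)) in SYNONYMS

def is_hypernym(general, specific):
    """True if `general` appears on the hypernym chain above `specific`."""
    node = HYPERNYMS.get(specific)
    while node is not None:
        if node == general:
            return True
        node = HYPERNYMS.get(node)
    return False

def semantic_features(anaphor_head, antecedent_head):
    """Build the semantic part of a pair's feature vector."""
    return {
        "same_cluster": same_cluster(anaphor_head, antecedent_head),
        "synonym": is_synonym(anaphor_head, antecedent_head),
        "hypernym": is_hypernym(anaphor_head, antecedent_head)
                    or is_hypernym(antecedent_head, anaphor_head),
    }

print(semantic_features("wagen", "auto"))
print(semantic_features("voertuig", "fiets"))
```

The cluster feature still fires for nouns that a low-coverage lexical resource lacks, as long as both nouns occur in the clustered data.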

The syntactic function of the anaphor and antecedent in the sentence can be an important clue for resolving coreferential relations. As a baseline approach, we encode syntactic information as predicted by a memory-based shallow parser in a set of features. We contrast these shallow-parse-based features with features encoding richer syntactic information from a dependency parser. We show that using both the additional semantic information and the syntactic information leads to a small but significant performance improvement of our coreference resolution approach.
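As an illustration of how syntactic function might enter the feature vector, the sketch below assumes each mention already carries a grammatical role assigned by some parser. The `Mention` type, the role labels, and the feature names are hypothetical; the features actually used in the paper are not listed in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int   # index of the sentence containing the mention
    role: str       # assumed parser output, e.g. "subject", "object", "other"

def syntactic_features(anaphor, antecedent):
    """Encode the grammatical roles of a candidate pair as features."""
    return {
        "ana_role": anaphor.role,
        "ante_role": antecedent.role,
        "same_role": anaphor.role == antecedent.role,
        "both_subject": anaphor.role == "subject" and antecedent.role == "subject",
        "same_sentence": anaphor.sentence == antecedent.sentence,
        "sentence_distance": abs(anaphor.sentence - antecedent.sentence),
    }

antecedent = Mention("de minister", sentence=1, role="subject")
anaphor = Mention("hij", sentence=2, role="subject")
print(syntactic_features(anaphor, antecedent))
```

A shallow parser and a dependency parser would fill the `role` field with different label inventories and different accuracy, which is the contrast the abstract describes.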

Author information

Authors and Affiliations

CNTS - Language Technology Group, University of Antwerp, Prinsstraat 13, Antwerp, Belgium
Iris Hendrickx & Walter Daelemans
LT3 - Language and Translation Technology Team, University College Ghent, Groot-Brittanniëlaan 45, Ghent, Belgium
Veronique Hoste

Editor information

Alexander Gelbukh

Cite this paper

Hendrickx, I., Hoste, V., Daelemans, W. (2008). Semantic and Syntactic Features for Dutch Coreference Resolution. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_30

DOI: https://doi.org/10.1007/978-3-540-78135-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)