Language Resources and Evaluation, Volume 47, Issue 3, pp 661–694

Coreference resolution: an empirical study based on SemEval-2010 shared Task 1

Abstract

This paper presents an empirical evaluation of coreference resolution that covers several interrelated dimensions. Its main goal is to complete the comparative analysis from the SemEval-2010 task on Coreference Resolution in Multiple Languages. To do so, the study restricts the number of languages and systems involved, but extends and deepens the analysis of the system outputs, including a more qualitative discussion. The paper compares three automatic coreference resolution systems for three languages (English, Catalan, and Spanish) in four evaluation settings, using four evaluation measures. Since our main goal is not to compare resolution algorithms, the systems serve merely as tools to shed light on the different conditions under which coreference resolution is evaluated. Although the dimensions are strongly interdependent, making it very difficult to extract general principles, the study reveals a series of interesting issues relating to coreference resolution: the portability of systems across languages, the influence of the type and quality of input annotations, and the behavior of the scoring measures.
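To make the scoring dimension concrete, the sketch below implements one of the measures used in the task, B³ (Bagga and Baldwin 1998), which scores each mention by the overlap between the key and response clusters that contain it. This is a minimal illustration, not the task's official scorer: function and variable names are our own, and it assumes identical mention sets in key and response (the gold-mention setting).

```python
# Minimal sketch of the B-cubed (B3) coreference measure.
# Clusters are given as collections of mention ids; the key and the
# response are assumed to contain exactly the same mentions.

def b_cubed(key, response):
    """Return (precision, recall, f1) for B3 over matching mention sets."""
    # Map each mention to the set of mentions in its cluster.
    key_of = {m: frozenset(c) for c in key for m in c}
    resp_of = {m: frozenset(c) for c in response for m in c}
    mentions = list(key_of)
    # Per-mention precision: overlap of the two clusters containing the
    # mention, relative to the size of the response cluster; per-mention
    # recall is the same overlap relative to the size of the key cluster.
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m])
            for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m])
            for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Example: key partitions {a, b, c} | {d}; response {a, b} | {c, d}.
print(b_cubed([["a", "b", "c"], ["d"]], [["a", "b"], ["c", "d"]]))
```

The mention-level averaging is what distinguishes B³ from the link-based MUC score: splitting a large entity costs recall on every mention in it, so the measure is sensitive to cluster size rather than only to missing links.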

Keywords

Coreference resolution and evaluation · NLP system analysis · Machine learning based NLP tools · SemEval-2010 (Task 1) · Discourse entities


Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Departament de Llenguatges i Sistemes Informàtics, TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
  2. Departament de Lingüística, CLiC Research Center, Universitat de Barcelona, Barcelona, Spain
