Advertisement

Can Projected Chains in Parallel Corpora Help Coreference Resolution?

  • José Guilherme Camargo de Souza
  • Constantin Orăsan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7099)

Abstract

The majority of current coreference resolution systems rely on annotated corpora to train classifiers for this task. However, this is possible only for languages for which annotated corpora are available. This paper presents a system that automatically extracts coreference chains from texts in Portuguese without the need for Portuguese corpora manually annotated with coreferential information. To achieve this, an English coreference resolver is run on the English part of an English-Portuguese parallel corpus. The coreference pairs identified by the resolver are projected to the Portuguese part of the corpus using automatic word alignment. These projected pairs are then used to train the coreference resolver for Portuguese. Evaluation of the system reveals that it does not outperform a head match baseline. This is due to the fact that most of the projected pairs have the same head, which is learnt by the Portuguese classifier. This suggests that a more accurate English coreference resolver is necessary. A better projection algorithm is also likely to improve the performance of the system.

Keywords

coreference resolution parallel corpus machine learning 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aone, C., Bennett, S.W.: Evaluating automated and manual acquisition of anaphora resolution strategies. In: The 33rd Annual Meeting on Association for Computational Linguistics, pp. 122–129 (1995)Google Scholar
  2. 2.
    Bentivogli, L., Pianta, E.: Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus. Natural Language Engineering 11(03), 247 (2005), http://www.journals.cambridge.org/abstract_S1351324905003839 CrossRefGoogle Scholar
  3. 3.
    Bick, E.: The parsing system PALAVRAS: automatic grammatical analysis of Portuguese in a constraint grammar framework. Phd, Arhus (2000)Google Scholar
  4. 4.
    Caseli, H.D.M.: Alinhamento sentencial de textos paralelos português-inglês. Master thesis, USP (2002), http://www2.dc.ufscar.br/~helenacaseli/pdf/2002/QualiMestrado.pdf
  5. 5.
    Chaves, A., Rino, L.: The Mitkov Algorithm for Anaphora Resolution in Portuguese. In: The 8th International Conference on Computational Processing of the Portuguese Language, p. 60 (2008)Google Scholar
  6. 6.
    Cohen, W.: Fast effective rule induction. In: 12th International Workshop Conference on Machine Learning, pp. 115–123. Morgan Kaufmann Publishers, Inc. (1995)Google Scholar
  7. 7.
    Collovini, S., Carbonel, T.I., Fuchs, J.T., Vieira, R.: Summ-it: Um corpus anotado com informacoes discursivas visando à sumarizacao automática. In: TIL - V Workshop em Tecnologia da Informação e da Linguagem Humana, Rio de Janeiro, pp. 1605–1614 (2007)Google Scholar
  8. 8.
    Collovini, S., Vieira, R.: Learning Discourse-new References in Portuguese Texts. In: TIL 2006, pp. 267–276 (2006)Google Scholar
  9. 9.
    Cuevas, R., Paraboni, I.: A Machine Learning Approach to Portuguese Pronoun Resolution. In: The 11th Ibero-American Conference on AI: Advances in Artificial Intelligence, pp. 262–271 (2008)Google Scholar
  10. 10.
    de Souza, J., Orăsan, C.: Coreference resolution for Portuguese using parallel corpora word alignment. In: The International Conference on Knowledge Engineering, Principles and Techniques (KEPT 2011), Cluj-Napoca, Romania (July 2011)Google Scholar
  11. 11.
    Hoste, V., Pauw, G.D.: KNACK-2002: a Richly Annotated Corpus of Dutch Written Text. In: The Fifth International Conference on Language Resources and Evaluation, pp. 1432–1437. ELRA (2006)Google Scholar
  12. 12.
    Konstantinova, N., Orăsan, C.: Issues in topic tracking in wikipedia articles. In: The International Conference on Knowledge Engineering, Principles and Techniques (KEPT 2011), Cluj-Napoca, Romania, July 4-6 (2011)Google Scholar
  13. 13.
    Luo, X.: On coreference resolution performance metrics. In: The Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 25–32 (2005)Google Scholar
  14. 14.
    McCarthy, J.F., Lehnert, W.G.: Using Decision Trees for Coreference Resolution. In: International Joint Conference on Artificial Intelligence, pp. 1050–1055 (1995)Google Scholar
  15. 15.
    Mitkov, R., Barbu, C.: Using bilingual corpora to improve pronoun resolution. Languages in contrast 4(2), 201–212 (2004)CrossRefGoogle Scholar
  16. 16.
    Ng, V.: Graph-Cut-Based Anaphoricity Determination for Coreference Resolution. In: NAACL 2009: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 575–583. Association for Computational Linguistics, Boulder (2009)Google Scholar
  17. 17.
    Ng, V.: Supervised Noun Phrase Coreference Research: The First Fifteen Years. In: ACL 2010, pp. 1396–1411 (July 2010)Google Scholar
  18. 18.
    Och, F.J., Ney, H.: A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1), 19–51 (2003)CrossRefzbMATHGoogle Scholar
  19. 19.
    Padó, S., Lapata, M.: Cross-lingual annotation projection of semantic roles. J. Artificial Intelligence Research. 36, 307–340 (2009)zbMATHGoogle Scholar
  20. 20.
    Paraboni, I., Lima, V.L.S.D.: Possessive Pronominal Anaphor Resolution in Portuguese Written Texts - Project Notes. In: 17th International Conference on Computational Linguistics (COLING 1998), pp. 1010–1014. Morgan Kaufmann Publishers, Montreal (1998)Google Scholar
  21. 21.
    Postolache, O., Cristea, D., Orăsan, C.: Transferring Coreference Chains through Word Alignment. In: The 5th International Conference on Language Resources and Evaluation, Genoa, Italy (2006)Google Scholar
  22. 22.
    Recasens, M., Hovy, E.: A deeper look into features for coreference resolution. Anaphora Processing and Applications (i), 29–42 (2009)Google Scholar
  23. 23.
    Recasens, M., Martí, M.A.: AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation 44(4), 341–345 (2009)Google Scholar
  24. 24.
    Soon, W.M., Ng, H.T., Lim, D.C.Y.: A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics 27(4), 521–544 (2001)CrossRefGoogle Scholar
  25. 25.
    de Souza, J.G.C., Gonçalves, P.N., Vieira, R.: Learning Coreference Resolution for Portuguese Texts. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds.) PROPOR 2008. LNCS (LNAI), vol. 5190, pp. 153–162. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  26. 26.
    Stoyanov, V., Cardie, C., Gilbert, N., Buttler, D.: Coreference Resolution with Reconcile. In: The Joint Conference of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010). Association for Computational Linguistics (2010)Google Scholar
  27. 27.
    Vilain, M., Burger, J., Aberdeen, J., Connolly, D.: A model-theoretic coreference scoring scheme. In: The 6th Conference on Message Understanding, pp. 45–52 (1995)Google Scholar
  28. 28.
    Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann (2005)Google Scholar
  29. 29.
    Yarowsky, D., Ngai, G.: Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora. In: Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies 2001, pp. 1–8. Association for Computational Linguistics, Pittsburgh (2001)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • José Guilherme Camargo de Souza
    • 1
  • Constantin Orăsan
    • 1
  1. 1.Research Group in Computational LinguisticsUniversity of WolverhamptonWolverhamptonUK

Personalised recommendations