
Extracting Anaphoric Agreement Properties from Corpora


Abstract

Anaphora resolution algorithms have long made use of the reliable agreement between pronouns and their antecedents in properties such as gender and number. To apply constraints or preferences for anaphoric agreement, real systems need ways to automatically determine these properties for arbitrary noun phrases, in context. This chapter describes a variety of algorithms for extracting noun gender and number, ranging from simple heuristics to large-scale machine learning approaches. We describe the drawbacks and advantages of the different algorithms, focusing mostly on English anaphora resolution. We pay special attention to recent methods for extracting agreement information directly from large volumes of raw text.
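The abstract mentions simple heuristics for extracting gender and number from raw text. The sketch below is a toy illustration of that general idea. It is not any specific algorithm from the chapter. For each target noun, it counts which class of pronoun (masculine, feminine, neutral, plural) appears within a few tokens after the noun, then normalizes the counts into a distribution.

The following are illustrative assumptions, not taken from the source:

- the pronoun-to-class mapping (`PRONOUN_CLASS`);
- the window size;
- the regex tokenizer;
- the function names.

```python
import re
from collections import Counter, defaultdict

# Illustrative (assumed) mapping from English pronouns to agreement classes.
PRONOUN_CLASS = {
    "he": "masculine", "him": "masculine", "his": "masculine", "himself": "masculine",
    "she": "feminine", "her": "feminine", "hers": "feminine", "herself": "feminine",
    "it": "neutral", "its": "neutral", "itself": "neutral",
    "they": "plural", "them": "plural", "their": "plural", "themselves": "plural",
}


def gender_number_counts(sentences, nouns, window=3):
    """For each target noun (lowercase), count pronoun classes seen within `window` tokens after it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token not in nouns:
                continue
            # Hypothetical heuristic: treat nearby following pronouns as evidence for this noun.
            for following in tokens[i + 1 : i + 1 + window]:
                cls = PRONOUN_CLASS.get(following)
                if cls is not None:
                    counts[token][cls] += 1
    return counts


def to_distribution(counter):
    """Normalize raw counts into a probability distribution over agreement classes."""
    total = sum(counter.values())
    return {cls: n / total for cls, n in counter.items()} if total else {}


corpus = [
    "The actress said she would return.",
    "The actress thanked her fans.",
    "The company said it would expand.",
]
counts = gender_number_counts(corpus, {"actress", "company"})
print(to_distribution(counts["actress"]))  # {'feminine': 1.0}
print(to_distribution(counts["company"]))  # {'neutral': 1.0}
```

A fixed window is deliberately crude. Pronouns that refer to some other entity, or non-referential uses of "it", are counted as evidence anyway. This is one reason more careful methods use parsing or pronoun resolution to decide which pronoun-noun pairs to count.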


Notes

  1. See the chapter “Linguistic and Cognitive Evidence About Anaphora” in this book for further discussion of the role of agreement information in human anaphora interpretation.

  2. http://bllip.cs.brown.edu/download/emPronoun.tar.gz

  3. The complexity of their model allowed training on only a very small fraction of the articles in Wikipedia. It would be interesting to assess whether the resolution-to-article-topic heuristic could be used on its own to learn a gender/number model from all of Wikipedia.
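Note 3 mentions using the resolution-to-article-topic heuristic on its own. A minimal sketch of that idea follows. It crudely assumes that every pronoun in an encyclopedia article refers to the article's topic, counts the pronoun classes per article, and labels the title with the majority class once there is enough evidence.

The following are hypothetical choices for illustration, not the model the note refers to:

- the input format (a dict from title to text);
- the pronoun mapping;
- the `min_count` threshold;
- the function names.

```python
import re
from collections import Counter

# Illustrative (assumed) mapping from English pronouns to agreement classes.
PRONOUN_CLASS = {
    "he": "masculine", "him": "masculine", "his": "masculine", "himself": "masculine",
    "she": "feminine", "her": "feminine", "hers": "feminine", "herself": "feminine",
    "it": "neutral", "its": "neutral", "itself": "neutral",
    "they": "plural", "them": "plural", "their": "plural", "themselves": "plural",
}


def topic_pronoun_counts(articles):
    """Map each article title to counts of pronoun classes found in its text.

    Heuristic assumption: every pronoun in an article resolves to the article's topic.
    """
    return {
        title: Counter(
            PRONOUN_CLASS[t] for t in re.findall(r"[a-z']+", text.lower()) if t in PRONOUN_CLASS
        )
        for title, text in articles.items()
    }


def majority_label(counter, min_count=2):
    """Return the most frequent class, or None if there is too little evidence."""
    if sum(counter.values()) < min_count:
        return None
    return counter.most_common(1)[0][0]


articles = {
    "Marie Curie": "Marie Curie was a physicist. She won two Nobel Prizes. Her research changed physics.",
    "Glendale": "Glendale is a city. It lies in California. Its population grew.",
}
for title, counts in topic_pronoun_counts(articles).items():
    print(title, majority_label(counts))
# prints:
# Marie Curie feminine
# Glendale neutral
```

Because this heuristic needs no resolution model, it is cheap enough to run over a whole corpus. The price is noise: pronouns that refer to other entities mentioned in the article are miscounted as evidence about the topic.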

Author information

Corresponding author

Correspondence to Shane Bergsma.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bergsma, S. (2016). Extracting Anaphoric Agreement Properties from Corpora. In: Poesio, M., Stuckardt, R., Versley, Y. (eds) Anaphora Resolution. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47909-4_12

  • DOI: https://doi.org/10.1007/978-3-662-47909-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-47908-7

  • Online ISBN: 978-3-662-47909-4

  • eBook Packages: Computer Science, Computer Science (R0)
