Extracting Anaphoric Agreement Properties from Corpora

Bergsma, Shane

doi:10.1007/978-3-662-47909-4_12

Chapter
First Online: 05 August 2016

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

Anaphora resolution algorithms have long made use of the reliable agreement between pronouns and their antecedents in properties such as gender and number. To apply constraints or preferences for anaphoric agreement, real systems need ways to automatically determine these properties for arbitrary noun phrases, in context. This chapter describes a variety of algorithms for extracting noun gender and number, ranging from simple heuristics to large-scale machine learning approaches. We describe the drawbacks and advantages of the different algorithms, focusing mostly on English anaphora resolution. We pay special attention to recent methods for extracting agreement information directly from large volumes of raw text.
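To illustrate how corpus-derived agreement information can act as a constraint on antecedent candidates, here is a minimal Python sketch. It assumes that (noun, pronoun) pairs have already been pulled from raw text by some upstream pattern-matching step, which is not shown and is hypothetical here. Several choices are illustrative assumptions rather than the chapter's own method: the four agreement classes, add-one smoothing, the 0.2 threshold, and the helper names `count_pairs`, `distribution` and `compatible_antecedents`.

```python
from collections import Counter, defaultdict

# English third-person pronouns mapped to an agreement class (illustrative subset).
PRONOUN_CLASS = {
    "he": "masculine", "him": "masculine", "his": "masculine",
    "she": "feminine", "her": "feminine", "hers": "feminine",
    "it": "neutral", "its": "neutral",
    "they": "plural", "them": "plural", "their": "plural",
}
CLASSES = ("masculine", "feminine", "neutral", "plural")


def count_pairs(pairs):
    """Tally pronoun classes per noun from (noun, pronoun) pairs.

    The pairs are assumed to come from a hypothetical upstream extractor;
    pronouns not listed in PRONOUN_CLASS are ignored.
    """
    counts = defaultdict(Counter)
    for noun, pronoun in pairs:
        cls = PRONOUN_CLASS.get(pronoun.lower())
        if cls is not None:
            counts[noun.lower()][cls] += 1
    return counts


def distribution(noun_counts, alpha=1.0):
    """Add-alpha smoothed probability of each agreement class for one noun."""
    total = sum(noun_counts[c] for c in CLASSES) + alpha * len(CLASSES)
    return {c: (noun_counts[c] + alpha) / total for c in CLASSES}


def compatible_antecedents(pronoun, candidates, counts, threshold=0.2):
    """Keep candidates whose estimated probability for the pronoun's class
    meets the threshold. An unknown pronoun leaves the list unfiltered."""
    cls = PRONOUN_CLASS.get(pronoun.lower())
    if cls is None:
        return list(candidates)
    kept = []
    for noun in candidates:
        # .get avoids inserting unseen nouns into the defaultdict.
        probs = distribution(counts.get(noun.lower(), Counter()))
        if probs[cls] >= threshold:
            kept.append(noun)
    return kept


if __name__ == "__main__":
    # Made-up toy pairs for demonstration only, not corpus statistics.
    pairs = ([("Glenda", "she")] * 8 + [("Glenda", "he")]
             + [("Glendale", "it")] * 6 + [("senators", "they")] * 5)
    counts = count_pairs(pairs)
    candidates = ["Glenda", "Glendale", "senators"]
    print(compatible_antecedents("she", candidates, counts))   # ['Glenda']
    print(compatible_antecedents("they", candidates, counts))  # ['senators']
```

Nouns never seen in the pairs get a uniform distribution (0.25 per class), so with a 0.2 threshold they pass every filter. This corresponds to a hard constraint that only fires when the evidence is clear. A preference-based system could instead use the probabilities to rank candidates rather than discard them.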

Notes

1.
See chapter “Linguistic and Cognitive Evidence About Anaphora” of this book for further discussion on the role of agreement information in human anaphora interpretation.
2.
http://bllip.cs.brown.edu/download/emPronoun.tar.gz
3.
Their model’s complexity only allowed training on a very small fraction of the total number of articles in Wikipedia. It would be interesting to assess the feasibility of using the resolution-to-article-topic heuristic on its own to learn a gender/number model from all of Wikipedia.

Author information

Authors and Affiliations

Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada, S7N 5C9; JHU Center of Excellence, Baltimore
Shane Bergsma

Corresponding author

Correspondence to Shane Bergsma.

Editor information

Editors and Affiliations

Trento, Italy
Massimo Poesio
Frankfurt am Main, Germany
Roland Stuckardt
Heidelberg, Germany
Yannick Versley

About this chapter

Cite this chapter

Bergsma, S. (2016). Extracting Anaphoric Agreement Properties from Corpora. In: Poesio, M., Stuckardt, R., Versley, Y. (eds) Anaphora Resolution. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47909-4_12

DOI: https://doi.org/10.1007/978-3-662-47909-4_12
Published: 05 August 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47908-7
Online ISBN: 978-3-662-47909-4
eBook Packages: Computer Science, Computer Science (R0)