NADA: A Robust System for Non-referential Pronoun Detection

Bergsma, Shane; Yarowsky, David

doi:10.1007/978-3-642-25917-3_2


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7099)

Abstract

We present NADA: the Non-Anaphoric Detection Algorithm. NADA is a novel, publicly available program that accurately distinguishes between referential and non-referential uses of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, NADA uses very large-scale web N-gram features, but NADA makes these features practical by compressing the N-gram counts so that they fit into computer memory. NADA therefore operates as a fast, stand-alone system. NADA also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. NADA substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
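
The abstract does not say how the N-gram counts are compressed, so the following is only a generic sketch of the idea, not the authors' method. It assumes a lossy table that stores one log-scale bucket per hashed N-gram, and it gathers the N-grams that span a given "it" token. All names here (`context_ngrams`, `QuantizedCounts`, `num_slots`, `base`) are hypothetical and chosen for illustration.

```python
import math
import zlib
from array import array


def context_ngrams(tokens, index, max_n=5):
    """Return every n-gram (2 <= n <= max_n) that spans tokens[index]."""
    grams = []
    for n in range(2, max_n + 1):
        for start in range(index - n + 1, index + 1):
            # Keep only windows that lie fully inside the sentence.
            if start >= 0 and start + n <= len(tokens):
                grams.append(tuple(tokens[start:start + n]))
    return grams


class QuantizedCounts:
    """Lossy in-memory count table: one byte per slot, log-scale buckets.

    Assumption: hash collisions are tolerated and never detected, so a
    lookup may return the count of a different n-gram.
    """

    def __init__(self, num_slots=1 << 16, base=1.5):
        self.slots = array("B", [0]) * num_slots
        self.base = base

    def _slot(self, gram):
        key = " ".join(gram).encode("utf-8")
        return zlib.crc32(key) % len(self.slots)

    def add(self, gram, count):
        # Store a log-scale bucket (capped at 255) instead of the raw count.
        if count <= 0:
            return
        bucket = min(255, 1 + int(math.log(count, self.base)))
        i = self._slot(gram)
        self.slots[i] = max(self.slots[i], bucket)

    def approx_count(self, gram):
        # Decode the bucket back to the lower edge of its count range.
        bucket = self.slots[self._slot(gram)]
        return 0.0 if bucket == 0 else self.base ** (bucket - 1)


tokens = "it is raining".split()
table = QuantizedCounts()
table.add(("it", "is"), 1000)
features = [table.approx_count(g) for g in context_ngrams(tokens, 0)]
```

Each N-gram costs one byte regardless of its length, at the price of approximate counts. A smaller `base` gives finer buckets but a lower maximum representable count.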



Author information

Authors and Affiliations

Dept. of Computer Science and Human Language Technology Center of Excellence, Johns Hopkins University, US
Shane Bergsma & David Yarowsky


Editor information

Editors and Affiliations

Centro de Linguística da Universidade de Lisboa, Complexo Interdisciplinar da Universidade de Lisboa, Av. Prof. Gama Pinto, 2, 1649-003, Lisboa, Portugal
Iris Hendrickx
K. B. Chandrasekhar Research Centre, Anna University, MIT Campus of Anna University, Chromepet, 600044, Chennai, India
Sobha Lalitha Devi
Faculdade de Ciências, Departamento de Informática, Cidade Universitária, Universidade de Lisboa, 1749-016, Lisboa, Portugal
António Branco
School of Humanities, Languages and Social Studies, Research Group in Computational Linguistics, University of Wolverhampton, WV1 1SB, Wolverhampton, UK
Ruslan Mitkov

Cite this paper

Bergsma, S., Yarowsky, D. (2011). NADA: A Robust System for Non-referential Pronoun Detection. In: Hendrickx, I., Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2011. Lecture Notes in Computer Science, vol 7099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25917-3_2


DOI: https://doi.org/10.1007/978-3-642-25917-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25916-6
Online ISBN: 978-3-642-25917-3
eBook Packages: Computer Science, Computer Science (R0)
