Abstract
We present \(\textsc{Nada}\), the Non-Anaphoric Detection Algorithm. \(\textsc{Nada}\) is a novel, publicly available program that accurately distinguishes between referential and non-referential instances of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, \(\textsc{Nada}\) uses very large-scale web \(\mbox{N-gram}\) features, but \(\textsc{Nada}\) makes these features practical by compressing the \(\mbox{N-gram}\) counts so that they fit into computer memory. \(\textsc{Nada}\) therefore operates as a fast, stand-alone system. \(\textsc{Nada}\) also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. \(\textsc{Nada}\) substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
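To make the idea of compressing N-gram counts concrete, the following is a minimal sketch of one generic approach: log-scale quantization of each count into a single byte, stored in a hashed table. This is an illustrative assumption, not the compression scheme used by \(\textsc{Nada}\); the names `quantize`, `dequantize`, and `CompactCounts` are hypothetical.

```python
import math
import zlib


def quantize(count: int) -> int:
    """Map a raw count to a one-byte log-scale bucket (0 means unseen)."""
    if count <= 0:
        return 0
    return min(255, 1 + int(4 * math.log2(count)))


def dequantize(bucket: int) -> float:
    """Recover the lower bound of the count range for a bucket."""
    if bucket == 0:
        return 0.0
    return 2 ** ((bucket - 1) / 4)


class CompactCounts:
    """Hypothetical lossy count store: one byte per slot, no keys kept.

    Assumption for illustration only: hash collisions are tolerated and
    resolved by keeping the larger bucket, so a lookup may overestimate.
    """

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        self.slots = bytearray(num_slots)

    def _slot(self, ngram: str) -> int:
        return zlib.crc32(ngram.encode("utf-8")) % self.num_slots

    def add(self, ngram: str, count: int) -> None:
        i = self._slot(ngram)
        self.slots[i] = max(self.slots[i], quantize(count))

    def lookup(self, ngram: str) -> float:
        return dequantize(self.slots[self._slot(ngram)])


table = CompactCounts(num_slots=1 << 16)
table.add("it is raining", 1024)
approx = table.lookup("it is raining")
```

In this sketch each stored count is accurate only to within a factor of \(2^{1/4}\), which trades precision for a fixed one-byte footprint per slot. Whether such a trade-off suits a classifier that uses log-counts as features is a design question this sketch does not settle.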
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bergsma, S., Yarowsky, D. (2011). NADA: A Robust System for Non-referential Pronoun Detection. In: Hendrickx, I., Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2011. Lecture Notes in Computer Science, vol 7099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25917-3_2
DOI: https://doi.org/10.1007/978-3-642-25917-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25916-6
Online ISBN: 978-3-642-25917-3
eBook Packages: Computer Science, Computer Science (R0)