
NADA: A Robust System for Non-referential Pronoun Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7099)

Abstract

We present NADA: the Non-Anaphoric Detection Algorithm. NADA is a novel, publicly available program that accurately distinguishes between referential and non-referential instances of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, NADA uses very large-scale web N-gram features, but it makes these features practical by compressing the N-gram counts so that they fit into computer memory. NADA therefore operates as a fast, stand-alone system. NADA also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. NADA substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
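The abstract names two ingredients: web N-gram count features squeezed into memory, and long-distance lexical features drawn from the whole sentence. It does not say how either is implemented, so the sketch below is only a hypothetical illustration of the general idea, not NADA's actual code or API. It assumes a generic lossy scheme (counts log-binned into one byte, N-gram strings replaced by 32-bit CRC32 fingerprints). The names `quantize`, `CompressedCounts`, `extract_features`, the base of 1.5, and `max_n=5` are all illustrative choices.

```python
import math
import zlib
from typing import Dict, List

# Illustrative assumption: with log base 1.5, codes 0..255 fit in one byte and the
# true count lies within a factor of 1.5 of the decoded value.
LOG_BASE = 1.5


def quantize(count: int, base: float = LOG_BASE) -> int:
    """Map a raw N-gram count to a one-byte log-scale bin (0 means unseen)."""
    if count <= 0:
        return 0
    return min(255, 1 + int(math.log(count, base)))


def dequantize(code: int, base: float = LOG_BASE) -> float:
    """Recover an approximate count: the lower edge of the code's bin."""
    if code <= 0:
        return 0.0
    return base ** (code - 1)


class CompressedCounts:
    """Hypothetical lossy store: 32-bit N-gram fingerprints -> one-byte counts.

    The N-gram strings are discarded; N-grams whose CRC32 fingerprints collide
    share one entry, trading rare errors for memory.
    """

    def __init__(self, raw_counts: Dict[str, int]) -> None:
        self._table: Dict[int, int] = {
            self._key(ngram): quantize(count) for ngram, count in raw_counts.items()
        }

    @staticmethod
    def _key(ngram: str) -> int:
        return zlib.crc32(ngram.encode("utf-8"))

    def code(self, ngram: str) -> int:
        return self._table.get(self._key(ngram), 0)

    def approx_count(self, ngram: str) -> float:
        return dequantize(self.code(ngram))


def extract_features(tokens: List[str], i: int, store: CompressedCounts,
                     max_n: int = 5) -> Dict[str, float]:
    """Features for the pronoun at tokens[i] (hypothetical feature names)."""
    feats: Dict[str, float] = {}
    # Local web-count features: every N-gram (n = 2..max_n) that covers position i.
    # The offset of the pronoun inside the window separates e.g. "it is" from "is it".
    for n in range(2, max_n + 1):
        for start in range(i - n + 1, i + 1):
            end = start + n
            if start < 0 or end > len(tokens):
                continue
            ngram = " ".join(tokens[start:end])
            feats[f"count_n{n}_pos{i - start}"] = float(store.code(ngram))
    # Long-distance lexical features: every other word anywhere in the sentence.
    for j, word in enumerate(tokens):
        if j != i:
            feats[f"word={word.lower()}"] = 1.0
    return feats


if __name__ == "__main__":
    store = CompressedCounts({"it is": 5000, "it is raining": 1000})
    sentence = "it is raining today".split()
    for name, value in extract_features(sentence, 0, store).items():
        print(name, value)
```

In this sketch the memory saving comes from never storing the N-gram text and keeping only one byte per count; a real system would pack the fingerprints and codes into flat arrays rather than a Python dict. The resulting feature dictionary could then be fed to any linear classifier.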

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bergsma, S., Yarowsky, D. (2011). NADA: A Robust System for Non-referential Pronoun Detection. In: Hendrickx, I., Lalitha Devi, S., Branco, A., Mitkov, R. (eds) Anaphora Processing and Applications. DAARC 2011. Lecture Notes in Computer Science, vol 7099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25917-3_2

  • DOI: https://doi.org/10.1007/978-3-642-25917-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25916-6

  • Online ISBN: 978-3-642-25917-3

  • eBook Packages: Computer Science, Computer Science (R0)
