Distributional Thesaurus Versus WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment

  • Hiram Calvo
  • Alexander Gelbukh
  • Adam Kilgarriff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3406)

Abstract

Prepositional phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very large corpora, and several techniques have been used to deal with this sparsity. We evaluate two backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work on Spanish. The thesaurus is created from the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus for both methods is an encyclopaedia. The method based on a distributional thesaurus has higher coverage but lower precision than the WordNet method.
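
The general idea of backing off from an unseen triple to triples built from similar words can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so raw counts are compared directly, and the triple counts, the thesaurus entries, and the function names `backoff_count` and `attach` are all hypothetical.

```python
from collections import Counter

# Toy counts of unambiguous (head, preposition, noun) triples.
# These numbers are invented for illustration only.
TRIPLE_COUNTS = Counter({
    ("comer", "con", "tenedor"): 5,
    ("pasta", "con", "salsa"): 4,
})

# Toy thesaurus mapping a word to similar words (hypothetical entries).
# A WordNet-based or a distributional thesaurus could both fill this role.
SIMILAR = {
    "cuchara": ["tenedor"],
    "queso": ["salsa"],
}


def backoff_count(head, prep, noun, counts, similar):
    """Count of the exact triple; if unseen, summed counts of the triples
    obtained by replacing the noun with each of its similar words."""
    exact = counts[(head, prep, noun)]
    if exact > 0:
        return exact
    return sum(counts[(head, prep, s)] for s in similar.get(noun, []))


def attach(verb, obj_noun, prep, pp_noun, counts, similar):
    """Return 'verb' or 'noun' for the preferred attachment site,
    or None when the evidence does not favour either one."""
    verb_score = backoff_count(verb, prep, pp_noun, counts, similar)
    noun_score = backoff_count(obj_noun, prep, pp_noun, counts, similar)
    if verb_score == noun_score:
        return None  # undecided cases reduce coverage
    return "verb" if verb_score > noun_score else "noun"


if __name__ == "__main__":
    # "comer pasta con cuchara": the exact triples are unseen,
    # so the decision relies on the similar word "tenedor".
    print(attach("comer", "pasta", "con", "cuchara", TRIPLE_COUNTS, SIMILAR))
```

In this sketch a richer thesaurus leaves fewer undecided cases (higher coverage), while each substitution may bring in less reliable evidence (lower precision), which is the trade-off the abstract reports.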

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Hiram Calvo (1)
  • Alexander Gelbukh (1)
  • Adam Kilgarriff (2)
  1. Center for Computing Research, National Polytechnic Institute, Mexico
  2. Lexical Computing Ltd., United Kingdom
