Language Resources and Evaluation, Volume 41, Issue 1, pp. 61–89

Automatically learning semantic knowledge about multiword predicates


Abstract

Highly frequent and highly polysemous verbs, such as give, take, and make, pose a challenge to automatic lexical acquisition methods. These verbs widely participate in multiword predicates (such as light verb constructions, or LVCs), in which they contribute a broad range of figurative meanings that must be recognized. Here we focus on two properties that are key to the computational treatment of LVCs. First, we consider the degree of figurativeness of the semantic contribution of such a verb to the various LVCs it participates in. Second, we explore the patterns of acceptability of LVCs, and their productivity over semantically related combinations. To assess these properties, we develop statistical measures of figurativeness and acceptability that draw on linguistic properties of LVCs. We demonstrate that these corpus-based measures correlate well with human judgments of the relevant property. We also use the acceptability measure to estimate the degree to which a semantic class of nouns can productively form LVCs with a given verb. The linguistically motivated measures outperform a standard measure for capturing the strength of collocation of these multiword expressions.
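The abstract evaluates corpus-based measures by how well they correlate with human judgments, and compares them against a standard collocation-strength measure. The sketch below illustrates only that evaluation setup, not the authors' figurativeness or acceptability measures. It assumes pointwise mutual information (PMI) as a representative standard collocation measure (the abstract does not name one) and Spearman's rank correlation as the agreement statistic. The verb–noun counts, the human ratings, and the helper names `pmi`, `average_ranks`, and `spearman` are all hypothetical.

```python
import math
from collections import Counter


def pmi(pair_count, verb_count, noun_count, total_pairs):
    """Pointwise mutual information log2(P(v,n) / (P(v) P(n))), with all
    probabilities estimated by relative frequency over total_pairs."""
    p_vn = pair_count / total_pairs
    p_v = verb_count / total_pairs
    p_n = noun_count / total_pairs
    return math.log2(p_vn / (p_v * p_n))


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical verb-noun co-occurrence counts (illustrative only, not from the paper).
pair_counts = Counter({
    ("take", "walk"): 40, ("take", "look"): 60, ("take", "table"): 10,
    ("give", "groan"): 5, ("give", "book"): 30, ("give", "look"): 15,
})

# Marginal counts derived from the same toy table of pairs.
verb_counts = Counter()
noun_counts = Counter()
for (verb, noun), c in pair_counts.items():
    verb_counts[verb] += c
    noun_counts[noun] += c
total = sum(pair_counts.values())

pairs = list(pair_counts)
scores = [pmi(pair_counts[p], verb_counts[p[0]], noun_counts[p[1]], total)
          for p in pairs]

# Hypothetical human acceptability ratings on a 1-5 scale, one per pair.
human = [5, 5, 2, 4, 3, 5]
print(f"Spearman rho = {spearman(scores, human):.3f}")
```

Rank correlation is used here because human ratings are ordinal; a measure that orders the pairs the way judges do scores well even if its numeric scale differs from theirs.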

Keywords

Lexical acquisition · Corpus-based statistical measures · Verb semantics · Multiword predicates · Light verb constructions

Notes

Acknowledgements

We thank Anne-Marie Brousseau for the enlightening discussions regarding the human judgments of figurativeness; Eric Joanis for providing us with NP-head extraction software; and our judges, who made the evaluation of our ideas possible. We are also grateful to the Natural Sciences and Engineering Research Council of Canada (NSERC), the Ontario Graduate Scholarship program (OGS), and the University of Toronto for financial support.


Copyright information

© Springer Science+Business Media 2007

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Toronto, Canada
