Human Associations and the Choice of Features for Semantic Verb Classification

Schulte im Walde, Sabine

doi:10.1007/s11168-008-9044-8

Published: 07 February 2008

Volume 6, pages 79–111, (2008)

Research on Language and Computation

Sabine Schulte im Walde

Abstract

This article investigates whether human associations to verbs, as collected in a web experiment, can help us to identify salient features for semantic verb classes. Starting from the assumption that the associations, i.e., the words that are called to mind by the stimulus verbs, reflect highly salient linguistic and conceptual features of the verbs, we apply a cluster analysis to the verbs, based on the associations, and validate the resulting verb classes against standard approaches to semantic verb classes. Then, we perform various clusterings on the same verbs using standard corpus-based feature types, and evaluate them against the association-based clustering as well as GermaNet and FrameNet classes. Comparing the cluster analyses provides insight into the usefulness of standard feature types in verb clustering, and assesses shallow vs. deep syntactic features and the role of corpus frequency. We show that (a) there is no significant preference for using a specific syntactic relationship (such as direct objects) as nominal features in clustering; (b) simple window co-occurrence features are not significantly worse (and in some cases even better) than selected grammar-based functions; and (c) a restricted feature choice disregarding high- and low-frequency features is sufficient. Finally, by applying the feature choices to GermaNet and FrameNet verbs and classes, we address the question of whether the same types of features are salient for different types of semantic verb classes. Varying the gold standard classification shows that the clustering results differ significantly, even when relying on the same features.
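To make the setup in the abstract concrete, the following is a minimal sketch of clustering verbs on association-based feature vectors, with an optional filter that disregards high- and low-frequency features. Everything specific here is an assumption for illustration: the abstract does not name the clustering algorithm or the similarity measure, so average-link agglomerative clustering with cosine similarity is swapped in, and the association counts, the thresholds, and the helper names `cosine`, `filter_by_frequency`, and `cluster` are invented rather than taken from the article.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy, invented association counts (verb -> associated word -> frequency).
# These are NOT the article's data; they only illustrate the data shape.
ASSOCIATIONS = {
    "essen":   Counter({"Hunger": 5, "Brot": 4, "Tisch": 2}),
    "trinken": Counter({"Durst": 5, "Wasser": 4, "Tisch": 1, "Hunger": 1}),
    "laufen":  Counter({"Schuhe": 4, "schnell": 3, "Sport": 3}),
    "rennen":  Counter({"schnell": 5, "Sport": 2, "Schuhe": 1}),
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_by_frequency(vectors, min_total, max_total):
    """Keep only features whose total count over all verbs lies in [min_total, max_total]."""
    totals = Counter()
    for vec in vectors.values():
        totals.update(vec)
    keep = {f for f, n in totals.items() if min_total <= n <= max_total}
    return {
        verb: Counter({f: n for f, n in vec.items() if f in keep})
        for verb, vec in vectors.items()
    }


def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters (assumed method)."""
    clusters = [frozenset([verb]) for verb in sorted(vectors)]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


all_features = cluster(ASSOCIATIONS, 2)
mid_frequency = cluster(filter_by_frequency(ASSOCIATIONS, 4, 7), 2)
```

On this toy input, both the full feature set and the mid-frequency subset group `essen` with `trinken` and `laufen` with `rennen`. That only illustrates the kind of comparison the abstract describes; it is not evidence for the article's finding.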

Author information

Authors and Affiliations

Institute for Natural Language Processing, University of Stuttgart, Azenbergstr. 12, Office 2/20, 70174, Stuttgart, Germany
Sabine Schulte im Walde

Corresponding author

Correspondence to Sabine Schulte im Walde.

About this article

Cite this article

Schulte im Walde, S. Human Associations and the Choice of Features for Semantic Verb Classification. Res on Lang and Comput 6, 79–111 (2008). https://doi.org/10.1007/s11168-008-9044-8

Received: 11 January 2007
Accepted: 11 January 2008
Published: 07 February 2008
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11168-008-9044-8
