Association norms for German noun compounds and their constituents

Schulte im Walde, Sabine; Borgwaldt, Susanne R.

doi:10.3758/s13428-014-0539-y

Published: 16 January 2015

Volume 47, pages 1199–1221 (2015)

Behavior Research Methods

Abstract

We present a collection of association norms for 246 German depictable compound nouns and their constituents, comprising 58,652 association tokens distributed over 26,004 stimulus–associate pair types. Analyses of the data revealed that participants mainly provided noun associates, followed by adjective and verb associates. In corpus analyses, co-occurrence values for compounds and their associates were below those for nouns in general and their associates. The semantic relations between compound stimuli and their associates were more often co-hyponymy and hypernymy and less often hyponymy than for associations to nouns in general. Finally, we found a moderate correlation between the overlap of the associations to compounds and their constituents and the degree of semantic transparency. The norms constitute a valuable resource concerning the lexical semantic properties of the compound stimuli and the semantic relations between the stimuli and their associates. More specifically, they can be used for stimulus selection, hypothesis testing, and further research on morphologically complex words. The norms are available in text format (UTF-8 encoding) as supplemental materials.

In this article, we introduce a new data collection of association norms for German noun compounds and their constituents. Association norms have a long tradition in psycholinguistic research. They have been used for more than 30 years to investigate semantic memory, making use of the implicit notion that associates reflect meaning components of words. In experimental psychology, association norms have, for example, been used extensively in studies of semantic priming that investigate (among other things) word recognition, knowledge representation, and semantic processes (see McNamara, 2005, for a review of methods, issues, and findings).

We collected associates to German noun compounds because we believe that such associations are a valuable resource for research in cognitive and computational linguistics. On the basis of an existing collection of German noun compounds (von der Heide & Borgwaldt, 2009), we gathered associates for the compounds and also for their constituents (e.g., Ahornblatt/Ahorn/Blatt “maple leaf/maple/leaf”). The data were collected via the crowdsourcing platform Amazon Mechanical Turk (AMT). We performed detailed analyses of the collection regarding the parts of speech of the associate responses, the co-occurrence and syntactic patterns, and the semantic relations between the stimulus–associate pairs. We also predicted the degree of semantic transparency of the compounds on the basis of a simple association overlap. The analyses are compared to those of an earlier collection (Schulte im Walde, Borgwaldt, & Jauch, 2012), in which associates to a superset of our compound and constituent stimuli were gathered in a more controlled Web experiment.

The association norms can be used as a lexical semantic resource for the target stimuli, that is, the compound nouns and their constituents. The data should be relevant for research on the lexical semantic properties of the compound stimuli (for example, the semantic relations between the stimuli and their associates) and on the degree of semantic relatedness between the compound stimuli and their constituents, that is, the degree of semantic transparency (alternatively, compositionality).

In this article, we first provide an overview of association norms in general terms (Previous work on association norms) and introduce the German compound and constituent targets the norms rely on (Noun compounds), before we describe the collection and analyses of the noun compound association norms. The final part of the article summarizes and discusses the results.

Previous work on association norms

Collections of association norms

One of the first collections of word association norms was compiled by Palermo and Jenkins (1964), comprising associations for 200 words. The Edinburgh Association Thesaurus (Kiss, Armstrong, Milroy, & Piper, 1973) was a first attempt to collect association norms on a larger scale and also to create a network of stimuli and associates, starting from a small set of stimuli derived from the Palermo and Jenkins norms. A similar motivation underlay the association norms from the University of South Florida (Nelson, McEvoy, & Schreiber, 2004), whose authors developed a stimulus–associate network over more than 20 years, starting in 1973. Their goal was to obtain the largest database of free associations ever collected in the United States and to make it available to interested researchers and scholars. More than 6,000 participants produced nearly three-quarters of a million responses to 5,019 stimulus words. Smaller sets of association norms have also been collected, for example, for German (Melinger & Weber, 2006; Russell, 1970; Russell & Meseck, 1959; Schulte im Walde, Melinger, Roth, & Weber, 2008), Dutch (De Deyne & Storms, 2008; Lauteslager, Schaap, & Schievels, 1986), French (Ferrand & Alario, 1998), Spanish (Fernandez, Diez, Alonso, & Beato, 2004; Macizo, Gómez-Ariza, & Bajo, 2000), Portuguese (Comesaña, Fraga, Moreira, Frade, & Soares, 2014), and across languages (Kremer & Baroni, 2011), as well as for different populations of speakers, such as adults versus children (Hirsh & Tree, 2001; Macizo et al., 2000), and for words with various degrees of emotion (John, 1988) or homographs (French & Richards, 1992). Although some of the norms occasionally contain compounds, as far as we know no collection has yet focused specifically on associations to compounds and their constituents, other than our own previous collection for a superset of the compound and constituent stimuli (Schulte im Walde et al., 2012).

Analyses of association norms

In parallel to the interest in collecting association norms, researchers have analyzed association data in order to get insight into semantic memory. The following paragraphs provide an overview of these analyses, starting with theoretical considerations on relationships between stimuli and responses in association norms, and progressing toward analyses of collected norms.

Clark (1971) identified relations between stimulus words and their associates on a theoretical basis, not with respect to collected association norms. He categorized stimulus–associate relations into subcategories of paradigmatic and syntagmatic relations, such as synonymy and antonymy, selectional preferences, and so forth. Heringer (1986) concentrated on syntagmatic associations to a small selection of 20 German verbs. He asked his subjects to provide question words as associates (e.g., wer “who,” warum “why”) and used the responses to investigate the valency behavior of the verbs. Spence and Owens (1990) showed that associative strength and word co-occurrence are correlated. Their investigation was based on 47 pairs of semantically related concrete nouns taken from the Palermo and Jenkins norms, and on co-occurrence counts in a window of 250 characters in the 1-million-word Brown corpus. Church and Hanks (1990) were the first to apply information-theoretic measures to corpus data in order to predict word associations. However, they did not rely on or evaluate their findings against existing association data, but rather concentrated on the usage of the measure for lexicographic purposes. Rapp (2002) combined research questions and methods from this previous work: He developed corpus-based approaches to predict paradigmatic and syntagmatic associations, relying on the 100-million-word British National Corpus (BNC). Concerning paradigmatic associations, he computed word association as the similarity of context vectors, applying the city-block distance (also known as Manhattan distance, or L₁ norm) as the similarity measure. A qualitative inspection revealed a strong overlap between the words with the most similar context vectors and human associations, and applying the associations to solve the TOEFL test resulted in an accuracy of 69 %.
Concerning syntagmatic associations, he demonstrated that the word co-occurring most strongly with a target word (as filtered by a log-likelihood test) corresponded to the first human association of the respective target word in 27 out of 100 cases. Rapp’s work used the Edinburgh Association Thesaurus as the association database. In addition, his article illustrated how strongly the co-occurrence distance between target stimuli and their associates was related to the respective numbers of responses to the stimuli in the association norms.
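The context-vector comparison that Rapp (2002) used for paradigmatic associations can be illustrated with a minimal Python sketch that ranks candidate words by the city-block (L₁) distance between their context vectors. The toy corpus, the window of two tokens, and the normalization of counts to relative frequencies are illustrative assumptions, not a reconstruction of Rapp's exact setup; all function names are hypothetical.

```python
from collections import Counter

def context_vector(tokens, target, window=2):
    """Count the words within +/- `window` positions of each occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vec[tokens[j]] += 1
    return vec

def normalize(vec):
    """Turn raw counts into relative frequencies (an illustrative choice)."""
    total = sum(vec.values())
    return {w: c / total for w, c in vec.items()} if total else {}

def city_block(u, v):
    """L1 (Manhattan) distance; smaller values mean more similar contexts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

def rank_candidates(tokens, target, candidates, window=2):
    """Order candidates from most to least similar context to `target`."""
    t = normalize(context_vector(tokens, target, window))
    scored = [(city_block(t, normalize(context_vector(tokens, c, window))), c)
              for c in candidates if c != target]
    return [c for _, c in sorted(scored)]

# Invented toy corpus, not data from any of the studies cited above.
tokens = "the cat drinks milk the dog drinks milk the car needs fuel".split()
print(rank_candidates(tokens, "cat", ["car", "dog"]))  # ['dog', 'car']
```

Because a distance rather than a similarity is computed, sorting in ascending order puts the candidate whose usage most resembles the target first.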

Work by Fellbaum and colleagues in the 1990s focused on human judgments concerning the semantic relationships between verbs. Fellbaum and Chaffin (1990) asked participants in an experiment to provide associates to verbs. Their work concentrated on verb–verb relations and therefore explicitly required verb responses to the verb stimuli. The resulting verb–verb pairs were manually classified into five predefined semantic relations. Fellbaum (1995) investigated the relatedness between antonymous verbs and nouns and their co-occurrence behavior. Within that work, she searched the Brown corpus for antonymous word pairs in the same sentence, and found that regardless of the syntactic category, antonyms occur in the same sentence with frequencies that are much higher than chance. Last but not least, the WordNet organization of the various parts of speech largely relies on psycholinguistic evidence (Fellbaum, 1998).

On the basis of the associates to German nouns and verbs that they had collected (cf. Collections of association norms above), Schulte im Walde et al. (2008) performed detailed analyses at the syntax–semantics interface. Guida (2007) replicated most of their analyses with association norms for Italian verbs. Roth and Schulte im Walde (2008) extended the co-occurrence analysis for noun associations in Schulte im Walde et al. (2008) and explored whether dictionary and encyclopedic information provided more world knowledge about associations than corpus co-occurrence. They found that the information in the three resource types complemented each other. Schulte im Walde and Melinger (2008) performed a more in-depth analysis of the co-occurrence distributions of the noun associations in Schulte im Walde et al. (2008). Schulte im Walde et al. (2012) performed a part-of-speech analysis on previously collected associates for compound nouns and their constituents.

From a more applied point of view in the field of computational linguistics, Melinger, Schulte im Walde, and Weber (2006) took the noun associations as input to a soft clustering approach, in order to predict noun ambiguity, and to discriminate the various noun senses of ambiguous stimulus nouns. Schulte im Walde (2008) relied on the associations to German verbs described by Schulte im Walde et al. (2008) to determine salient features for automatic semantic verb classification.

Noun compounds

Compounds are morphologically complex words formed from two or more simple words. Our focus of interest is on German noun compounds (see Fleischer & Barz, 2012, for a detailed overview, and Klos, 2011, for a recent exploration), such as Ahornblatt “maple leaf,” Feuerwerk “fireworks,” Nähmaschine “sewing machine,” Obstkuchen “fruit cake,” and Rotkohl “red cabbage,” in which the grammatical head (in German, the rightmost constituent) is a noun and the modifier can belong to various parts of speech.

More specifically, we are interested in the degrees of semantic transparency of German noun compounds, that is, the relation between the meaning of the whole compound (e.g., butterfly) and the meaning of its parts (e.g., butter, fly), which has been studied intensively by psycholinguists in order to find out how compound words are cognitively processed and represented in the mental lexicon. There is ongoing debate about whether morphologically complex words are stored and processed as single units (the full-listing approach; Butterworth, 1983), whether they are decomposed into their morphemes (Taft, 2004; Taft & Forster, 1975), or whether they can be accessed both ways, as whole forms and componentially via their constituent morphemes (dual-route models; see, e.g., Baayen & Schreuder, 1999; Caramazza, Laudanna, & Romani, 1988), and about which variables predict processing behavior. The majority of studies in this area have investigated morphological decomposition during compound comprehension (but see, e.g., Lüttmann, Zwitserlood, Böhl, & Bölte, 2011, for evidence of morphological composition during compound production).

Factors that have been found to influence the cognitive processing and representation of compounds include orthographic variables, such as the number of letters (Bertram & Hyönä, 2003) or the presence of hyphens or interword spaces (Bertram, Kuperman, Baayen, & Hyönä, 2011); frequency-based measures, such as the frequencies of the compound and its constituents (e.g., Janssen, Bi, & Caramazza, 2008; van Jaarsveld & Rattink, 1988) and the morphological family size, that is, the number of compounds that share a constituent (de Jong, Feldman, Schreuder, Pastizzo, & Baayen, 2002); variables relating to morphological complexity, such as the number of morphemes or the existence of linking elements (Krott, Schreuder, Baayen, & Dressler, 2007); and semantic variables, such as the relation between compound modifier and head (e.g., a teapot is a pot FOR tea, and a snowball is a ball MADE OF snow; Gagné & Spalding, 2009).

With some researchers (e.g., Longtin, Segui, & Hallé, 2003; Marslen-Wilson, Tyler, Waksler, & Older, 1994) arguing that morphological decomposition happens only in semantically transparent polymorphemic words and not in semantically opaque ones, one variable that might be particularly important for the processing of compounds is their compositionality/semantic transparency. For example, studies by Sandra (1990) and Zwitserlood (1994) showed that the meanings of the constituents of semantically transparent compounds (e.g., dog and house in doghouse) were activated during processing, whereas the meanings of the constituents of opaque compounds (e.g., butter and fly in butterfly) were not activated.

Although interrater agreement about compounds’ perceived semantic transparency is generally rather high (Reddy, McCarthy, & Manandhar, 2011; Roller, Schulte im Walde, & Scheible, 2013), little research has investigated how semantic transparency is assessed. One assumption is that the degree of a compound’s semantic transparency should also be reflected in its association patterns. If a compound is classified as opaque (e.g., butterfly), one would assume that the associations to the whole compound show less overlap with the associations to the components (butter, fly) than in the case of a transparent compound (e.g., doghouse), because during the processing of (partially) opaque compounds the opaque constituents might be less activated at the semantic level. This is what Libben’s (1994, 1998) “automatic progressive parsing and lexical excitation (APPLE) model of morphological parsing” predicts: It assumes that compound words are represented at three levels: a stimulus level, a lexical level (purely morphological), and a conceptual level (semantic). A (partially) opaque compound such as strawberry is decomposed at the lexical level into straw and berry. At the conceptual level, however, only the semantically transparent constituent berry is represented and can accordingly generate associations. This representational difference might, for example, explain the observed dissociation between constituent repetition priming effects (Libben, Gibson, Yoon, & Sandra, 2003; Zwitserlood, 1994) and semantic priming effects (Sandra, 1990; Zwitserlood, 1994) for opaque compounds.

In sum, our collection of association norms for compounds and their constituents is mainly intended for researchers working on compound processing. It provides a resource for stimulus selection in experimental studies and also allows researchers to examine the properties of associations to compounds and their constituents in detail, in order to gain more insight into how the meanings of compounds are processed and/or represented in the mental lexicon.

Experiment

Associations are commonly obtained by presenting target stimuli to participants in an experiment; the participants then provide associate responses, that is, words that are spontaneously called to mind by the stimulus words. The quantification of the resulting stimulus–associate pairs (i.e., how often a certain associate is provided for a certain stimulus) is called an association norm. In the following sections, we describe the collection of our associates to German noun compounds.
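As a minimal sketch of this quantification, the Python snippet below turns raw trials into an association norm by counting how often each associate was given for each stimulus, and also reports the numbers of stimulus–associate pair types and associate tokens. The input format (one stimulus with a list of up to three typed responses per trial), the function names, and the toy responses are our own assumptions, not data from the norms. Responses are deliberately not lowercased, since participants were asked to use capitalization to mark nouns.

```python
from collections import Counter, defaultdict

def association_norm(trials):
    """trials: iterable of (stimulus, responses), with zero to three responses each.
    Returns stimulus -> Counter(associate -> number of participants)."""
    norm = defaultdict(Counter)
    for stimulus, responses in trials:
        for response in responses:
            response = response.strip()  # keep capitalization: it marks nouns
            if response:
                norm[stimulus][response] += 1
    return norm

def types_and_tokens(norm):
    """Number of distinct stimulus-associate pairs and of associate tokens."""
    n_types = sum(len(assocs) for assocs in norm.values())
    n_tokens = sum(sum(assocs.values()) for assocs in norm.values())
    return n_types, n_tokens

# Invented toy trials for illustration only.
trials = [
    ("Pfanne", ["Herd", "Fett", "Kochen"]),
    ("Pfanne", ["Herd", "Braten"]),
    ("Pfanne", []),  # no associate given in this trial
    ("Pilz", ["Wald"]),
]
norm = association_norm(trials)
print(norm["Pfanne"].most_common(1))  # [('Herd', 2)]
print(types_and_tokens(norm))         # (5, 6)
```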

Method

Material

The target compounds and constituents were based on the selection of noun compounds by von der Heide and Borgwaldt (2009). They created a set of 450 concrete, depictable German noun compounds that they grouped into four transparency classes: compounds that are transparent with regard to both constituents (TT; e.g., Ahornblatt “maple leaf”), compounds that are opaque with regard to both constituents (OO; e.g., Löwenzahn “lion + tooth → dandelion”), compounds that are transparent with regard to the modifier but opaque with regard to the head (TO; e.g., Feuerzeug “fire + stuff → lighter”), and compounds that are opaque with regard to the modifier but transparent with regard to the head (OT; e.g., Fliegenpilz “fly + mushroom → toadstool”).^{Footnote 1} In total, the four classes contained 220 instances of TT, 126 instances of OT, 79 instances of TO, and 25 instances of OO.

The 450 noun compounds from von der Heide and Borgwaldt (2009) were categorized according to the morphological category of the modifier (AN, adjective–noun compound; NN, noun–noun compound; PN, preposition–noun compound; VN, verb–noun compound; MN, noun compound, in which the modifier is morphologically motivated by multiple classes; unique, noun compound with unique modifier). The categorization was performed by consensus decision of four computational linguists.

In our associate collection, we only used 246 of the 450 noun compounds from von der Heide and Borgwaldt (2009)—that is, 237 bimorphemic noun–noun compounds and nine noun compounds in which the modifier is unique, a so-called “cranberry morpheme” (such as him in Himbeere “him + berry → raspberry”), with no meaning by itself. Each compound had exactly two simple constituents. The compound set comprised 106 instances of TT, 37 instances of TO, 87 instances of OT, and 16 instances of OO. We restricted the target set because the subset of the two-part noun–noun compounds was most relevant to our research. Appendix A provides the complete list of our noun compounds.

In total, our materials comprised 571 targets. The total number of target stimuli was less than 3 × 246 (= 738) because some of the compounds share constituents. We first divided the stimuli randomly into four separate packages, making sure that the four packages contained similar numbers of compounds and that both constituents of each compound were in the same package as the compound. In this way, constituents that are shared by several compounds might appear in more than one package, but we could collect the associates to the compounds and their constituents in the four packages independently of each other. Taking the multiple occurrences of some constituents into account, the four packages contained 173/169/173/172 targets, respectively. Each package was then randomly divided into eight parts containing 21–22 targets each, in random order. To control for spammers and to identify non-native speakers of German, we also inserted three fake German compound nouns into each of these batches, at random positions in the lists. The list of fake nouns is (in alphabetical order): Analigzerbruch, Armmoder, Brattlider, Bulkerzagen, Engschogen, Fennhoder, Harmweg, Luderschwege, Malligwohmer, Pillinrugen, Quetpfluge, Tropebuhle, Wierzverkuhr, and Zogschucht.
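The package and batch construction described above can be sketched in Python as follows. This is a simplified illustration under our own assumptions: compounds are given as hypothetical (compound, modifier, head) triples, packages are balanced by dealing shuffled compounds round-robin, and parts are cut from a shuffled package by striding. The original randomization procedure is not specified beyond the constraints stated in the text, and the compound list is a toy example.

```python
import random

def build_packages(compounds, n_packages=4, seed=0):
    """compounds: list of (compound, modifier, head) triples.
    Returns one target list per package; a shared constituent is
    repeated in every package that contains one of its compounds."""
    rng = random.Random(seed)
    order = list(compounds)
    rng.shuffle(order)
    groups = [order[i::n_packages] for i in range(n_packages)]  # similar sizes
    packages = []
    for group in groups:
        targets = []
        for triple in group:
            for word in triple:  # compound and both constituents stay together
                if word not in targets:
                    targets.append(word)
        packages.append(targets)
    return packages

def split_into_batches(targets, fake_nouns, n_parts=8, n_fakes=3, seed=0):
    """Shuffle a package, cut it into `n_parts` near-equal parts, and insert
    `n_fakes` fake nouns at random positions in each part."""
    rng = random.Random(seed)
    shuffled = list(targets)
    rng.shuffle(shuffled)
    batches = []
    for i in range(n_parts):
        batch = shuffled[i::n_parts]
        for fake in rng.sample(fake_nouns, n_fakes):
            batch.insert(rng.randint(0, len(batch)), fake)
        batches.append(batch)
    return batches

compounds = [("Ahornblatt", "Ahorn", "Blatt"), ("Fliegenpilz", "Fliege", "Pilz"),
             ("Schlittenhund", "Schlitten", "Hund"), ("Laubblatt", "Laub", "Blatt")]
fakes = ["Armmoder", "Brattlider", "Engschogen", "Harmweg"]
packages = build_packages(compounds, n_packages=2)
batches = split_into_batches(packages[0], fakes, n_parts=2)
```

Because each package is deduplicated only internally, a constituent shared by compounds in different packages (here possibly Blatt) ends up in more than one package, mirroring the behavior described in the text.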

Procedure

The experiment was performed via Amazon Mechanical Turk.^{Footnote 2} When an AMT worker chose one of our batches, the worker was presented with 24 or 25 targets (21 or 22 real stimuli plus three fake compounds) in 24 or 25 HITs (Human Intelligence Tasks), respectively. The setup of the experiment is shown in Fig. 1 in Appendix B, with translations in red font. The actual collection of the associates was performed as shown in Figs. 2 and 3: Each trial (represented by a HIT) provided a brief description of the experiment, an example item with potential responses, and a single target (one of the noun compounds, one of the constituents, or a fake noun). Below the target were three data input lines where participants could type their associates. They were instructed to type at most one word per line and, following German grammar, to distinguish nouns from other parts of speech with capitalization. Below the three input lines was a box that participants were asked to check if they did not know the word. The 4 × 8 = 32 batches were completed within 1 to 26 days each, and the whole collection was finished within 3 months.

Participants

The participants were AMT workers. We requested 25 native German speakers per target and paid USD 0.02 per trial (i.e., for up to three associates). We accepted only participants who identified the fake nouns correctly and who, in addition, had an overall approval rate of at least 95 % (after the experiment was completed). These checks were intended to ensure that we received associates from native German speakers only. Over all trials, participants with 146 different worker IDs took part, each providing associates for between 1 and 683 stimuli.
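The two acceptance criteria can be expressed as a simple filter. In this hypothetical Python sketch we assume that "identifying the fake nouns correctly" means checking the "don't know" box for every fake noun a worker saw; the Worker record, its fields, and the example workers are illustrative and are not part of any AMT API.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    worker_id: str
    approval_rate: float  # overall AMT approval rate in percent
    # fake noun -> True if the worker checked the "don't know" box for it
    fake_checks: dict = field(default_factory=dict)

def accept(worker, min_approval=95.0):
    """Hypothetical acceptance rule: every fake noun the worker saw must be
    marked as unknown, and the approval rate must reach `min_approval`."""
    saw_fakes = len(worker.fake_checks) > 0
    return (saw_fakes
            and all(worker.fake_checks.values())
            and worker.approval_rate >= min_approval)

workers = [
    Worker("A1", 98.5, {"Armmoder": True, "Harmweg": True, "Tropebuhle": True}),
    Worker("A2", 99.0, {"Armmoder": True, "Harmweg": False, "Tropebuhle": True}),
    Worker("A3", 94.0, {"Armmoder": True, "Harmweg": True, "Tropebuhle": True}),
]
print([w.worker_id for w in workers if accept(w)])  # ['A1']
```

Requiring at least one fake-noun trial avoids accepting a worker by default, since `all()` over an empty collection is true.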

Data

For each stimulus, we had between 2 and 120 participants. Because the participants could provide between zero and three associates per target, the actual number of associates per stimulus varied between 6 and 356. All but the unique constituents received ≥ 48 associates. In total, we collected 58,652 associates from 20,333 trials, an average of 2.88 associates per trial. The 58,652 associate tokens are distributed over 26,004 association types.

Previous experiment

This section reports on a previous experiment, described by Schulte im Walde et al. (2012), that collected associates to the compound nouns from von der Heide and Borgwaldt (2009) and to their constituents. This experiment was our first attempt to collect associates to compound nouns and their constituents, and it was stopped after one year because the flow of incoming data had stagnated. The present experiment (see Experiment above) was then set up to continue the collection with a more specific focus on noun–noun compounds, still using identical collection instructions and procedures.

The reasons why we report on the earlier experiment are the following.

The previous associates were collected for a superset of the noun compounds that were used in the present study. The two collections can be exploited independently (taking into account that they were collected in different ways) or together (making use of a richer set of associations for the intersection of the noun–noun compounds and their constituents, as well as exploiting an overall larger stimulus set).
Moreover, we were interested in whether we could find differences between the two sets of association norms, since they were collected in different ways.

Method

Material

We used 442 compound nouns from the original selection by von der Heide and Borgwaldt (2009), together with their constituents. In total, our materials comprised 996 targets, that is, 442 compounds and 554 constituents. The stimuli were divided randomly into 12 separate experimental lists of 83 nouns each.

Procedure

The experiment was administered over the Internet and announced by e-mail to colleagues and friends. Participants were first given a brief description of the experiment and asked for biographical information, such as linguistic expertise, age, and regional dialect (see Fig. 4, with a translated version in Fig. 5). Next, the participant was presented with the written instructions for the experiment and an example item with potential responses. In the actual experiment, one of the 12 experimental lists was chosen randomly. Each of the 83 trials in the experimental list consisted of a single word presented in a box at the top of the screen. The word was either one of the noun compounds or one of the constituents. If a compound constituent was not a noun, its base form was nominalized by capitalizing its first letter. For example, the verbal modifier fahren in Fahrplan (“to ride + schedule → timetable”) was presented as Fahren, and the adjectival modifier blau in Blaubeere (“blue + berry → blueberry”) was presented as Blau. The experiment ran for approximately 1 year.

The order of the target words was random for each data set and each participant. Below the target were three data input lines where participants could type their associates. They were instructed to type at most one word per line and, following German grammar, to distinguish nouns from other parts of speech with capitalization. Below the three input lines was a box that participants were asked to check if they did not know the word.

Participants

A group of 268 participants took part in the experiment. Of these, 225 claimed that their L1 was German; for 19 of them, German was not their L1; one of the participants claimed to have grown up bilingual; and 23 did not provide information about their L1. One hundred twenty-four of the individuals identified themselves as having a linguistic background, and 112 rated themselves as linguistic novices; 32 participants did not provide information about their linguistic background.

Data

Each experimental list was completed by between 14 and 28 participants, and the number of participants per stimulus varied between 10 and 36. Because the participants could provide between zero and three associates per target, the number of associates per stimulus varied between 6 and 74. In total, we collected 47,249 associates from 17,128 trials, an average of 2.76 associates per trial. The 47,249 associate tokens were distributed over 28,238 association types. In 861 trials, the participants did not provide any associate; in 327 of these trials, they explicitly checked the box indicating that they did not know the target.

Overall results and analyses

Quantifying overall responses in the two experiments resulted in a total of 47,523/106,693 stimulus–associate types/tokens. Table 1 summarizes the numbers of association norms with regard to the present study, the previous study, and the union of the two. We distinguish between associations to all noun compound stimuli and their constituents (as in the previous experiment) and associations to the subset of 246 noun–noun compounds and their constituents (which were used in the present experiment and were already part of the previous experiment). Even though our previous experiment ran for approximately 1 year and the present experiment for only three months, we collected more associates in the present experiment.

Table 1 Numbers of association norms

Tables 2, 3, and 4 provide examples of the associates to three compounds and their constituents, in each case listing the ten strongest (i.e., most frequently provided) associates. We selected three examples rather than a single one to illustrate (i) the effect of the compound’s degree of transparency on the associates to the compounds versus the constituents and (ii) the effect of monosemous versus polysemous stimuli on the semantics of the associates. Note that Fliegenpilz is less transparent than Ahornblatt (at least with respect to its modifier), so the associates of the compound Fliegenpilz and of its modifier Fliege differ more strongly. Note also that two of the nouns are polysemous: Fliege and Blatt. For both nouns, we collected associates to two senses: In the case of Blatt, associates were given to the plant sense “leaf” as well as to the paper sense “sheet (of paper)”; Fliege evoked associates to the animal sense “fly” as well as to the clothing sense “bowtie.”

Table 2 Most frequent responses to the compound Ahornblatt “maple leaf” and its constituents

Table 3 Most frequent responses to the compound Fliegenpilz “toadstool” and its constituents

Table 4 Most frequent responses to the compound Schlittenhund “sledge dog” and its constituents

In the following subsections, we present a series of analyses of the association norms with regard to the following stimulus properties and stimulus–associate relations:

a morpho-syntactic analysis, looking into the parts of speech of the associates,
a distributional analysis, looking into the co-occurrence of stimuli and associates,
a syntactic analysis, looking into the dependency paths between stimuli and associates, and
a semantic relation analysis, looking into the stimulus–associate semantic relations.

In all of the analyses, we pay attention to the effects of our specific set of stimuli (compounds and their constituents) and explore which syntactic and semantic properties are specific to these stimuli. We conclude our analyses with a study of the compounds’ semantic transparency, investigating the correlation between association overlap (of compounds and constituents) and an existing set of compound–constituent ratings.
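The overlap-versus-ratings study can be sketched as follows. The overlap measure here (the share of a compound's associate tokens that also occur among the associates of either constituent) and the use of Spearman's rank correlation are our own illustrative choices, not necessarily the measures used in the study; the associate counts, overlap values, and ratings are invented toy data.

```python
from collections import Counter

def overlap(compound_assocs, *constituent_assocs):
    """Share of the compound's associate tokens that are also associates of a constituent."""
    pooled = set()
    for assocs in constituent_assocs:
        pooled |= set(assocs)
    total = sum(compound_assocs.values())
    shared = sum(n for a, n in compound_assocs.items() if a in pooled)
    return shared / total if total else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the ranks (undefined if either list is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy counts for a compound and its two constituents.
ahornblatt = Counter({"Herbst": 3, "Baum": 2, "Kanada": 1})
ahorn = Counter({"Baum": 5, "Sirup": 2})
blatt = Counter({"Herbst": 2, "Papier": 3})
print(round(overlap(ahornblatt, ahorn, blatt), 3))        # 0.833
print(round(spearman([0.83, 0.40, 0.10], [6.5, 4.0, 2.1]), 3))  # 1.0
```

Rank correlation is a cautious choice for ordinal transparency ratings, since it assumes only a monotonic, not a linear, relation between overlap and ratings.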

Morpho-syntactic analysis

In the morpho-syntactic analysis, each response to the stimuli was assigned its—possibly ambiguous—part of speech (pos). The results provide insight into the relevance of predominant part-of-speech categories. Similar morpho-syntactic analyses have been performed by Guida (2007), Schulte im Walde et al. (2008), and Schulte im Walde et al. (2012).

As a resource for the pos assignment, we relied on the lemmatized and pos-tagged frequency list from the SdeWaC corpus (Faaß & Eckart, 2013), a cleaned version of the German Web corpus deWaC created by the WaCky group (Baroni, Bernardini, Ferraresi, & Zanchetta, 2009). The SdeWaC contains approximately 880 million word tokens. We disregarded fine-grained distinctions such as case, number, and gender features, and considered only the major categories verb (V), noun (N), and adjective (ADJ). A fourth category, “OTHER,” comprises all other part-of-speech categories, such as adverbs, prepositions, particles, interjections, conjunctions, and so forth. Ambiguities between the categories arose, for instance, when the experiment participant could have been referring to either an adjective or a (noncapitalized) noun, such as fett “fat.”^{Footnote 3}

Having assigned part-of-speech tags to the associates, we were able to distinguish and quantify the morpho-syntactic categories of the associates. In nonambiguous situations, the unique part of speech received the total stimulus–associate strength. For example, Herd “cooker” was provided as an associate of the stimulus Pfanne “pan” by 11 participants. Our pos resource contained Herd in the corpus only as a noun. So Pfanne received a contribution of all 11 mentions for a noun pos. In ambiguous situations, the stimulus–associate frequency was split over the possible part-of-speech tags according to the pos proportions in the frequency list. For example, fett “fat” was provided as an associate of the stimulus Pfanne by three participants. Our pos resource contained Fett 10,780 times in the corpus as a noun, and 493 times as an adjective. So, with regard to Fett, Pfanne received a contribution of 3 × 10,780/(10,780 + 493) = 2.87 nouns, and of 3 × 493/(10,780 + 493) = 0.13 adjectives.
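The proportional split can be written as a small Python function. The lexicon below is a hypothetical stand-in for the SdeWaC frequency list, keyed by lowercased lemma (an assumption about how lowercase responses such as fett were matched); the noun count for Herd is a placeholder, since any count yields the same result for an unambiguous word. Associates missing from the lexicon are booked as UNKNOWN here, which the article does not specify.

```python
from collections import Counter, defaultdict

def split_by_pos(freq, pos_counts):
    """Distribute a stimulus-associate frequency over the associate's possible
    parts of speech, proportionally to their corpus frequencies."""
    total = sum(pos_counts.values())
    return {pos: freq * n / total for pos, n in pos_counts.items()}

def pos_distribution(associates, lexicon):
    """associates: Counter of associate -> frequency for one stimulus.
    lexicon: lowercase lemma -> {pos: corpus count} (hypothetical)."""
    dist = defaultdict(float)
    for assoc, freq in associates.items():
        pos_counts = lexicon.get(assoc.lower())
        if not pos_counts:
            dist["UNKNOWN"] += freq  # our assumption, not the article's rule
            continue
        for pos, share in split_by_pos(freq, pos_counts).items():
            dist[pos] += share
    return dict(dist)

lexicon = {"herd": {"N": 1},  # placeholder count: Herd is unambiguous
           "fett": {"N": 10_780, "ADJ": 493}}
pfanne = Counter({"Herd": 11, "fett": 3})
print({pos: round(v, 2) for pos, v in pos_distribution(pfanne, lexicon).items()})
# {'N': 13.87, 'ADJ': 0.13}
```

The fractional contributions for fett (2.87 nouns, 0.13 adjectives) reproduce the arithmetic in the text; summing such distributions over all stimuli gives totals of the kind reported in Table 5.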

The output of this analysis was a frequency distribution of the part-of-speech tags for each stimulus individually, and also summed over all stimuli. Table 5 presents the total numbers over all associate response tokens, across the two experiments. Overall, the participants provided noun associates for the clear majority of token instances (69 %–73 %). Adjectives were given for 12 %–16 % of the associates, and verbs for 11 %–13 %. The proportions differ slightly across the two experiments, and between all compounds and the noun–noun compounds only. Overall, however, they are very similar within this table and also compared to the same analyses of associations to noun stimuli in previous work (Schulte im Walde et al., 2008). This holds even if we consider only the associations to the compound stimuli.

Table 5 Numbers and percentages of associate parts of speech


Co-occurrence analysis

In this analysis, we examined whether the co-occurrence assumption held for our association norms—that is, what percentage of the associates were found in co-occurrence with the stimulus words in a corpus. The co-occurrence hypothesis assumes that associations are related to the textual co-occurrence of the stimulus–associate pairs. The hypothesis has been confirmed in many previous studies, among them Miller (1969), Spence and Owens (1990), Fellbaum (1995), Schulte im Walde and Melinger (2008), Schulte im Walde et al. (2008), and Schulte im Walde and Müller (2013).

As association norms, we used all of the experiment data, and our study again relied on the SdeWaC corpus. This time, no parsing information was required, since we only checked how often the associates co-occurred with their stimuli within windows of 5 and 20 words (to the left and to the right; see Note 4). The analysis was again token-based. Table 6 presents the results. The columns show the proportions of pairs that co-occur with a co-occurrence strength of 1 (i.e., at least once), 2 (i.e., at least twice), and so forth. The rows distinguish between windows of sizes 5 and 20. For example, the associate Wald “forest,” which was provided by 32 participants for the stimulus Pilz “mushroom,” was found 76 times in a window of five words around the stimulus. The 32 tokens thus appear in all columns. In contrast, the associate Husky “husky,” which was provided by 21 participants for the stimulus Schlittenhund “sledge dog,” was found ten times in a window of five words around the stimulus. Its 21 tokens thus appear only in the columns with co-occurrence strengths ≥ 1, 2, 5, and 10.
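The token-based counting behind a table like Table 6 can be sketched as follows for one window size. The sketch assumes that the corpus co-occurrence counts have already been extracted, and the function name is hypothetical. The threshold columns beyond 1, 2, 5, and 10 are an assumption, not necessarily the exact columns of Table 6. The two pairs use the five-word-window figures quoted above.

```python
def cooccurrence_profile(pairs, thresholds=(1, 2, 5, 10, 20, 50)):
    """pairs: list of (association_strength, corpus_cooccurrence_count) for one
    window size. Returns, per threshold t, the proportion of association tokens
    whose stimulus and associate co-occur at least t times (token-based)."""
    total = sum(strength for strength, _ in pairs)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: sum(strength for strength, count in pairs if count >= t) / total
        for t in thresholds
    }

# Five-word-window figures quoted in the text.
pairs = [
    (32, 76),  # Pilz / Wald: 32 participants, 76 co-occurrences
    (21, 10),  # Schlittenhund / Husky: 21 participants, 10 co-occurrences
]
for t, share in cooccurrence_profile(pairs).items():
    print(f">= {t:>2}: {share:.2f}")
```

With these inputs, the tokens of both pairs count toward the thresholds up to 10. Only the 32 Wald tokens count toward 20 and 50. The proportions are therefore weighted by the number of participants rather than by pair types.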

Table 6 Corpus co-occurrence of stimulus–associate pairs


For each window size, we present two lines: one for all stimulus–associate pairs, and one restricted to the compound stimuli. Note that the all-stimuli condition was restricted to the 246 noun–noun compounds and their nominal constituents (i.e., nonnominal stimuli were disregarded). This restriction allowed us to compare the co-occurrence information for (i) noun–noun compounds and their nominal constituents (all stimuli), (ii) the nominal compound stimuli only, and (iii) nouns in general, as analyzed in previous work (cf. Schulte im Walde et al., 2008).

We can see that the co-occurrence assumption clearly holds for our norms: 73 % of the associates appeared at least once within a five-word window of the respective stimuli, and more than half (53 %) appeared at least five times within a five-word window. However, the co-occurrence values are below those for German nouns and their associates in general. In earlier work (Schulte im Walde et al., 2008), we performed the co-occurrence check on the associates of 409 depictable German nouns (including only a few compound nouns) and found 84 % of the stimulus–associate pairs at least once in a 20-word window. In the present analysis, the corresponding value is only 79 % (and only 67 % for compound–associate pairs). The difference is stronger than it first appears, because the earlier analyses were performed on a 200-million-word German newspaper corpus from the 1990s, a much smaller co-occurrence source than the 880-million-word SdeWaC. In order to make the two analyses more comparable, we repeated our co-occurrence check for the compound norms on the 200-million-word corpus. For the general noun norm data in Schulte im Walde et al. (2008), 73 %/84 % of the stimulus–associate pairs occurred at least once in windows of 5/20 words. In the present analysis, the corresponding values were only 57 %/68 % (taking compound and constituent stimuli into account), and only 38 %/53 % for the compound–associate pairs (see Note 5). This comparison clearly shows that, although the co-occurrence assumption holds for our norms, the corpora offer less information on compounds than on simple nouns.

Syntactic dependency paths

This analysis went beyond pure co-occurrence and investigated the syntactic dependency paths between the stimuli and their associates. It was intended to show whether particular syntactic relationships act as especially strong triggers for associates. For example, we do not know a priori whether associates tend to stand in a modifier relationship with the stimuli (such as grün “green” for Salat “salad”), in a conjunction (such as Katze “cat” for Hund “dog”), and so forth. To obtain insight into which syntactic dependencies might trigger associates to our compound and constituent stimuli, we checked, for all stimulus–associate pairs, whether dependency paths exist between the stimuli and the associates, and if so, which ones.

The analysis relied on dependency parses of the SdeWaC corpus (see above) produced by Bohnet’s MATE dependency parser (Bohnet, 2010). For each occurrence of a stimulus in the parsed corpus, we first checked for each associate whether it appeared in the same sentence. If it did, we determined the shortest dependency path between the stimulus and the associate. The analysis was again token-based. That is, we took the strength of each stimulus–associate pair into account (i.e., how often the associate was provided for the stimulus).
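The shortest dependency path between a stimulus and an associate in one parsed sentence can be found with a breadth-first search over the dependency tree, treated as an undirected graph. The sketch below is a minimal illustration. It assumes a simplified token format of (lemma, STTS tag, 1-based head index, with 0 for the root) rather than the parser’s actual output format. The head attachments in the example sentence are assigned by hand for illustration, with coordination attached so that the conjunction depends on the first conjunct. The function name is hypothetical.

```python
from collections import deque

def shortest_dep_path(sentence, stimulus, associate):
    """sentence: list of (lemma, pos, head) tuples, heads 1-based, 0 = root.
    Returns the POS tags along the shortest path from the stimulus to the
    associate, joined by '/', or None if one of them is missing."""
    n = len(sentence)
    neighbours = {i: set() for i in range(1, n + 1)}
    for i, (_, _, head) in enumerate(sentence, start=1):
        if head:  # head 0 marks the root and contributes no edge
            neighbours[i].add(head)
            neighbours[head].add(i)
    starts = [i for i, (lemma, _, _) in enumerate(sentence, start=1) if lemma == stimulus]
    targets = {i for i, (lemma, _, _) in enumerate(sentence, start=1) if lemma == associate}
    if not starts or not targets:
        return None
    # Multi-source BFS: the first target reached lies on a shortest path.
    previous = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return "/".join(sentence[i - 1][1] for i in reversed(path))
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# "Der Hund und die Katze schlafen." with hand-assigned heads.
sentence = [
    ("der", "ART", 2),
    ("Hund", "NN", 6),
    ("und", "KON", 2),
    ("die", "ART", 5),
    ("Katze", "NN", 3),
    ("schlafen", "VVFIN", 0),
    (".", "$.", 6),
]
print(shortest_dep_path(sentence, "Hund", "Katze"))     # NN/KON/NN
print(shortest_dep_path(sentence, "Hund", "schlafen"))  # NN/VVFIN
```

In a token-based analysis, each path found in this way would then be weighted by the association strength of the stimulus–associate pair.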

Table 7 presents the ten most frequent paths between the stimuli and the associates. Each path is accompanied by an example stimulus–associate pair and by its total frequency, that is, how often the path appears over all stimulus–associate pairs. The paths start with the part of speech of the stimulus and end with the part of speech of the associate response. The tag “NN” refers to common nouns, “ADJA” to attributive adjectives, “APPR(ART)” to prepositions (APPR) and prepositions fused with an article (APPRART), “KON” to conjunctions, and any tag starting with “VV” to a main verb.

Table 7 Most frequent dependency paths between stimuli and associates


The semantics of the paths (at least of those that are as short as the ones in the examples) is quite obvious. For example, “NN/KON/NN” refers to the stimulus and the associate appearing as two conjuncts (such as Hund “dog” and Katze “cat”). “NN/ADJA” refers to an attributive adjective that directly modifies the noun stimulus (such as grün “green” for Salat “salad”). “NN/APPR/NN” refers to a syntactic construction in which either the stimulus or the associate depends on a preposition that itself depends on the other noun (such as Hütte/aus/Holz “hut/made of/wood”). One could specify the paths in more detail (e.g., by adding the dependency direction or by specifying the prepositional heads), or generalize over some aspects (e.g., by collapsing the various verb forms). However, the present version seems appropriate for a first impression of the syntactic dependencies between the stimuli and their associates.

Table 8 lists the five strongest examples for each of the five most prominent syntactic paths (see Note 6). The path strength is the portion of the association strength for which the respective path between the stimulus and the associate was found in the parsed corpus, since other paths may also exist for the same stimulus–associate pair. For example, 74.39 % of the dependency paths between Tasse “cup” and Kaffee “coffee” were NN/NN. Given an association strength of 36, the path strength is therefore 36 × .7439 = 26.78.
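The path-strength computation, and a ranking of pairs per path such as the one in Table 8, can be sketched as follows. Only the Tasse–Kaffee figures come from the text. The second pair and its path shares are invented for illustration, the remainder label “OTHER” is hypothetical, and so are both function names.

```python
def path_strengths(assoc_strength, path_shares):
    """Distribute an association strength over the dependency paths found for
    one stimulus–associate pair, proportionally to their (relative) frequency."""
    total = sum(path_shares.values())
    return {path: assoc_strength * share / total for path, share in path_shares.items()}

def strongest_pairs(pairs, path, k=5):
    """Return the k (stimulus, associate, strength) triples that are strongest
    on the given path, strongest first."""
    scored = []
    for (stimulus, associate), (strength, shares) in pairs.items():
        value = path_strengths(strength, shares).get(path, 0.0)
        if value > 0:
            scored.append((value, stimulus, associate))
    scored.sort(reverse=True)
    return [(stimulus, associate, round(value, 2)) for value, stimulus, associate in scored[:k]]

pairs = {
    # (association strength, share of each path among the pair's corpus paths)
    ("Tasse", "Kaffee"): (36, {"NN/NN": 0.7439, "OTHER": 0.2561}),  # from the text
    ("Tasse", "Tee"): (20, {"NN/NN": 0.5, "NN/KON/NN": 0.5}),       # hypothetical
}
print(strongest_pairs(pairs, "NN/NN"))
# [('Tasse', 'Kaffee', 26.78), ('Tasse', 'Tee', 10.0)]
```

Raw path counts can be passed instead of proportions, because `path_strengths` normalizes by their sum.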

Table 8 Examples of stimulus–associate pairs and their dependency paths


To look into the path instances, we checked the actual corpus parses of the sentences in which stimuli and associates appeared with the given paths. In most cases, our initial intuitions about the semantics of the paths were confirmed. The NN/ADJA cases all refer to adjectival noun modifiers. The NN/KON/NN cases refer to conjuncts, often representing collocations, such as Pfeil und Bogen “bow and arrow,” Hund und Katze “dog and cat,” and Kaffee und Kuchen “coffee and cake.” The NN/VVFIN cases refer to nominal complements of verbs: subjects, as in Hund-bellen “dog barks” and Telefon-klingeln “phone rings,” or direct objects, as in Brief-schreiben “write letter.” The NN/NN cases, however, refer to measure constructions such as Tasse Tee/Kaffee “a cup of tea/coffee,” or to genitive constructions such as Dach eines Hauses “roof of a house,” which we had not predicted in advance.

Looking into actual sentences also helps to specify the NN/APPR/NN cases, by revealing the prepositional heads. Examples include:
Zeit zwischen . . . Uhr und . . . Uhr “time between . . . o’clock and . . . o’clock,” and Zeit von . . . Uhr bis . . . Uhr “time from . . . o’clock until . . . o’clock”;
Thermoskanne mit Tee “thermos with tea,” and Tee aus der Thermoskanne “tea from the thermos”;
Lampe auf dem Nachttisch “lamp on the bedside table”;
Hütte aus Holz “hut made out of wood”;
Wasser auf die Mühle “water onto the mill,” which is part of a collocation meaning “grist to the mill,” i.e., encouragement.
For the NN/APPR/NN cases, refining the paths by preposition type would obviously be a useful extension.

In sum, the dependency paths provide interesting and diverse insights into salient syntactic dependencies between the stimuli and the associates. To our knowledge, no previous work has explored syntactic paths in association norms. The closest analysis is that of Schulte im Walde et al. (2008), who explored parse tuples for noun–verb pairs to identify potential syntactic dependencies between stimulus nouns and associate verbs, and between stimulus verbs and associate nouns. The present analysis provides a broader spectrum of syntactic information, since it is not restricted to noun–verb pairs. The above information covers both compound and constituent stimuli. However, the strongest syntactic paths for the compound stimuli alone largely overlap with those for this larger set.

Semantic relations

The final analysis concerned types of semantic relationships (such as synonymy) that hold between the stimulus words and the associate responses. This analysis provides insight into which semantic relations the experiment’s participants might have had in mind when they provided associates. We relied on the German WordNet, GermaNet (Kunze, 2000), to explore the semantic information, and used GermaNet version 6.0, which was the latest version when the study started, comprising 69,594 synsets and 93,407 lexical units. We considered the paradigmatic relations synonymy, antonymy, hypernymy, hyponymy, and co-hyponymy. The synonymy information was based on the 69,594 synsets; in addition, GermaNet 6.0 contains a total of 74,945 hypernymy relations, and 1,587 antonymy relations.

The analysis was again token-based, that is, it incorporated the strengths between stimuli and their associates. If a relationship between a stimulus and an associate was found, it was instantiated by the strength between them. For example, if the associate Obst “fruit” was provided 12 times for the stimulus Apfel “apple,” the hypernymy relation that links the two lexemes in GermaNet received a strength of 12. If another hypernym of Apfel (such as Frucht “fruit”) was found among the associates, its stimulus–associate strength was added to the existing hypernymy strength. In this way, we determined the strengths for individual stimulus–associate pairs. We could also sum these strengths per stimulus and over all stimuli, which shows how strongly each semantic relation is represented. Table 9 presents the 20 strongest semantic relationships between individual compound stimuli and their associates.
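Token-based aggregation of relation strengths, per stimulus and over all stimuli, could look like the sketch below. A small hand-written lookup table stands in for GermaNet; no GermaNet API is used or implied. Apart from Apfel–Obst (12), the associates, strengths, and relation labels are hypothetical, and so are the function names.

```python
from collections import defaultdict

def relation_strengths(norms, relation_of):
    """norms: stimulus -> {associate: strength}.
    relation_of(stimulus, associate) -> relation name, or None if unrelated.
    Returns token-weighted relation strengths per stimulus and overall."""
    per_stimulus = defaultdict(lambda: defaultdict(int))
    overall = defaultdict(int)
    for stimulus, associates in norms.items():
        for associate, strength in associates.items():
            relation = relation_of(stimulus, associate)
            if relation is None:
                continue  # no paradigmatic relation between the two lexemes
            per_stimulus[stimulus][relation] += strength
            overall[relation] += strength
    return per_stimulus, overall

# Toy stand-in for GermaNet lookups (hypothetical entries).
RELATIONS = {
    ("Apfel", "Obst"): "hypernymy",
    ("Apfel", "Frucht"): "hypernymy",
    ("Apfel", "Birne"): "co-hyponymy",
}

def toy_relation(stimulus, associate):
    return RELATIONS.get((stimulus, associate))

norms = {
    "Apfel": {"Obst": 12, "Frucht": 5, "Birne": 7, "rot": 9},
    "Pfanne": {"Herd": 11},
}
per_stimulus, overall = relation_strengths(norms, toy_relation)
print(dict(per_stimulus["Apfel"]))  # {'hypernymy': 17, 'co-hyponymy': 7}

# Share of stimuli with at least one related associate (relation coverage).
coverage = len(per_stimulus) / len(norms)
```

The coverage figure at the end corresponds to the kind of proportion discussed next, namely the number of stimuli for which any relation was found.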

Table 9 Examples of semantic relations between stimulus–associate pairs


In total, we found semantic relation information for 426 of our 571 stimuli (75 %). For the compound stimuli alone, the proportion is much lower, at 59 %, covering associations to 145 of the 246 compounds. Notably, however, all of the stimuli are included in GermaNet: GermaNet covers not only the simple nouns in our data set but also all of the compounds. The missing proportions are therefore due to stimulus–associate pairs that are not linked by any of the considered relations. Unsurprisingly, such gaps are more frequent for the compounds than for the whole set of stimuli.

Table 10 shows the proportions of GermaNet relations that were identified for the stimulus–associate pairs. The numbers in columns 2 and 3 mirror the overall proportions of the semantic relation instances coded in GermaNet, so no semantic relation is particularly prominent among the stimulus–associate pairs. The associations to the compound stimuli in columns 4 and 5, however, show a different pattern. For the compound stimuli, it is more intuitive, and thus easier, to generate hypernyms and co-hyponyms than hyponyms, since the compounds are often very specific. There are only 26 hyponyms to the compound stimuli in total. The most prominent examples are Taschentuch “tissue”–Tempo (a German brand of paper tissues) and Kreditkarte “credit card”–VISA, which appear among the top 20 pairs in Table 9. In sum, the semantic relations of the stimulus–associate pairs differ in an interesting way between the compound stimuli and the full set of stimulus nouns.

Table 10 Percentages of semantic relations among stimulus–associate pairs according to GermaNet


Associations and semantic transparency

Our main motivation to collect the association norms for the German noun compounds and their constituents was that the associations provide insight into the semantic properties of the compounds and their constituents, and should therefore represent a valuable resource for cognitive and computational linguistics research on semantic transparency, and for lexical semantics in general. More specifically, we are interested in the degree of semantic transparency of the compound nouns with regard to their constituents, and are currently investigating whether the degree of overlap of associations is indeed an indicator of the semantic relatedness between compounds and their constituents. The examples in Tables 2, 3, and 4 give a first impression that this might indeed be the case, and in the following discussion we will provide more evidence for our hypothesis.

We applied a measure suggested by Schulte im Walde et al. (2012) that relies on simple association overlap to predict the degree of semantic transparency of the experimental compound nouns. That is, we used the proportion of shared associations of a compound and its constituent, relative to the total number of associations of the compound. The degree of semantic transparency of a compound noun was calculated with respect to each of its constituents. As an example of the calculation, consider the ten most frequent responses to the compound noun Ahornblatt “maple leaf” and its constituents, provided in Table 2. The compound noun received a total of 99 associate tokens, of which it shared 87 with the first constituent Ahorn “maple” and 52 with the second constituent Blatt “leaf.” Thus, the predicted degrees of semantic transparency were 87/99 = .88 for Ahornblatt–Ahorn and 52/99 = .53 for Ahornblatt–Blatt. These predicted degrees of semantic transparency were compared against the mean transparency judgments collected by von der Heide and Borgwaldt (2009; see Note 7), using the Spearman rank-order correlation coefficient ρ. Our hypothesis was that the larger the overlap of the associations of a compound and a constituent, the stronger the degree of compound–constituent transparency would be.
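The overlap measure and its evaluation against transparency ratings can be sketched in plain Python. One assumption should be stated clearly: the sketch counts as shared those association tokens of the compound whose associate type was also produced for the constituent. The exact definition in Schulte im Walde et al. (2012) may differ in detail. All association counts and ratings in the example are invented, and all function names are hypothetical. In practice, `scipy.stats.spearmanr` computes the same tie-corrected coefficient.

```python
def overlap_score(compound, constituent):
    """Proportion of the compound's association tokens whose associate type
    was also given for the constituent (assumed definition of 'shared')."""
    total = sum(compound.values())
    shared = sum(n for associate, n in compound.items() if associate in constituent)
    return shared / total if total else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical association counts (not the Table 2 data).
ahornblatt = {"Baum": 50, "Herbst": 30, "Kanada": 19}
ahorn = {"Baum": 40, "Kanada": 10, "Sirup": 20}
print(round(overlap_score(ahornblatt, ahorn), 2))  # 0.7

# Hypothetical predicted overlaps vs. mean ratings on the 1–7 scale.
predicted = [0.88, 0.53, 0.20, 0.65]
ratings = [6.1, 4.9, 2.3, 4.9]
rho = spearman_rho(predicted, ratings)
```

The computation is the same as the Ahornblatt–Ahorn calculation in the text (87/99 = .88), applied here to the invented counts.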

The resulting correlation values are ρ = .6227 when both constituent types are taken into account, ρ = .6128 for the compound–modifier pairs, and ρ = .6547 for the compound–head pairs. The simple association overlap measure therefore exceeds moderate correlation values, reaching up to .6547. When the two experiments are analyzed separately, the results are slightly better for the AMT norms than for the standard Web data, which confirms the semantic usefulness of both experimental setups. In addition, the overlap measure correlates more strongly with the transparency ratings for heads than for modifiers.

This case study shows that the associations provide insight into the semantic properties of the compounds (and their constituents) that should be useful for models of compounds’ semantic transparency. Specifically, the data seem to indicate that associations to compound nouns comprise associations to both the head and modifier; that is, they reflect the meaning components of both constituents. Since the overlap between constituents and compounds correlates with transparency ratings, we can conclude that associations to compound nouns include fewer associations to their opaque than to their transparent constituents. That is, for a transparent compound like Schlittenhund “sledge dog,” we would expect to find associations to both components Schlitten “sledge” and Hund “dog,” whereas in a partially opaque compound like Fliegenpilz “toadstool,” the associations might be more related to the transparent constituent Pilz “mushroom” than to the opaque constituent Fliege “fly/bowtie.” This result is in accordance with Libben’s APPLE model (1994, 1998), which assumes that opaque constituents are not conceptually activated during processing.

Conclusion

In the present study, we have presented a collection of association norms for German noun compounds and their constituents. The associations were collected via crowdsourcing, and the resulting collection was compared to an earlier Web experiment. The norms are intended, first, for researchers working on compound processing. Because they contain association norms not only for the compounds but also for their constituents, they provide a resource for stimulus selection in experimental studies (e.g., Sandra, 1990). Second, by examining properties of the associations for compounds and their constituents in detail, our data could provide more insight into the way in which the meanings of compounds are processed and/or represented in the mental lexicon. That is, they could complement the results of online experiments.

In total, the new association norms for 246 noun–noun compounds and their constituents comprise 58,652 associates (tokens), distributed over 26,004 stimulus–associate pairs (types). The quantitative contribution of the crowdsourcing experiment is slightly larger than that of the previous Web experiment (58,652 vs. 47,249 stimulus–associate tokens), even though it ran for only a quarter of the time of the previous study.

We analyzed the stimulus–associate pairs of both the present and the previous experiments in four ways, performing (1) a morpho-syntactic analysis, (2) a distributional analysis, (3) a syntactic analysis, and (4) a semantic relation analysis. Concerning the first of these, a part-of-speech analysis of the associate responses, and the third, a dependency path analysis of the stimulus–associate pairs, there were no noticeable differences when comparing the associations of the compounds and the associations of nouns in general. Concerning the second analysis, the window co-occurrence of stimulus–associate pairs, and the fourth, the semantic relations between stimuli and associates, we demonstrated:

Although we could confirm the co-occurrence hypothesis for our compound nouns, the corpus coverage revealed a caveat: compounds are typically less well covered by corpus data than simple nouns. Considerably less corpus information is available for compound nouns than for nouns in general, which underlines the importance of adequate morphological annotation of compounds.
The semantic relations between associates and compound stimuli differ from those between associates and nouns in general. Not surprisingly, we found fewer hyponyms among the associates, but more hypernyms and co-hyponyms.

Finally, the overlap between associations to compounds and to their constituents correlated moderately with semantic transparency ratings obtained from human raters (von der Heide & Borgwaldt, 2009). This result provides additional insight into how the meanings of complex words are related to the meanings of their parts. In summary, the associations could represent a valuable resource concerning the lexical semantic properties of the compound stimuli and the semantic relations between the stimuli and their associates. More specifically, they could be useful in future cognitive and computational linguistic research on semantic transparency.

Notes

1. The original classification into TT, TO, OT, and OO was performed by the two authors (Claudia von der Heide and Susanne Borgwaldt). Transparency ratings were collected in a follow-up step, using a 1–7 scale.
2. www.mturk.com
3. Despite our instructions, some participants failed to use capitalization, leading to ambiguity.
4. Note that the sentences in the SdeWaC corpus are sorted alphabetically, so the window co-occurrence refers to 5/20 words to the left and right, but only within the same sentence.
5. Schulte im Walde and Müller (2013) performed a detailed comparison of stimulus–associate corpus co-occurrences across corpora, varying the corpus sizes and corpus domains.
6. Note that for ambiguous stimuli and associates, only the translation that refers to the sense in the respective context is provided.
7. Von der Heide and Borgwaldt (2009) collected transparency ratings for their 450 compounds: 30 native German speakers were asked to rate the semantic transparency of the compounds with respect to each of their constituents, on a scale from 1 (opaque) to 7 (transparent). For more details on the ratings, see Schulte im Walde, Müller, and Roller (2013).

References

Baayen, R. H., & Schreuder, R. (1999). War and peace: Morphemes and full forms in a noninteractive activation parallel dual-route model. Brain and Language, 68, 27–32.
Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43, 209–226.
Bertram, R., & Hyönä, J. (2003). The length of a complex word modifies the role of morphological structure: Evidence from reading short and long Finnish compounds. Journal of Memory and Language, 48, 615–634.
Bertram, R., Kuperman, V., Baayen, R. H., & Hyönä, J. (2011). The hyphen as a segmentation cue in triconstituent compound processing: It’s getting better all the time. Scandinavian Journal of Psychology, 52, 530–544.
Bohnet, B. (2010). Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 89–97). Stroudsburg, PA: Association for Computational Linguistics.
Butterworth, B. (1983). Lexical representation. In B. Butterworth (Ed.), Language production (Vol. 2, pp. 257–294). New York, NY: Academic Press.
Caramazza, A., Laudanna, A., & Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, 297–332.
Church, K., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16, 22–29.
Clark, H. (1971). Word associations and linguistic theory. In J. Lyons (Ed.), New horizons in linguistics (pp. 271–286). Baltimore, MD: Penguin.
Comesaña, M., Fraga, I., Moreira, A. J., Frade, C. S., & Soares, A. P. (2014). Free associate norms for 139 European Portuguese words for children from different age groups. Behavior Research Methods, 46, 564–574. doi:10.3758/s13428-013-0388-0
De Deyne, S., & Storms, G. (2008). Word associations: Norms for 1,424 Dutch words in a continuous task. Behavior Research Methods, 40, 198–205. doi:10.3758/BRM.40.1.198
de Jong, N. H., Feldman, L. B., Schreuder, R., Pastizzo, M., & Baayen, R. H. (2002). The processing and representation of Dutch and English compounds: Peripheral morphological and central orthographic effects. Brain and Language, 81, 555–567. doi:10.1006/brln.2001.2547
Faaß, G., & Eckart, K. (2013). SdeWaC—A corpus of parsable sentences from the web. In Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (pp. 61–68). Darmstadt, Germany: Gesellschaft für Sprachtechnologie und Computerlinguistik.
Fellbaum, C. (1995). Co-occurrence and antonymy. Lexicography, 8, 281–303.
Fellbaum, C. (1998). WordNet—An electronic lexical database. Cambridge, MA: MIT Press.
Fellbaum, C., & Chaffin, R. (1990). Some principles of the organization of verbs in the mental lexicon. In Proceedings of the 12th Annual Conference of the Cognitive Science Society of America (pp. 420–427). Stroudsburg, PA: Association for Computational Linguistics.
Fernandez, A., Diez, E., Alonso, M. A., & Beato, M. S. (2004). Free-association norms for the Spanish names of the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments, & Computers, 36, 577–584. doi:10.3758/BF03195604
Ferrand, L., & Alario, F.-X. (1998). Normes d’associations verbales pour 366 noms d’objets concrets. L'Année Psychologique, 98, 659–709. doi:10.3406/psy.1998.28564
Fleischer, W., & Barz, I. (2012). Wortbildung der deutschen Gegenwartssprache (4th ed.). Berlin, Germany: de Gruyter.
French, C., & Richards, A. (1992). Word association norms for a set of threat/neutral homographs. Cognition and Emotion, 6, 65–87.
Gagné, C., & Spalding, T. (2009). Constituent integration during the processing of compound words: Does it involve the use of relational structures? Journal of Memory and Language, 60, 20–35.
Guida, A. (2007). The representation of verb meaning within lexical semantic memory: Evidence from word associations (Unpublished master’s thesis). Università degli studi di Pisa, Pisa, Italy.
Heringer, H. J. (1986). The verb and its semantic power: Association as the basis for valence. Journal of Semantics, 4, 79–99.
Hirsh, K., & Tree, J. (2001). Word association norms for two cohorts of British adults. Journal of Neurolinguistics, 14, 1–44.
Janssen, N., Bi, Y., & Caramazza, A. (2008). A tale of two frequencies: Determining the speed of lexical access for Mandarin Chinese and English compounds. Language and Cognitive Processes, 23, 1191–1223.
Jauch, R. (2012). Empirische Analysen von Assoziationen und distributionelle Modellierung der Kompositionalität von deutschen Nomen-Komposita (Unpublished M. Sc. Thesis). Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
John, C. H. (1988). Emotionality ratings and free-association norms of 240 emotional and non-emotional words. Cognition and Emotion, 2, 49–70.
Kiss, G., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its computer analysis. In A. Aitken, R. Bailey, & N. Hamilton-Smith (Eds.), The computer and literary studies (pp. 153–165). Edinburgh, UK: Edinburgh University Press.
Klos, V. (2011). Komposition und Kompositionalität. Berlin, Germany: Walter de Gruyter.
Kremer, G., & Baroni, M. (2011). A set of semantic norms for German and Italian. Behavior Research Methods, 43, 97–109. doi:10.3758/s13428-010-0028-x
Krott, A., Schreuder, R., Baayen, R. H., & Dressler, W. (2007). Analogical effects on linking elements in German compound words. Language and Cognitive Processes, 22, 25–57.
Kunze, C. (2000). Extension and use of GermaNet, a lexical-semantic database. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (pp. 999–1002). Stroudsburg, PA: Association for Computational Linguistics.
Lauteslager, M., Schaap, T., & Schievels, D. (1986). Schriftelijke Woordassociatienormen voor 549 Nederlandse Zelfstandige Naamworden. Lisse, The Netherlands: Swets & Zeitlinger.
Libben, G. (1994). How is morphological decomposition achieved? Language and Cognitive Processes, 9, 369–391.
Libben, G. (1998). Semantic transparency in the processing of compounds: Consequences for representation, processing, and impairment. Brain and Language, 61, 30–44.
Libben, G., Gibson, M., Yoon, Y. B., & Sandra, D. (2003). Compound fracture: The role of semantic transparency and morphological headedness. Brain and Language, 84, 50–64. doi:10.1016/S0093-934X(02)00520-5
Longtin, C.-M., Segui, J., & Hallé, P. (2003). Morphological priming without morphological relationship. Language and Cognitive Processes, 18, 313–334. doi:10.1080/01690960244000036
Lüttmann, H., Zwitserlood, P., Böhl, A., & Bölte, J. (2011). Evidence for morphological composition at the form level in speech production. Journal of Cognitive Psychology, 23, 818–836.
Macizo, P., Gómez-Ariza, C., & Bajo, M. (2000). Associative norms of 58 Spanish words for children from 8 to 13 years old. Psicológica, 21, 287–300.
Marslen-Wilson, W., Tyler, L. K., Waksler, R., & Older, L. (1994). Morphology and meaning in the English mental lexicon. Psychological Review, 101, 3–33. doi:10.1037/0033-295X.101.1.3
McNamara, T. (2005). Semantic priming: Perspectives from memory and word recognition. New York, NY: Psychology Press.
Melinger, A., Schulte im Walde, S., & Weber, A. (2006a). Characterizing response types and revealing noun ambiguity in German association norms. In Proceedings of the EACL Workshop “Making Sense of Sense: Bringing Computational Linguistics and Psycholinguistics Together” (pp. 41–48). Stroudsburg, PA: Association for Computational Linguistics.
Melinger, A., & Weber, A. (2006). Database of noun associations for German. Retrieved from www.coli.uni-saarland.de/projects/nag/
Miller, G. (1969). The organization of lexical memory: Are word associations sufficient? In G. Talland & N. Waugh (Eds.), The pathology of memory (pp. 223–237). New York, NY: Academic Press.
Nelson, D. L., McEvoy, C. L., & Schreiber, T. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36, 402–407. doi:10.3758/BF03195588
Palermo, D., & Jenkins, J. (1964). Word association norms: Grade school through college. Minneapolis, MN: University of Minnesota Press.
Rapp, R. (2002). The computation of word associations: Comparing syntagmatic and paradigmatic approaches. In Proceedings of the 19th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics.
Reddy, S., McCarthy, D., & Manandhar, S. (2011). An empirical study on compositionality in compound nouns. In Proceedings of the 5th International Joint Conference on Natural Language Processing (pp. 210–218). Chiang Mai, Thailand.
Roller, S., Schulte im Walde, S., & Scheible, S. (2013). The (un)expected effects of applying standard cleansing models to human ratings on compositionality. In Proceedings of the 9th Workshop on Multiword Expressions (pp. 32–41). Stroudsburg, PA: Association for Computational Linguistics.
Roth, M., & Schulte im Walde, S. (2008). Corpus co-occurrence, dictionary and Wikipedia entries as resources for semantic relatedness information. In Proceedings of the 6th International Conference on Language Resources and Evaluation (pp. 1852–1859). European Language Resources Association (ELRA).
Russell, W. (1970). The complete German language norms for responses to 100 words from the Kent-Rosanoff word association test. In L. Postman & G. Keppel (Eds.), Norms of word association (pp. 53–94). New York, NY: Academic Press.
Russell, W., & Meseck, O. (1959). Der Einfluss der Assoziation auf das Erinnern von Worten in der deutschen, französischen und englischen Sprache [The influence of association on the recall of words in German, French, and English]. Zeitschrift für Experimentelle und Angewandte Psychologie, 6, 191–211.
Sandra, D. (1990). On the representation and processing of compound words: Automatic access to constituent morphemes does not occur. Quarterly Journal of Experimental Psychology, 42A, 529–567. doi:10.1080/14640749008401236
Schulte im Walde, S. (2008). Human associations and the choice of features for semantic verb classification. Research on Language and Computation, 6, 79–111.
Schulte im Walde, S., Borgwaldt, S., & Jauch, R. (2012). Association norms of German noun compounds. In Proceedings of the 8th International Conference on Language Resources and Evaluation (pp. 632–639). European Language Resources Association (ELRA).
Schulte im Walde, S., & Melinger, A. (2008). An in-depth look into the co-occurrence distribution of semantic associates. Italian Journal of Linguistics, 20, 89–128.
Schulte im Walde, S., Melinger, A., Roth, M., & Weber, A. (2008). An empirical characterisation of response types in German association norms. Research on Language and Computation, 6, 205–238.
Schulte im Walde, S., & Müller, S. (2013). Using web corpora for the automatic acquisition of lexical-semantic knowledge. Journal for Language Technology and Computational Linguistics, 28, 85–105.
Schulte im Walde, S., Müller, S., & Roller, S. (2013). Exploring vector space models to predict the compositionality of German noun–noun compounds. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics (pp. 255–265). Stroudsburg, PA: Association for Computational Linguistics.
Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength. Journal of Psycholinguistic Research, 19, 317–330.
Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, 745–765.
Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14, 638–647. doi:10.1016/S0022-5371(75)80051-X
van Jaarsveld, H., & Rattink, G. (1988). Frequency effects in the processing of lexicalized and novel nominal compounds. Journal of Psycholinguistic Research, 17, 447–473.
von der Heide, C., & Borgwaldt, S. (2009). Assoziationen zu Unter-, Basis- und Oberbegriffen: Eine explorative Studie [Associations to subordinate, basic-level, and superordinate terms: An exploratory study]. In R. Vogel & S. Sahel (Eds.), Proceedings of the 9th Norddeutsches Linguistisches Kolloquium (pp. 51–74). Bielefeld, Germany: Universität Bielefeld.
Zwitserlood, P. (1994). The role of semantic transparency in the processing and representation of Dutch compounds. Language and Cognitive Processes, 9, 341–368. doi:10.1080/01690969408402123

Author note

This work was supported by Heisenberg Fellowship SCHU-2580/1 from the Deutsche Forschungsgemeinschaft (Sabine Schulte im Walde). We also thank Ronny Jauch, who performed parts of the analyses in the section "Overall results and analyses" for his diploma thesis (Jauch, 2012), and the anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Pfaffenwaldring 5B, 70569 Stuttgart, Germany
Sabine Schulte im Walde
Germanistisches Seminar, Universität Siegen, Siegen, Germany
Susanne R. Borgwaldt

Corresponding author

Correspondence to Sabine Schulte im Walde.

Electronic supplementary material

ESM 1

(ZIP 547 kb)

Appendices

Appendix A

In the following, we list our subset of bi-morphemic noun compounds from von der Heide and Borgwaldt (2009), together with their constituents and the category label "NN" (noun-noun compound) or "unique" (noun compound with a unique modifier).

Compound	Modifier	Head	Category
Ahornblatt	Ahorn	Blatt	NN
Armband	Arm	Band	NN
Armbrust	Arm	Brust	NN
Aschenbecher	Asche	Becher	NN
Bahnhof	Bahn	Hof	NN
Bananenschale	Banane	Schale	NN
Bärlauch	Bär	Lauch	NN
Basketball	Basket	Ball	NN
Baumhaus	Baum	Haus	NN
Bettwäsche	Bett	Wäsche	NN
Bierfass	Bier	Fass	NN
Bildschirm	Bild	Schirm	NN
Billiardtisch	Billiard	Tisch	NN
Bleistift	Blei	Stift	NN
Blockflöte	Block	Flöte	NN
Blumenkohl	Blume	Kohl	NN
Blumenstrauß	Blume	Strauß	NN
Blumenvase	Blume	Vase	NN
Brautkleid	Braut	Kleid	NN
Briefkasten	Brief	Kasten	NN
Briefmarke	Brief	Marke	NN
Briefpapier	Brief	Papier	NN
Briefträger	Brief	Träger	NN
Brillenetui	Brille	Etui	NN
Brombeere	Brom	Beere	unique
Bullauge	Bulle	Auge	NN
Büroklammer	Büro	Klammer	NN
Cowboyhut	Cowboy	Hut	NN
Cowboystiefel	Cowboy	Stiefel	NN
Dachboden	Dach	Boden	NN
Dachfenster	Dach	Fenster	NN
Dachstuhl	Dach	Stuhl	NN
Dominostein	Domino	Stein	NN
Eidechse	Eid	Echse	unique
Eisbär	Eis	Bär	NN
Eisberg	Eis	Berg	NN
Eisenbahn	Eisen	Bahn	NN
Eisstadion	Eis	Stadion	NN
Eiswürfel	Eis	Würfel	NN
Ellenbogen	Elle	Bogen	NN
Erdbeere	Erde	Beere	NN
Erdnuss	Erde	Nuss	NN
Espressomaschine	Espresso	Maschine	NN
Federball	Feder	Ball	NN
Federboa	Feder	Boa	NN
Feldsalat	Feld	Salat	NN
Feuerwerk	Feuer	Werk	NN
Feuerzeug	Feuer	Zeug	NN
Fieberthermometer	Fieber	Thermometer	NN
Filzstift	Filz	Stift	NN
Fingerhut	Finger	Hut	NN
Fleischwolf	Fleisch	Wolf	NN
Fliegenklatsche	Fliege	Klatsche	NN
Fliegenpilz	Fliege	Pilz	NN
Flohmarkt	Floh	Markt	NN
Fotoalbum	Foto	Album	NN
Fotoapparat	Foto	Apparat	NN
Fußball	Fuß	Ball	NN
Fußleiste	Fuß	Leiste	NN
Gesangsbuch	Gesang	Buch	NN
Gewächshaus	Gewächs	Haus	NN
Glockenspiel	Glocke	Spiel	NN
Golfball	Golf	Ball	NN
Gummibärchen	Gummi	Bärchen	NN
Gummiente	Gummi	Ente	NN
Gummistiefel	Gummi	Stiefel	NN
Gürteltier	Gürtel	Tier	NN
Haarreifen	Haar	Reifen	NN
Hahnenfuß	Hahn	Fuß	NN
Halskette	Hals	Kette	NN
Handcreme	Hand	Creme	NN
Handschuh	Hand	Schuh	NN
Handtasche	Hand	Tasche	NN
Handtuch	Hand	Tuch	NN
Haselnuss	Hasel	Nuss	NN
Hausboot	Haus	Boot	NN
Heidelbeere	Heide	Beere	NN
Heuhaufen	Heu	Haufen	NN
Himbeere	Him	Beere	unique
Hirschkäfer	Hirsch	Käfer	NN
Hollywoodschaukel	Hollywood	Schaukel	NN
Hornbrille	Horn	Brille	NN
Hufeisen	Huf	Eisen	NN
Hundehütte	Hund	Hütte	NN
Hüttenkäse	Hütte	Käse	NN
Jägerzaun	Jäger	Zaun	NN
Jeanshemd	Jeans	Hemd	NN
Jeansjacke	Jeans	Jacke	NN
Kaffeemaschine	Kaffee	Maschine	NN
Kaffeemühle	Kaffee	Mühle	NN
Kaffeepad	Kaffee	Pad	NN
Kettensäge	Kette	Säge	NN
Kleiderschrank	Kleider	Schrank	NN
Knoblauch	Knob	Lauch	unique
Kokosnuss	Kokos	Nuss	NN
Kopfkissen	Kopf	Kissen	NN
Kopfsalat	Kopf	Salat	NN
Kreditkarte	Kredit	Karte	NN
Kreissäge	Kreis	Säge	NN
Kronkorken	Krone	Korken	NN
Kronleuchter	Krone	Leuchter	NN
Kuckucksuhr	Kuckuck	Uhr	NN
Kuhfladen	Kuh	Fladen	NN
Kulturbeutel	Kultur	Beutel	NN
Kürbiskern	Kürbis	Kern	NN
Lachsschinken	Lachs	Schinken	NN
Latzhose	Latz	Hose	NN
Leberwurst	Leber	Wurst	NN
Lederhose	Leder	Hose	NN
Lichtschalter	Licht	Schalter	NN
Löwenzahn	Löwe	Zahn	NN
Luftmatratze	Luft	Matratze	NN
Luftpumpe	Luft	Pumpe	NN
Maisfeld	Mais	Feld	NN
Maiskolben	Mais	Kolben	NN
Marienkäfer	Maria	Käfer	NN
Maßstab	Maß	Stab	NN
Maulwurf	Maul	Wurf	NN
Mausefalle	Maus	Falle	NN
Meerschweinchen	Meer	Schweinchen	NN
Mettwurst	Mett	Wurst	NN
Mikadostäbchen	Mikado	Stäbchen	NN
Milchshake	Milch	Shake	NN
Mohrrübe	Mohr	Rübe	NN
Motorhaube	Motor	Haube	NN
Motorrad	Motor	Rad	NN
Mülleimer	Müll	Eimer	NN
Mülltonne	Müll	Tonne	NN
Mundharmonika	Mund	Harmonika	NN
Nachttisch	Nacht	Tisch	NN
Nadelkissen	Nadel	Kissen	NN
Nagelfeile	Nagel	Feile	NN
Nagellack	Nagel	Lack	NN
Nasenbär	Nase	Bär	NN
Nashorn	Nase	Horn	NN
Nilpferd	Nil	Pferd	NN
Nudelholz	Nudel	Holz	NN
Nummernschild	Nummer	Schild	NN
Obstkuchen	Obst	Kuchen	NN
Ohrring	Ohr	Ring	NN
Papierkorb	Papier	Korb	NN
Pfannkuchen	Pfanne	Kuchen	NN
Pfauenfeder	Pfau	Feder	NN
Pfeffermühle	Pfeffer	Mühle	NN
Postbote	Post	Bote	NN
Postkarte	Post	Karte	NN
Pudelmütze	Pudel	Mütze	NN
Radkappe	Rad	Kappe	NN
Reetdach	Reet	Dach	NN
Regenbogen	Regen	Bogen	NN
Regenmantel	Regen	Mantel	NN
Regenrinne	Regen	Rinne	NN
Regenschirm	Regen	Schirm	NN
Ringfinger	Ring	Finger	NN
Rittersporn	Ritter	Sporn	NN
Rucksack	Ruck	Sack	NN
Sandburg	Sand	Burg	NN
Sanduhr	Sand	Uhr	NN
Schachbrett	Schach	Brett	NN
Schildkröte	Schild	Kröte	NN
Schlauchboot	Schlauch	Boot	NN
Schlittenhund	Schlitten	Hund	NN
Schlüsselbund	Schlüssel	Bund	NN
Schneckenhaus	Schnecke	Haus	NN
Schneeball	Schnee	Ball	NN
Schneemann	Schnee	Mann	NN
Schnittlauch	Schnitt	Lauch	NN
Schornstein	Schorn	Stein	unique
Schulbuch	Schule	Buch	NN
Schwertfisch	Schwert	Fisch	NN
Seehund	See	Hund	NN
Seemann	See	Mann	NN
Seerose	See	Rose	NN
Seestern	See	Stern	NN
Seezunge	See	Zunge	NN
Seifenblase	Seife	Blase	NN
Seilbahn	Seil	Bahn	NN
Sessellift	Sessel	Lift	NN
Skistock	Ski	Stock	NN
Sonnenblume	Sonne	Blume	NN
Sonnenbrille	Sonne	Brille	NN
Sonnencreme	Sonne	Creme	NN
Sonnenschirm	Sonne	Schirm	NN
Sonnenuhr	Sonne	Uhr	NN
Spiegelei	Spiegel	Ei	NN
Spinnennetz	Spinne	Netz	NN
Stachelbeere	Stachel	Beere	NN
Stacheldraht	Stachel	Draht	NN
Stereoanlage	Stereo	Anlage	NN
Sternschnuppe	Stern	Schnuppe	NN
Strandkorb	Strand	Korb	NN
Straßenbahn	Straße	Bahn	NN
Strohhalm	Stroh	Halm	NN
Strumpfhose	Strumpf	Hose	NN
Suppenteller	Suppe	Teller	NN
Tannenzapfen	Tanne	Zapfen	NN
Taschenbuch	Tasche	Buch	NN
Taschenlampe	Tasche	Lampe	NN
Taschenmesser	Tasche	Messer	NN
Taschentuch	Tasche	Tuch	NN
Teddybär	Teddy	Bär	NN
Teebeutel	Tee	Beutel	NN
Teekanne	Tee	Kanne	NN
Teelicht	Tee	Licht	NN
Teelöffel	Tee	Löffel	NN
Teetasse	Tee	Tasse	NN
Telefonbuch	Telefon	Buch	NN
Telefonhörer	Telefon	Hörer	NN
Telefonzelle	Telefon	Zelle	NN
Tennisball	Tennis	Ball	NN
Tennisschläger	Tennis	Schläger	NN
Thermoskanne	Thermo	Kanne	unique
Tintenfisch	Tinte	Fisch	NN
Toilettenpapier	Toilette	Papier	NN
Truthahn	Trut	Hahn	unique
Türklinke	Tür	Klinke	NN
Vanilleeis	Vanille	Eis	NN
Visitenkarte	Visite	Karte	NN
Vogelhaus	Vogel	Haus	NN
Vogelkäfig	Vogel	Käfig	NN
Walnuss	Wal	Nuss	unique
Wäscheklammer	Wäsche	Klammer	NN
Wasserfall	Wasser	Fall	NN
Wasserhahn	Wasser	Hahn	NN
Wassermelone	Wasser	Melone	NN
Wasserwaage	Wasser	Waage	NN
Weihnachtsbaum	Weihnachten	Baum	NN
Werwolf	Wer	Wolf	unique
Windlicht	Wind	Licht	NN
Windmühle	Wind	Mühle	NN
Wintermantel	Winter	Mantel	NN
Wirbelsäule	Wirbel	Säule	NN
Wollschal	Wolle	Schal	NN
Würfelzucker	Würfel	Zucker	NN
Zahnbürste	Zahn	Bürste	NN
Zahnkrone	Zahn	Krone	NN
Zahnpasta	Zahn	Pasta	NN
Zahnrad	Zahn	Rad	NN
Zahnseide	Zahn	Seide	NN
Zahnspange	Zahn	Spange	NN
Zeitschrift	Zeit	Schrift	NN
Ziegelstein	Ziegel	Stein	NN
Zitronenpresse	Zitrone	Presse	NN
Zollstock	Zoll	Stock	NN
Zuckerhut	Zucker	Hut	NN
Zuckerwatte	Zucker	Watte	NN
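
Each row of the list above has four tab-separated fields: the compound, its modifier, its head, and the category label. As an illustrative sketch only, assuming exactly this four-column layout and utf-8 text, the following Python code parses such rows, tallies the labels, and uses a simple string heuristic to show how the modifier surfaces inside the compound (for example, the extra n in Aschenbecher from Asche + Becher). The names `CompoundEntry`, `parse_rows`, and `linking_element` are hypothetical and not part of the published norms, and the heuristic is not the authors' morphological analysis.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class CompoundEntry(NamedTuple):
    """One row of the list: compound, modifier, head, and category label."""
    compound: str
    modifier: str
    head: str
    label: str  # "NN" or "unique"


def parse_rows(lines: List[str]) -> List[CompoundEntry]:
    """Parse tab-separated rows with exactly four fields, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated fields: {line!r}")
        entries.append(CompoundEntry(*fields))
    return entries


def linking_element(compound: str, modifier: str, head: str) -> Optional[str]:
    """Heuristic (assumption): return the material between modifier and head.

    ""   -> modifier appears unchanged (Bahn + Hof -> Bahnhof)
    "n"  -> extra material after the modifier (Asche + Becher -> Aschenbecher)
    None -> modifier is not a plain prefix (Erde + Beere -> Erdbeere)
    """
    if not compound.lower().endswith(head.lower()):
        raise ValueError(f"head {head!r} is not a suffix of {compound!r}")
    # Everything before the head is the (possibly modified) modifier.
    prefix = compound[: len(compound) - len(head)]
    if prefix.lower().startswith(modifier.lower()):
        return prefix[len(modifier):]
    return None


if __name__ == "__main__":
    # A few rows copied from the list above (tabs written as \t).
    sample = [
        "Aschenbecher\tAsche\tBecher\tNN",
        "Bahnhof\tBahn\tHof\tNN",
        "Erdbeere\tErde\tBeere\tNN",
        "Himbeere\tHim\tBeere\tunique",
        "Weihnachtsbaum\tWeihnachten\tBaum\tNN",
    ]
    entries = parse_rows(sample)
    print(Counter(e.label for e in entries))  # Counter({'NN': 4, 'unique': 1})
    for e in entries:
        print(e.compound, repr(linking_element(e.compound, e.modifier, e.head)))
```

The heuristic returns None whenever the modifier is not a plain prefix of the compound, which lumps together e-deletion (Erde in Erdbeere) and less regular cases such as Weihnachten in Weihnachtsbaum; a proper analysis would need to distinguish these.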

Appendix B

In the following, we provide the experiment setups for the association collection via Amazon Mechanical Turk (AMT) in the current study and for the association collection through a standard web experiment in the previous study (Schulte im Walde et al., 2012). Figure 1 shows the AMT setup of the current collection, with English translations in red font. The associations were collected as shown in Fig. 2, with an English translation in Fig. 3. The web collection of the associations in the previous study was performed as shown in Fig. 4, with an English translation in Fig. 5.

About this article

Cite this article

Schulte im Walde, S., Borgwaldt, S.R. Association norms for German noun compounds and their constituents. Behav Res 47, 1199–1221 (2015). https://doi.org/10.3758/s13428-014-0539-y

Published: 16 January 2015
Issue Date: December 2015
DOI: https://doi.org/10.3758/s13428-014-0539-y
