Introduction

Conceptual representation has been the focus of much study and debate for decades in the fields of semantics and psycholinguistics. In contrast with the holistic and non-decompositional view held by Collins and Loftus (1975), at the heart of the debate now are two seemingly opposite accounts of semantic representation: the distributional and the embodied accounts of conceptual representation (Harris, 1954; Firth, 1957; see Lenci, 2008; Andrews, Vigliocco, & Vinson, 2009; Andrews, Frank, & Vigliocco, 2014; Bruni, Tran, & Baroni, 2014; Lenci, 2018 for reviews of the distributional account; see Glenberg, 1997; Barsalou, 1999; Zwaan, 2004; Meteyard, Cuadrado, Bahrami, & Vigliocco, 2012; Pulvermüller, 2013; Ostarek & Huettig, 2019 for reviews of the embodied account). Both these accounts consider that feature and property overlap play a major role in the processing of meaning (see Vigliocco & Vinson, 2007; Vigliocco, Meteyard, Andrews, & Kousta, 2009 for reviews). Indeed, there is much evidence of this from semantic priming studies, widely regarded as the gold standard for studying semantic representation in the mind and brain (e.g., Hutchison et al., 2013; Kim, Yap, & Goh, 2019). However, for both sides of the spectrum, the representation of abstract concepts remains a challenge, hence the need for a database of source material enabling us to further our understanding of abstract concept representation.

Accounts of semantic representation

Holistic view and spreading of activation

According to the holistic view of semantic representation, for every element of the world—be it an object, an event, property, etc.—there is an abstract and symbolic lexical equivalent that acts as a referent in the conceptual system of the mind (Fodor, Garrett, Walker, & Parkes, 1980; Berg & Levelt, 1990; Roelofs, 1997; Levelt, Roelofs, & Meyer, 1999). In this view of one-to-one mapping, each referent represents a single node in a semantic network, with nodes linked according to their semantic similarity. For instance, the concept fire would be represented by a single node linked to related concepts or properties, such as red, also represented by a single node. Collins and Loftus (1975) described the mechanisms of semantic processing based on their theory of the spreading of activation in a network, according to which a concept, when it is processed, activates the path between related nodes at a speed proportional to the strength of the link between them. The assumption of semantic similarity in the spreading of activation theory accounts for both the strength of the link between nodes and the ensuing dynamics of activation for related concepts. Given that in this holistic view, each property or feature of a concept is represented by a single node, it is a view which contrasts with the decompositional or featural view.

Featural view

According to the featural view of semantic representation, words can be decomposed into a set of defining features or properties reflecting the meaning of the concept to which they relate (Smith, Shoben, & Rips, 1974). For instance, the concept fire would be decomposed according to its defining features such as <is hot> and <is red>. As with the holistic view, at the core of the featural view is semantic similarity, but in this case it is measured by the number of features two concepts have in common (Plaut, 1995; McRae, de Sa, & Seidenberg, 1997; Cree, McRae, & McNorgan, 1999; Vigliocco, Vinson, Lewis, & Garrett, 2004; Kiefer & Pulvermüller, 2012). The more features they share, the more semantically similar they are. In recent years, two seemingly opposite accounts of this featural view have dominated the debate on the nature of semantic representation, namely the distributional and the embodiment accounts. They differ from each other in respect of the information used to represent meaning. While distributional semantics relies on symbolic and linguistic features, embodiment relies on perceptual and sensory-motor states.

According to models of distributional semantics, meaning is the result of the statistical distribution of words across written and spoken language (see Andrews, Frank, & Vigliocco, 2014; Lenci, 2018 for reviews of this account; see also Lund & Burgess, 1996; Landauer & Dumais, 1997; Griffiths, Steyvers, & Tenenbaum, 2007; Mandera, Keuleers, & Brysbaert, 2017). The meaning of words is therefore defined in relation to other words, depending on their shared symbolic and linguistic features. According to the distributional hypothesis, words occurring in similar contexts have similar meanings (Harris, 1954). This use of intralinguistic relationships was successfully implemented in computational models of semantics (e.g., Hoffman, McClelland, & Lambon Ralph, 2018). The motivation for using algorithms such as latent semantic analysis (LSA; Landauer & Dumais, 1997) is the notion that meaning can be extracted by computing semantic similarities between concepts (Louwerse, 2008, 2011; Louwerse & Jeuniaux, 2008, 2010; Rogers & McClelland, 2004; Kintsch, McNamara, Dennis, & Landauer, 2007). In addition, the close match between the performance of computational models and human behaviour suggests these models are able, to some extent, to mimic the extraction of semantic representation from language (see Andrews et al., 2009; Binder, Conant, Humphries, Fernandino, Simons, Aguilar, & Desai, 2016).
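The distributional hypothesis can be illustrated with a minimal sketch: words are represented by counts of their neighbouring words, and similarity is the cosine between those count vectors. The corpus, the window size, and the function names below are assumptions made for illustration only; this is raw co-occurrence counting, not LSA or any of the cited models.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical sentences, for illustration only).
CORPUS = [
    "the fire is hot and red",
    "the flame is hot and bright",
    "the judge values justice and law",
    "the court values justice and law",
]


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(CORPUS)
# Words used in similar contexts end up with similar vectors.
sim_fire_flame = cosine(vectors["fire"], vectors["flame"])
sim_fire_judge = cosine(vectors["fire"], vectors["judge"])
print(sim_fire_flame, sim_fire_judge)
```

In this toy corpus, fire and flame occur in near-identical contexts and therefore receive a higher similarity than fire and judge. Real models are trained on very large corpora and typically apply weighting and dimensionality reduction on top of such counts.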

This view of distributional semantics using amodal linguistic symbols as a proxy for representing meaning has been under fire, particularly from researchers subscribing to the theory of embodiment, for its lack of grounding in perceptual and motor states.

The embodied account of semantic representation defines meaning as grounded in perceptual and motor states derived from an individual’s sensory experience (Barsalou, 1999; Glenberg, 1997; Zwaan, 2004; Kiefer & Pulvermüller, 2012; Meteyard, Cuadrado, Bahrami, & Vigliocco, 2012). For instance, Pulvermüller, Shtyrov, and Ilmoniemi (2005) used brain-imaging techniques to show that brain areas responsible for motor actions of the face and leg are activated when action words such as kick or lick are processed. Evidence like this struggles, however, to explain the grounding mechanisms for abstract concepts, where there are no physical and sensory features (see Borghi & Pecher, 2011; Borghi, Binkofski, Castelfranchi, Cimatti, Scorolli, & Tummolini, 2017, for reviews).

The dichotomy between abstract and concrete concepts is not clear-cut (Della Rosa et al., 2010). The most commonly invoked criterion is tangibility, with concrete concepts referring to tangible entities that are perceptible via the senses, whereas abstract concepts are intangible. According to the dual-coding theory (Paivio, Yuille, & Madigan, 1968), concrete concepts trigger processing based on two informational systems, one visual, the other verbal, whereas abstract concepts are processed only in the verbal system. The context availability theory (Schwanenflugel, Harnishfeger, & Stowe, 1988) posits that while concrete concepts refer to a definite number of contexts, abstract concepts are connected to varied contexts. Although true, this distinction can be considered reductive and contributes to the view that abstract concepts are poor in terms of features. More recently, with the interest shown in abstract concepts by grounded cognition, new elements of definition have emerged, according to which abstract concepts refer to intangible features such as emotions, events, social contexts, and introspective states (e.g., Barsalou & Wiemer-Hastings, 2005; Harpaintner et al., 2018; see Borghi et al., 2017 for a review). This latter definition reflects a new interest in their grounding mechanisms and semantic representation.

The tangibility criterion is best represented by the concreteness variable defining the distinction between concrete and abstract concepts based on the dual coding and context availability theories. It plays a key role in psycholinguistic research, as well as providing an explanation for many phenomena, such as hemispheric lateralisation in the processing of concrete and abstract concepts (Oliveira, Perea, Ladera, & Gamito, 2013), or ease of retrieval of concrete words compared to abstract ones (Mate, Allen, & Baques, 2012; Nishiyama, 2013).

The importance of the concreteness variable is further borne out by the development of several widely used databases of concreteness rating norms, starting with Coltheart (1981) and, more recently, covering 40,000 words in English (Brysbaert, Warriner, & Kuperman, 2014) and 1659 words in French (Bonin et al., 2018).

Abstract concept representation

The embodied account has yet to propose a unified theory for the representation of abstract concepts such as justice or freedom which do not refer to direct perceptual features or sensory-motor states (Dove, 2009, 2011, 2014; Machery, 2016; see Pecher, 2018 for a review). However, several hypotheses, ranging from strongly to weakly embodied, have been put forward as explanations for the grounding mechanisms of abstract concepts. The strong embodiment assumptions make no allowance for multiple representations and consider abstract concepts to be as grounded and reliant on sensory-motor systems as concrete concepts are (e.g., Glenberg & Kaschak, 2002; see Borghi et al., 2017 for a review). For instance, according to the conceptual metaphor theory, abstract concepts are grounded through image schemas corresponding to mental representations (e.g., Lakoff & Johnson, 1980; Gallese & Lakoff, 2005). Several studies have shown that abstract concepts of valence and power are grounded in two-dimensional spatial schema with the higher point of a vertical vector representing positions of power while the left-hand side of a horizontal vector represents negative concepts (see Pecher, 2018 for a review). However, the need for one-to-one mapping between abstract concepts and concrete metaphors means there are limits to the availability of such metaphors for every type of abstract concept.

At the other end of the spectrum, according to weak embodiment assumptions, abstract concepts are grounded via multiple representations of meaning with the involvement of both sensorimotor and linguistic processing. These grounding mechanisms place a greater emphasis on the context in which abstract concepts are used (e.g., Barsalou, 1999, 2003; Wiemer-Hastings & Xu, 2005). Several studies have shown that abstract concepts activate social and introspective aspects of situations (Barsalou & Wiemer-Hastings, 2005), emotional features (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011; Lenci, Lebani, & Passaro, 2018), information about events, and thematic roles (Ferretti, McRae, & Hatherell, 2001), and, more generally, linguistic information acting as a shortcut to conceptual simulation (Barsalou, Santos, Simmons, & Wilson, 2008). Such assumptions have the advantage of being sufficiently general to apply to a variety of abstract concepts.

Whether seen from the distributional or embodied end of the spectrum, all accounts agree on the importance of relationships between concepts for the organisation of semantic knowledge. Two kinds of relationships have been widely investigated: semantic similarity (theft-burglar) and verbal association (theft-prison), and much effort has gone into creating databases of material to use in semantic priming studies, regarded as the gold standard for studying how semantic knowledge is organised (see Hutchison, Balota, Cortese, & Watson, 2008; Hutchison et al., 2013; Pulvermüller, 2013; Mandera, Keuleers, & Brysbaert, 2017).

Semantic priming and semantic similarity for concrete and abstract concepts

Semantic priming

In a semantic priming study, participants are presented with a prime word followed by a target word (Meyer & Schvaneveldt, 1971). The relationship between the two words is one of either semantic similarity, where the two words belong to the same superordinate category (e.g., prime: eagle; target: owl), or verbal association, where the two words are frequently found together across spoken and written language (e.g., prime: fireman; target: truck; McNamara, 1992; Plaut, 1995). In a lexical decision task, participants make a decision on the target by indicating whether or not it is a word. The semantic priming effect refers to the robust result, which has been replicated hundreds of times, showing that participants respond faster for related primes and targets compared to unrelated ones (Hutchison et al., 2008, 2013). This phenomenon has been widely studied as it provides considerable insight into the organisation and mechanisms of semantic knowledge. In fact, each theoretical view discussed above can account for this priming effect. According to the holistic view (Fodor et al., 1980; Berg & Levelt, 1990; Roelofs, 1997), the priming effect is the result of spreading activation from the prime to the target along strongly linked nodes, whereas according to the distributional, embodied, and hybrid accounts, it results from the activation of features shared between the prime and the target (Mahon & Caramazza, 2008; Dove, 2009; Andrews, Frank, & Vigliocco, 2014; Carota, Kriegeskorte, Nili, & Pulvermüller, 2017). Where these last accounts differ, however, is in the nature of the features. The distributional account suggests the priming effect results from the activation of linguistic features, the embodied account that it results from the activation of shared sensorimotor states, and the hybrid account that both linguistic and perceptual features are responsible for this phenomenon.

Despite the robustness of the semantic priming effect with concrete concepts, results with abstract concepts have been inconsistent. Crutch (2005; Crutch, Connell, & Warrington, 2009; Crutch & Warrington, 2010) showed that while concrete concepts are organised according to semantic similarity, abstract concepts are organised according to verbal association. Several studies tried to replicate these results but revealed discrepancies (e.g., Hamilton & Coslett, 2008; Duñabeitia, Avilés, Afonso, Scheepers, & Carreiras, 2009; Geng & Schnur, 2015). These studies attempted to replicate the finding that concrete and abstract concepts depend differently on semantic similarity and associative strength, but failed to find any such difference in the organisation of concrete and abstract concepts. A more recent study found that both semantic similarity and verbal association elicited a priming effect for concrete concepts, whereas for abstract concepts it was found only with verbal association (Ferré, Guasch, García-Chico, & Sánchez-Casas, 2015). Crutch and Jackson (2011) suggested the relationship between concreteness and association type could explain these disparities. They presented evidence based on data from healthy participants and neuropsychological patients showing that when presented with triplets of low, middle, and high levels of concreteness, the effect of semantic similarity increased with concreteness, while the effect of verbal association decreased with concreteness. Furthermore, they suggested that concreteness be used as a graded variable rather than a binary one, especially when studying its effect on the organisation of semantic memory. Accordingly, this calls for a shift in the way abstract concepts are studied, to place more emphasis on the type and associated level of concreteness for selected abstract concepts.

Two different procedures are used to generate material for semantic similarity and priming studies: feature generation tasks and semantic similarity ratings.

Semantic similarity: feature generation and semantic pairs

In a feature generation task, participants are given a list of words for which they are required to provide a list of features defining each word. The procedure provides measures of semantic similarity by comparing the feature overlap between two words. The more features two words have in common, the more similar they are (McRae, Cree, Seidenberg, & McNorgan, 2005; McRae, de Sa, & Seidenberg, 1997; Sánchez-Casas, Ferré, García-Albea, & Guasch, 2006; Vigliocco, Vinson, Lewis, & Garrett, 2004; Vinson & Vigliocco, 2008). However, it is a procedure which is highly time-consuming and which has limitations (see McRae et al., 2005 for a discussion of these limitations). For instance, in feature naming, participants may provide only a linguistic approximation of conceptual content. It is fair to assume, therefore, that some parts of the concepts would be lost in verbalisation. This criticism appears to be particularly relevant in the case of abstract concepts which may themselves be decomposed into abstract features. Indeed, many authors have suggested that, compared to concrete concepts, abstract concepts appear to be semantically impoverished, with their representation requiring associations with other concepts or grounding simulations in introspective and social states (Barsalou et al., 2008; Borghi, Scorolli, Caligiore, Baldassare, & Tummolini, 2013; Borghi, Barca, Binkofski, Castelfranchi, Pezzulo, & Tummolini, 2019, see also Recchia & Jones, 2012).
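As a rough illustration of how feature overlap can be turned into a similarity score, the sketch below counts shared features and also reports the Jaccard index (shared features divided by all distinct features). The feature lists are invented for the example and do not come from any published norms, and the choice of the Jaccard index is an assumption; published norms often use other measures, such as the cosine between feature production frequency vectors.

```python
def feature_overlap(features_a, features_b):
    """Return (number of shared features, Jaccard index) for two feature lists."""
    a, b = set(features_a), set(features_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Hypothetical feature lists, not taken from any published norms.
NORMS = {
    "eagle": ["has wings", "flies", "has a beak", "hunts"],
    "owl": ["has wings", "flies", "has a beak", "is nocturnal"],
    "truck": ["has wheels", "carries loads", "has an engine"],
}

shared_eo, jaccard_eo = feature_overlap(NORMS["eagle"], NORMS["owl"])
shared_et, jaccard_et = feature_overlap(NORMS["eagle"], NORMS["truck"])
print(shared_eo, jaccard_eo)  # eagle vs owl
print(shared_et, jaccard_et)  # eagle vs truck
```

Normalising by the union keeps the score comparable across concepts for which participants list different numbers of features, which matters if abstract concepts tend to elicit fewer features.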

On the other hand, Wiemer-Hastings and Xu (2005) suggested that this apparent paucity of features for abstract concepts is due mainly to the instructions given to participants during a feature generation task. In the original method, Wiemer-Hastings and Xu (2005) asked participants only to generate features defining the concept, whereas later they instructed them to provide context features. The results showed that the difference between abstract and concrete concepts in terms of semantic richness disappeared when participants were encouraged to provide context features. Using the same method of property listing as Wiemer-Hastings and Xu (2005), Harpaintner, Trumpp, and Kiefer (2018) gathered properties for close to 300 abstract concepts. In doing so, they further demonstrated the richness and heterogeneity of abstract concepts, showing that they can elicit affective, introspective, social, and sensory-motor properties. This heterogeneity of abstract concepts was further investigated by Villani, Lugli, Liuzza, and Borghi (2019), who evaluated more than 400 abstract concepts on 15 dimensions. Their results provided further support for a multiple representation view of abstract concepts.

In addition, Bolognesi, Pilgram, and van den Heerik (2017) adapted Wu and Barsalou’s taxonomy (2009) to include 20 feature categories belonging to four main dimensions (concept properties, situation properties, introspections, and taxonomic properties) that must be distinguished to convey the full semantic richness of concepts. Recchia and Jones (2012) were not, however, able to determine whether such distinctions in respect of semantic features could benefit abstract concept representation. They invoked the shallowness of lexical decision tasks in semantic processing. Consequently, future studies will need to reach conclusions on the role of feature categories for abstract concept representation.

Another, less costly, way of creating material for semantic representation studies is to generate semantically similar word pairs. This option relies on a similarity-rating task where participants are presented with pairs of words formed by the researcher with a view to obtaining concepts either belonging to the same category or being similar in meaning (e.g., truck-car; Ferrand & New, 2003; Perea & Rosa, 2002). Participants must rate the semantic similarity of the pairs on a scale (Ferrand & New, 2003; Sánchez-Casas et al., 2006). Studies have shown that the pairs rated as being highly similar produced a strong priming effect (e.g., McRae & Boisvert, 1998; Plaut & Booth, 2000; Hutchison, 2003; Andrews, Lo, & Xia, 2017). In addition, studies have shown a strong correlation between the measures from similarity-rating tasks and feature generation, ensuring the legitimacy of this latter technique (e.g., McRae, de Sa, & Seidenberg, 1997). More recently, Maki, Krimsky, and Muñoz (2006) used a semantic rating task to show that ratings were a good predictor of feature overlap for existing semantic feature norms.

Normative databases for semantic similarity

Given the importance of carefully crafted material for studying semantic representation, much effort has been directed towards building normative databases to provide the research community with the material it needs. The most commonly found data sets gather English feature norms. McRae and collaborators (2005), for instance, provide feature norms for 541 living and non-living concepts. Subsequently, Buchanan, Holmes, Teasley, and Hutchison (2013) built a searchable web portal based on the work of McRae and collaborators (2005), facilitating the search for experimental stimuli in their data set. Buchanan, Valentine, and Maxwell (2019) expanded previous databases and provided features for more than 4000 words. Vinson and Vigliocco (2008) provided an interesting data set based on concrete object nouns and verb events that allows semantic representation to be studied beyond the usual focus on concrete concepts. Devereux, Tyler, Geertzen, and Randall (2014) built on McRae and colleagues’ work by adding features produced by at least two participants compared to McRae and collaborators’ (2005) five-feature threshold for inclusion. In other languages, De Deyne and Storms (2008) and De Deyne et al. (2008) collected normative features among Dutch participants. Lebani, Bondielli, and Lenci (2015) collected thematic role features to study the semantic content of Italian verbs. Also in Italian, Lenci, Baroni, Cazzolli, and Marotta (2013) collected semantic features from congenitally blind and sighted participants, making it possible to study the role of perceptual information in concept processing. Kremer and Baroni (2011) collected properties and semantic relation types for German and Italian. More recently, Vivas, Vivas, Comesaña, Coni, and Vorano (2017) published the first Spanish semantic feature production norms for living and non-living concepts.

Researchers have used similarity-rating tasks to a lesser extent to produce such norms. Buchanan and collaborators (2013) compiled an English data set comprising 1808 words paired according to semantic similarity. In Spanish, Moldovan, Ferré, Demestre, and Sánchez-Casas (2015) collected normative ratings for 185 Spanish noun triplets with variation of semantic distance within each triplet. However, much of the effort in developing databases has been focused on concrete concepts. To the best of our knowledge, the present work offers the first database of semantically similar abstract word pairs in French.

The present study: Semantic similarity norms for abstract words

The present work introduces a data set comprising semantic similarity ratings for abstract word pairs obtained from French participants. We have added a measure of the concreteness of each word from each pair to allow for the selection of abstract concepts in line with Crutch and Jackson’s (2011) suggestion that there is a relationship between graded levels of concreteness and semantic organisation. To provide a data set of experimental stimuli according to the significant lexical variables and lexical latencies previously discussed, we have combined our list of words with existing databases such as the French Lexicon Project (FLP, Ferrand et al., 2010), Lexique (New et al., 2001, 2004, 2007), MEGALEX (Ferrand et al., 2018), and Wordlex (Gimenes & New, 2016).

Method

Participants

Both the similarity- and concreteness-rating tasks were presented as online questionnaires. Participants for the two studies were all French native speakers and between 18 and 45 years old. We collected data from 373 participants (334 women; mean age = 26.43; SD = 8.34) for the similarity-rating task, and 529 (486 women; mean age = 29.7; SD = 9.03) for the concreteness-rating task. Participants volunteered in response to an announcement posted on Facebook group walls, and no compensation was paid. Participants took part in only one of the tasks in an attempt to ensure their ratings were not influenced by previous exposure to the items which are common to both tasks. Both studies obtained the approval of the Université Clermont Auvergne Research Ethics Committee.

Stimuli

To have some guarantee of the level of abstractness of our material before collecting our own ratings, we selected 1020 words having a low level of concreteness (range between 100 and 600) from Coltheart’s (1981) concreteness norms. We then translated the selected words into French following a back-translation procedure (Sperber, Devellis, & Boehlecke, 1994), following which 174 words were excluded. We also added the material from Ferrand (2001) comprising 260 French abstract words.

Based on our linguistic intuition, we then formed semantically similar pairs (e.g., joie-bonheur; [joy-happiness]). To the best of our ability (see below), we ensured that the semantic pairs were non-associates (according to McRae & Boisvert, 1998), and were not linked by either a super/supra-ordinate, part/whole, or antonym relationship. The material was then divided into six lists of pairs, and 30% of fillers (unrelated pairs, e.g., défaut-frisson; [flaw-chill]) were added per list. So that the participants would be sensitive to the abstractness of the pairs, we also added concrete words from Ferrand and Alario (1998) and formed semantic pairs. Accordingly, we were able to form 628 semantically related pairs (460 noun pairs, 99 adjective pairs, and 69 verb pairs). Both prime and target words had the same grammatical status within each semantically similar pair. To ensure the pairs were semantically similar and not associated, we translated the target words back into English and checked for forward strength in the Small World of Words database (SWOW, De Deyne, Navarro, Perfors, Brysbaert, & Storms, 2019). We identified all pairs for which the prime and target presented a forward associative strength higher than 10%. Seventy pairs were identified as both associated and semantically similar (e.g., anxiety-fear). We kept them in the main database with the option of filtering them out. In addition, we created a secondary database containing only the semantically similar and associated word pairs. As suggested by De Deyne et al. (2019), association data are not to be discarded and provide a strong indication of meaning similarity.
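The 10% forward-strength criterion amounts to a simple flag-and-filter step, sketched below. The record layout, the field names, and the strength values are hypothetical and chosen only for illustration; they are not drawn from SWOW or from the present database. Treating pairs with no association data as non-associated is likewise an assumption of this sketch.

```python
THRESHOLD = 0.10  # forward associative strength cut-off described in the text

# Hypothetical records; the strengths are invented for illustration.
pairs = [
    {"prime": "anxiety", "target": "fear", "forward_strength": 0.18},
    {"prime": "joy", "target": "happiness", "forward_strength": 0.04},
    {"prime": "flaw", "target": "defect", "forward_strength": None},  # no data
]


def flag_associated(records, threshold=THRESHOLD):
    """Return copies of the records with an added boolean `associated` field."""
    flagged = []
    for rec in records:
        strength = rec["forward_strength"]
        # Assumption: a pair without association data is treated as non-associated.
        associated = strength is not None and strength > threshold
        flagged.append({**rec, "associated": associated})
    return flagged


# Main database: every pair is kept, with a flag that allows filtering.
main_db = flag_associated(pairs)
# Secondary database: only pairs that are both similar and associated.
secondary_db = [rec for rec in main_db if rec["associated"]]
# Purely similar pairs, for users who filter the associated ones out.
similar_only = [rec for rec in main_db if not rec["associated"]]
print(len(main_db), len(secondary_db), len(similar_only))
```

Keeping a flag rather than deleting rows mirrors the choice described above: associated pairs stay available for studies of association while remaining easy to exclude.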

For the concreteness-rating task, the pairs were separated, and the lists of individual words were presented in another experiment. Given the added material from Ferrand and Alario (1998), participants were presented with stimuli ranging from abstract to concrete, thereby ensuring their sensitivity to the task and avoiding learned response patterns.

Procedure

The stimuli (fillers included) were randomly divided into 6 lists of word pairs and 10 lists of isolated words, respectively, for the similarity-rating and concreteness-rating tasks. The motivation for dividing the pairs into different lists was twofold. Firstly, we wanted to keep the experiment concise so as to not overwhelm participants. Secondly, some words appear several times in different pairs, which is why we used semi-randomization to ensure that participants never saw pairs with the same words. The pairs and words were presented one by one on the screen in a randomised order. The experiment was conducted online using the Qualtrics software (2020). The design of the interface for this experiment allowed participants to complete the task on either a computer or smartphone.
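One way to satisfy the constraint that no participant sees the same word in two pairs is a constrained random assignment, sketched below. This is a greedy illustration under stated assumptions, not the procedure actually used in the study: the function name, the balancing rule, and the example pairs are all hypothetical.

```python
import random


def assign_to_lists(pairs, n_lists, seed=0):
    """Randomly assign pairs to lists so that no word occurs twice in a list."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    lists = [[] for _ in range(n_lists)]
    words_seen = [set() for _ in range(n_lists)]
    for prime, target in shuffled:
        # Candidate lists are those that contain neither word yet.
        candidates = [
            i for i in range(n_lists)
            if prime not in words_seen[i] and target not in words_seen[i]
        ]
        if not candidates:
            raise ValueError(f"no list can take the pair {prime}-{target}")
        # Among the candidates, prefer the shortest lists to keep sizes balanced.
        shortest = min(len(lists[i]) for i in candidates)
        choice = rng.choice([i for i in candidates if len(lists[i]) == shortest])
        lists[choice].append((prime, target))
        words_seen[choice].update((prime, target))
    return lists


# Hypothetical pairs in which "joy" and "fear" each occur twice.
PAIRS = [
    ("joy", "happiness"), ("joy", "delight"),
    ("fear", "dread"), ("fear", "fright"),
    ("flaw", "chill"), ("idea", "thought"),
]
lists = assign_to_lists(PAIRS, n_lists=3)
for number, items in enumerate(lists, start=1):
    print(number, items)
```

A greedy pass can fail when a word occurs in more pairs than there are lists, in which case the sketch raises an error instead of silently violating the constraint.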

Once they had given their consent and registered their demographic information, participants were randomly assigned to one of the lists. Their task was to judge the similarity between the two words presented for the similarity ratings and whether the words were more abstract or concrete for the concreteness ratings. Both tasks used a 7-point Likert-like scale ranging from 1 = “not at all similar” (“pas du tout similaires” in French) to 7 = “totally similar” (“tout à fait similaires”) for the similarity-rating task and from 1 = “very abstract” (“très abstrait”) to 7 = “very concrete” (“très concret”) for the concreteness-rating task (see supplementary material for the specific instructions). The items appeared one by one on the screen and were replaced as soon as participants had rated them. They were presented in the middle of the screen in Arial 12 font against a white background. We provided examples of items and their possible ratings in the instructions. No training was given before the tasks started. Both studies were self-paced, with no time limit for either the stimulus presentation (word pair or isolated word) or the participant’s answer. Both tasks took about 12 minutes to complete.

Results

We first computed general statistics for the entire data set. The general statistics collected for the semantic similarity and concreteness variable are shown in Table 1. Tables 2 and 3 provide the means for associated lexical variables computed by crossing our data set with the Lexique (New et al., 2004), FLP (Ferrand et al., 2010), MEGALEX (Ferrand et al., 2018), and Wordlex (Gimenes & New, 2016) databases.

Table 1. Semantic similarity for word pairs and associated concreteness for prime and target words
Table 2. Descriptive and behavioural data for target words
Table 3. Descriptive and behavioural data for prime words

It is apparent from the general statistics in Table 1 that the semantic similarity ratings range from 1.13 to 6.93 on a 7-point scale. This shows participants used the full range of the scale, but also reflects the diversity of the word pairs in terms of semantic similarity. Separating very similar (M = 5.13; SD = 0.41) and less similar (M = 3.67; SD = 0.59) pairs based on the median revealed a significant effect of semantic similarity [t(300) = 35.78, p < 0.001, d = 2.06]. This effect is particularly large, given that Cohen’s d suggests the difference is greater than two standard deviations. Researchers wishing to study the effect of variation in semantic similarity can therefore use it as either a continuous or a categorical variable. Concerning the concreteness variable, the means for prime and target are very close to one another, showing a good concreteness match within each pair (mean prime concreteness = 4.41; mean target concreteness = 4.40). A paired-samples t test showed no significant difference between the mean concreteness ratings for prime and target words [t(628) = 0.27, p = 0.80 ns]. This close match is further demonstrated by the strong and highly significant correlation between prime and target concreteness [r = 0.87, t(628) = 44.50, p < 0.001].
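The median-split comparison can be sketched with the standard library alone. The ratings below are invented, and the pooled-variance forms of the t statistic and of Cohen’s d are assumptions of this sketch; the text does not state which variants were used, so the code is not meant to reproduce the reported values.

```python
import math
import statistics


def median_split(values):
    """Split values into those at or below the median and those above it."""
    med = statistics.median(values)
    low = [v for v in values if v <= med]
    high = [v for v in values if v > med]
    return low, high


def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * statistics.variance(a)
            + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)


def cohens_d(low, high):
    """Standardised mean difference, using the pooled standard deviation."""
    diff = statistics.mean(high) - statistics.mean(low)
    return diff / math.sqrt(pooled_variance(low, high))


def student_t(low, high):
    """Independent-samples t statistic and its degrees of freedom."""
    n1, n2 = len(low), len(high)
    diff = statistics.mean(high) - statistics.mean(low)
    se = math.sqrt(pooled_variance(low, high) * (1 / n1 + 1 / n2))
    return diff / se, n1 + n2 - 2


# Hypothetical mean similarity ratings for eight word pairs.
ratings = [1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 5.5, 6.5]
low, high = median_split(ratings)
t_value, df = student_t(low, high)
d_value = cohens_d(low, high)
print(t_value, df, d_value)
```

Because the two groups are defined by splitting on the very variable being compared, a large difference between them is expected by construction; the statistic mainly documents how far apart the two halves are.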

Tables 2 and 3 display the lexical characteristics for the primes and targets composing our word pairs. The statistics presented in Tables 2 and 3 were obtained by cross-referencing our data set with Lexique (New et al., 2001, 2004, 2007), the French Lexicon Project (Ferrand et al., 2010), Wordlex (Gimenes & New, 2016), and MEGALEX (Ferrand et al., 2018). Movie subtitle frequency corresponds to the freqfilms2 variable from Lexique and refers to word frequency based on movie subtitles. The other frequencies were computed from books (Lexique: New et al., 2004), blog posts, Twitter, and newspapers (Wordlex: Gimenes & New, 2016).

We also computed correlations between the semantic similarity of each pair and the lexical variables, as well as the concreteness levels of the prime and target. These correlations were all non-significant except for the correlation between semantic similarity and concreteness. Indeed, the concreteness levels of the prime and of the target were negatively and moderately correlated with the semantic similarity of the pair (r = −0.26 for prime concreteness; r = −0.28 for target concreteness; p < 0.001), suggesting that the higher the semantic similarity, the lower the level of concreteness. However, mean concreteness differs only modestly between highly similar pairs (M = 4.14; SD = 1.59) and less similar pairs (M = 4.78; SD = 1.39). This means researchers using the present database will be able to study phenomena of semantic similarity and their relationship with graded levels of concreteness without having to worry that the concreteness variable and the lexical variables might act as confounding variables.
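For readers who want to recompute such coefficients from the database, a minimal sketch of the Pearson correlation and its associated t statistic, t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom, is given below. The data points and function names are hypothetical, and the sketch makes no claim about the software used for the reported analyses.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, with n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r)), n - 2


# Hypothetical values: pair similarity and prime concreteness for five pairs.
similarity = [6.0, 5.5, 5.0, 4.0, 3.0]
concreteness = [2.0, 3.0, 2.5, 4.5, 5.0]

r_value = pearson_r(similarity, concreteness)
t_value, df = t_from_r(r_value, len(similarity))
print(r_value, t_value, df)
```

A negative coefficient here, as in the reported analysis, means that pairs rated as more similar tend to involve less concrete words.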

In addition, we computed correlations between the concreteness variable and the other lexical variables. Table 4 shows that concreteness is negatively correlated with the frequencies based on blog posts and Twitter. These correlations are rather weak (r = −0.10), however, and should not be cause for concern as potential confounds. The concreteness variable is also moderately and negatively correlated with the number of letters and orthographic similarity, but positively correlated with the number of orthographic neighbours. All lexical variables are significantly intercorrelated, a result which replicates previous findings from the psycholinguistic norms literature. Indeed, upon comparing the correlations shown in Table 4 with those reported in MEGALEX (Ferrand et al., 2018), we found that the correlations between lexical variables were similar in size and significance levels, which further validates our data set. For example, word frequencies computed from books, which are among the most widely used, are highly correlated with word frequencies computed from subtitles (r = 0.78), blog posts (r = 0.73), Twitter (r = 0.63), and newspapers (r = 0.68; see Table 4).

Table 4. Correlation matrix between concreteness levels and lexical variables with significance levels

We computed correlations between our concreteness variable and those collected by Bonin et al. (2018) in French and Brysbaert, Warriner, and Kuperman (2014) and Coltheart (1981) in English. Table 5 shows that the correlations are strong and highly significant, thus ensuring the validity of the concreteness variable we collected.

Table 5. Correlations of the present concreteness variable measures with those provided by other databases

Finally, to investigate the concreteness variable further, we used the Ckmeans.1d.dp package in RStudio, which implements an unsupervised learning algorithm for clustering univariate data (Wang & Song, 2011). Based on the Bayesian information criterion, the algorithm suggested that the concreteness variable be split into three clusters of abstractness, with cluster 1 the most abstract and cluster 3 the least abstract. The cluster variable is particularly important in relation to the previously discussed need to control the concreteness variable when manipulating semantic similarity. It will therefore allow experimenters to select stimuli with matching concreteness levels. We have provided the cluster variable in the supplementary material.
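The idea behind optimal univariate clustering can be sketched in Python as follows. This is not the Ckmeans.1d.dp implementation: it is a simple O(k·n²) dynamic programme that minimises the within-cluster sum of squares for a fixed number of clusters, whereas the R package uses a faster algorithm and can select the number of clusters with the Bayesian information criterion. The function name, the toy ratings, and the assumption that lower ratings correspond to more abstract items are illustrative.

```python
def kmeans_1d(values, k):
    """Optimal 1-D k-means by dynamic programming (requires len(values) >= k).

    Returns (labels, centers). Labels run from 1 to k in increasing order
    of value, so label 1 is the cluster with the lowest ratings.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    n = len(xs)

    # Prefix sums give the sum of squared deviations of any run in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Within-cluster sum of squares of the sorted run xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal total SSE when xs[:j] is split into c clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Backtrack from the last cluster to recover boundaries and centers.
    labels = [0] * n
    centers = [0.0] * k
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        centers[c - 1] = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            labels[order[pos]] = c
        j = i
    return labels, centers


# Toy concreteness ratings (hypothetical, not taken from the database).
ratings = [1.2, 6.1, 3.9, 1.0, 4.1, 6.3, 1.1, 5.9, 4.0]
labels, centers = kmeans_1d(ratings, 3)
print(labels, [round(c, 2) for c in centers])
```

Because the data are one-dimensional, every optimal cluster is a contiguous run of the sorted values, which is what makes an exact solution feasible and the result reproducible, unlike standard k-means with random initialisation.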

Availability of the database

The data set for the present study is available in Excel format on the BRM and OSF websites (https://osf.io/qsd4v/). The main database is organised according to the following variables: word pairs in French, word-pair translation in English, word-pair mean concreteness, cluster variable based on word-pair mean concreteness, verbal association strength based on the SWOW, and mean pair similarity with associated general statistics (SD, min, max, median, range, skewness, Q1, Q3). The rest of the database is divided according to prime word and target word for the following variables: mean concreteness and associated general statistics, lexical variables (grammatical category, number of letters, and orthographic neighbours), reaction times (based on FLP and MEGALEX), and frequencies per million (movie subtitles, books, blogs, Twitter, and newspapers). The secondary database is organised following the same variables, but contains only the 70 word pairs that are semantically similar as well as verbally associated.
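To illustrate how such a table might be used for stimulus selection, the Python sketch below picks low-similarity and high-similarity pairs within a single concreteness cluster. The field names ("pair", "cluster", "similarity"), the cut-off values, and the rows are hypothetical placeholders; the actual column headers in the Excel file should be checked before adapting it.

```python
def select_pairs(rows, cluster, low_cut, high_cut):
    """Return (low, high) similarity pairs within one concreteness cluster.

    rows: list of dicts with hypothetical keys "pair", "cluster", "similarity".
    Pairs with similarity strictly between the two cut-offs are left out.
    """
    in_cluster = [r for r in rows if r["cluster"] == cluster]
    low = [r["pair"] for r in in_cluster if r["similarity"] <= low_cut]
    high = [r["pair"] for r in in_cluster if r["similarity"] >= high_cut]
    return low, high


# Placeholder rows standing in for database entries (not real items).
rows = [
    {"pair": "pair_A", "cluster": 1, "similarity": 5.8},
    {"pair": "pair_B", "cluster": 1, "similarity": 2.4},
    {"pair": "pair_C", "cluster": 2, "similarity": 6.1},
    {"pair": "pair_D", "cluster": 1, "similarity": 4.2},
    {"pair": "pair_E", "cluster": 1, "similarity": 5.2},
]
low, high = select_pairs(rows, cluster=1, low_cut=3.0, high_cut=5.0)
print(low, high)
```

Restricting both similarity conditions to the same cluster keeps concreteness roughly constant across conditions, so that similarity can be manipulated as a categorical variable.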

Discussion

The present study aimed to produce French norms of semantic similarity for abstract concepts. Based on our statistical analyses, we can provide material with varying levels of semantic similarity. In addition, based on our collection of concreteness ratings and the implementation of the k-means clustering algorithm, we organised the semantic pairs according to three clusters of abstractness. Our ultimate aim is for this database to be used to design material for studies such as semantic priming studies and other language-based paradigms (see, for example, Hutchison et al., 2013). The cross-references we computed with previously mentioned lexical databases allow stimuli to be matched on the basis of frequencies and other lexical variables. The analysis based on this cross-referencing also provides information about the potentially confounding variables that could create noise in an experimental design.

The comparison of prime and target words across the concreteness and lexical variables produced highly significant correlations, thus ensuring a good match within each pair. Further comparisons between semantic similarity and lexical variables, however, resulted in either very weak or non-significant correlations. This suggests there is no need to be particularly careful to avoid confounding lexical variables when using the similarity ratings. The strong and significant correlation in concreteness levels within word pairs, along with the cluster variable we introduced, was aimed at addressing Crutch and Jackson’s (2011) suggestion that discrepancies found when studying the organisation of semantic memory according to similarity or association might be due to a binary, rather than graded, definition of concreteness levels. Indeed, when considering the organisation of semantic memory only at the extremes of concrete versus abstract concepts, we lose substantial evidence about the concepts in between these two extremes. This limitation can be addressed by considering graded levels of concreteness. Previous findings have shown that concepts are organised according to semantic similarity when concreteness increases and according to verbal association when abstractness increases.

Finally, we suggest that, when creating materials, researchers pay attention to the moderate but significant correlation between semantic similarity and the concreteness variable, insofar as results have shown that more abstract pairs are also perceived as more similar than concrete pairs.

The aim of this database was also to fill a gap in the French literature regarding norms for abstract concepts. We therefore consider the present work to be a good starting point for developing other French-language databases focusing on abstract concepts, such as verbal association norms.

Indeed, studies using word stimuli tend to focus primarily on matching stimuli according to word frequency, word length, and age of acquisition. However, such variables fail to capture fully how words are processed by the human mind, as best illustrated by the proportion of variance explained in norming studies and megastudies, which stagnates between .20 and .50 (Balota, Yap, Hutchison, Cortese, Kessler, Loftis, & Treiman, 2007; Keuleers, Brysbaert, & New, 2010; Ferrand et al., 2010; Brysbaert, Mandera, & Keuleers, 2018). Newly developed variables have therefore been introduced with a view to capturing more of the word-processing phenomena. For instance, Brysbaert, Mandera, McCormick, and Keuleers (2019) introduced the word prevalence variable (the proportion of people who know a particular word), first in Dutch (Brysbaert, Stevens, Mandera, & Keuleers, 2016; Keuleers, Stevens, Mandera, & Brysbaert, 2015), and then in English (Brysbaert et al., 2019). This variable was shown to explain an additional 6–10% of the variance in response latencies in a lexical decision task.

In addition, we consider that most norming studies have focused mainly on concrete concepts, although, as shown by Recchia and Jones (2012), abstract concepts have a richness of their own which warrants further study. For instance, Chedid, Brambati, Bedetti, Rey, Wilson, and Vallet (2019) recently introduced a perceptual strength variable for Canadian French, which aims to identify auditory and visual involvement in conceptual knowledge. In addition, the sensory experience ratings variable (SER; Juhasz & Yap, 2013; Bonin et al., 2015, 2018) was introduced as a measure of the extent to which a word can elicit sensory and perceptual experiences. The correlation between our concreteness variable and the SER variable, based on the 257 items in common, is r = 0.33. This rather low correlation suggests that the SER variable does not capture the same psycholinguistic phenomena as the concreteness variable, thus ensuring the relevance of the latter. We also computed the correlation between our concreteness variable and the perceptual strength variable (Chedid, Brambati, Bedetti, Rey, Wilson, & Vallet, 2019) and found r = 0.80 based on the 507 items in common. Although this correlation appears rather high, it is consistent with the findings of Chedid and colleagues, who reported a correlation of r = 0.76 between perceptual strength and Bonin and colleagues’ concreteness variable. According to Chedid et al. (2019), however, this new variable cannot be regarded as another form of concreteness since it made an independent contribution to the prediction of word latencies in word processing.

Until recently, grounding has mainly been studied in concrete concepts, owing to a previous consensus that abstract concepts are not grounded. However, several studies have shown that abstract concepts can be grounded in perceptual situations and events. In addition, Connell, Lynott, and Banks (2018) consider interoception a forgotten modality for abstract concepts and report a facilitation effect of interoceptive strength. Future work will therefore focus on developing norms that capture these modalities for abstract concepts to further our knowledge about their representation.

Conclusion

The present study aimed to provide French semantic similarity norms for 630 word pairs with varying levels of similarity and associated concreteness. The database is organised in such a way that semantic similarity and concreteness may be used as either continuous or categorical variables. The continuous variables correspond to the ratings we collected, whereas the categorical variables correspond to the cluster variable we computed for concreteness and the median split for semantic similarity. The database also provides frequency and lexical variables for matching pairs when designing stimulus sets. We anticipate that it will be very useful for researchers working on memory and language, especially given the growing interest in studying abstract concept representation.